wForget opened a new issue, #4780: URL: https://github.com/apache/datafusion-comet/issues/4780
### What is the problem the feature request solves?

## Background

Comet's shuffle partitioners currently own too much of the local shuffle write logic directly, including writing shuffle data files, writing index files, handling spill files, and finalizing partition offsets.

This makes the partitioning logic tightly coupled with the local file-based shuffle storage implementation. It also makes it harder to introduce alternative shuffle storage backends, such as a remote shuffle writer, because each partitioner would need to be updated with backend-specific write behavior.

## Proposal

Introduce a `ShufflePartitionWriter` / `PartitionWriter` abstraction for shuffle partition output.

The partitioners should focus on producing partitioned `RecordBatch` streams, while the writer implementation should own the details of how shuffle data is stored and finalized.

The initial implementation should move the existing local file-based shuffle write behavior into a local partition writer implementation, preserving the current behavior for:

- single-partition shuffle
- multi-partition shuffle
- empty-schema shuffle
- spill handling
- data file and index file generation
- shuffle write metrics

## Benefits

This refactor separates shuffle partitioning from shuffle storage, making the code easier to extend and maintain.

It also creates a clear extension point for future remote shuffle support. A remote shuffle writer can later implement the same writer interface without requiring the shuffle partitioners to know whether the output is written to local files or to a remote shuffle service.

## Scope

This issue is intended to be a refactor only. It should preserve the existing local shuffle behavior and prepare the codebase for future remote shuffle writer support.

### Describe the potential solution

_No response_

### Additional context

_No response_
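To make the proposed split concrete, here is a minimal sketch of what such a writer interface might look like. It is not Comet's actual API: only the name `ShufflePartitionWriter` comes from the proposal, while `EncodedBatch`, `LocalPartitionWriter`, `partition_and_write`, and all method names are hypothetical. To stay dependency-free it uses raw bytes in place of Arrow `RecordBatch` values, in-memory buffers in place of spill files, and any `std::io::Write` sink in place of the shuffle data file. The offsets returned by `finish` are the values an index file would record; the on-disk index encoding is deliberately left out.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for an already-encoded Arrow `RecordBatch`.
pub struct EncodedBatch(pub Vec<u8>);

/// Hypothetical writer interface: the partitioner hands over batches per
/// partition id, and the implementation decides where the bytes end up.
pub trait ShufflePartitionWriter {
    fn write_batch(&mut self, partition_id: usize, batch: &EncodedBatch) -> io::Result<()>;
    /// Finalizes the output and returns `num_partitions + 1` cumulative offsets.
    fn finish(&mut self) -> io::Result<Vec<u64>>;
}

/// Hypothetical local implementation. Per-partition buffers stand in for
/// spill files; `data_sink` stands in for the shuffle data file.
pub struct LocalPartitionWriter<W: Write> {
    buffers: Vec<Vec<u8>>,
    data_sink: W,
}

impl<W: Write> LocalPartitionWriter<W> {
    pub fn new(num_partitions: usize, data_sink: W) -> Self {
        Self { buffers: vec![Vec::new(); num_partitions], data_sink }
    }

    /// Gives the sink back so callers can inspect what was written.
    pub fn into_sink(self) -> W {
        self.data_sink
    }
}

impl<W: Write> ShufflePartitionWriter for LocalPartitionWriter<W> {
    fn write_batch(&mut self, partition_id: usize, batch: &EncodedBatch) -> io::Result<()> {
        let buf = self.buffers.get_mut(partition_id).ok_or_else(|| {
            io::Error::new(io::ErrorKind::InvalidInput, "partition id out of range")
        })?;
        buf.extend_from_slice(&batch.0);
        Ok(())
    }

    fn finish(&mut self) -> io::Result<Vec<u64>> {
        let mut offsets = Vec::with_capacity(self.buffers.len() + 1);
        let mut pos = 0u64;
        offsets.push(pos);
        // Concatenate partitions in id order and record where each one ends.
        for buf in self.buffers.iter_mut() {
            self.data_sink.write_all(buf.as_slice())?;
            pos += buf.len() as u64;
            offsets.push(pos);
            buf.clear();
        }
        self.data_sink.flush()?;
        Ok(offsets)
    }
}

/// Hypothetical partitioner side: it only picks a partition id per row
/// (key modulo partition count) and never touches files or offsets.
pub fn partition_and_write<P: ShufflePartitionWriter>(
    rows: &[(u64, &[u8])],
    num_partitions: usize,
    writer: &mut P,
) -> io::Result<Vec<u64>> {
    if num_partitions == 0 {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "no partitions"));
    }
    for (key, payload) in rows {
        let partition_id = (*key % num_partitions as u64) as usize;
        writer.write_batch(partition_id, &EncodedBatch(payload.to_vec()))?;
    }
    writer.finish()
}
```

In this sketch `partition_and_write` is generic over the writer, so a remote shuffle writer could be passed in by implementing the same trait, with no change to the partitioning code. Calling it with a `LocalPartitionWriter::new(2, Vec::new())` and rows keyed 0, 1, 2 would place the first and third payloads in partition 0 and the second in partition 1.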
