wForget commented on code in PR #3076:
URL: https://github.com/apache/datafusion-comet/pull/3076#discussion_r2710665322

##########
docs/source/user-guide/latest/compatibility.md:
##########
@@ -69,6 +69,31 @@ this can be overridden by setting `spark.comet.regexp.allowIncompatible=true`.
 Comet's support for window functions is incomplete and known to be incorrect. It is disabled by default and should not be used in production. The feature will be enabled in a future release. Tracking issue: [#2721](https://github.com/apache/datafusion-comet/issues/2721).
 
+## Round-Robin Partitioning
+
+Comet's native shuffle implementation of round-robin partitioning (`df.repartition(n)`) is not compatible with
+Spark's implementation and is disabled by default. It can be enabled by setting
+`spark.comet.native.shuffle.partitioning.roundrobin.enabled=true`.
+
+**Why the incompatibility exists:**
+
+Spark's round-robin partitioning sorts rows by their binary `UnsafeRow` representation before assigning them to
+partitions. This ensures deterministic output for fault tolerance (task retries produce identical results).
+Comet uses Arrow format internally, which has a completely different binary layout than `UnsafeRow`, making it
+impossible to match Spark's exact partition assignments.
+
+**Comet's approach:**
+
+Instead of true round-robin assignment, Comet implements round-robin as hash partitioning on ALL columns. This
+achieves the same semantic goals:
+
+- **Even distribution**: Rows are distributed evenly across partitions

Review Comment:
   How do we ensure this? Are row hashes always uniformly distributed?

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
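The review question above can be made concrete with a minimal Python sketch. This is not Comet or Spark code: Python's built-in `hash` stands in for whatever hash function Comet actually uses (not shown in this thread), and every function name here is hypothetical. It illustrates one case where hashing on all columns cannot be even: identical rows always hash to the same partition, while positional round-robin stays balanced regardless of the data.

```python
def round_robin(rows, num_partitions):
    """Positional round-robin: the i-th row goes to partition i mod n."""
    return [i % num_partitions for i in range(len(rows))]


def hash_all_columns(rows, num_partitions):
    """Hash partitioning on every column: partition = hash(row) mod n.

    Assumption: Python's tuple hash is only a stand-in for the real hash.
    """
    return [hash(row) % num_partitions for row in rows]


def partition_sizes(assignments, num_partitions):
    """Count how many rows landed in each partition."""
    sizes = [0] * num_partitions
    for p in assignments:
        sizes[p] += 1
    return sizes


# Eight identical rows, four partitions.
duplicate_rows = [(1, 7)] * 8

rr_sizes = partition_sizes(round_robin(duplicate_rows, 4), 4)
hash_sizes = partition_sizes(hash_all_columns(duplicate_rows, 4), 4)

print("round-robin:", rr_sizes)
print("hash on all columns:", hash_sizes)
```

With duplicate-heavy input, the hash-based assignment places all eight rows in a single partition, so the "even distribution" claim holds only to the extent that rows are distinct and the hash spreads them well.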
