mbutrovich opened a new issue, #4130: URL: https://github.com/apache/datafusion-comet/issues/4130
## Describe the proposed change

`supportedRangePartitioningDataType` in `spark/src/main/scala/org/apache/spark/sql/comet/execution/shuffle/CometShuffleExchangeExec.scala:344` admits `FloatType` and `DoubleType` unconditionally; it does not consult `spark.comet.exec.strictFloatingPoint`. Other ordering-dependent expressions already do:

- `CometSortOrder` in `spark/src/main/scala/org/apache/comet/serde/CometSortOrder.scala:34`
- `CometSortArray` in `spark/src/main/scala/org/apache/comet/serde/arrays.scala:150`

Both return `Incompatible` when the ordered type contains Float/Double and `strictFloatingPoint=true`. `RangePartitioning` should follow the same pattern.

## Rationale

Range partitioning samples rows, sorts the samples, picks split points, then buckets rows by those split points. Arrow's float ordering differs from Spark's on NaN and `-0.0` vs `0.0`:

- Spark's `Double.compare`: NaN sorts largest, `-0.0 == 0.0`.
- Arrow's `sort_to_indices`: `-0.0 < 0.0`, NaN at the extremes.

Split points chosen by Comet can therefore differ from those chosen by Spark, so rows containing NaN or `-0.0` can land in different partitions under Comet than under Spark. Users who care about strict Spark parity already set `strictFloatingPoint=true`, expecting Comet to fall back on ordering operations that are not bit-for-bit compatible. `RangePartitioning` currently ignores that contract.

## Additional context

Found while reviewing #4076 (MapSort support on Spark 4.0), which has the same gap for its own expression. That PR addresses it locally for `MapSort`. This issue tracks the equivalent fix for `RangePartitioning`.
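For illustration, the guard proposed above could look like the following simplified Scala sketch. All names here (`Support`, `containsFloatingPoint`, the stand-in `DataType` hierarchy) are hypothetical stand-ins, not Comet's or Spark's actual API; the point is only the shape of the check: recursively detect Float/Double in the partitioning key type and report `Incompatible` when `strictFloatingPoint` is set.

```scala
// Hypothetical sketch of the strictFloatingPoint guard for RangePartitioning.
// These types are simplified stand-ins for Spark's DataType hierarchy and
// Comet's support-level result; they are not the real Comet API.
object RangePartitioningSupport {
  sealed trait Support
  case object Compatible extends Support
  final case class Incompatible(reason: String) extends Support

  sealed trait DataType
  case object IntegerType extends DataType
  case object FloatType extends DataType
  case object DoubleType extends DataType
  final case class ArrayType(elementType: DataType) extends DataType

  // Float/Double may be nested inside complex types, so recurse.
  def containsFloatingPoint(dt: DataType): Boolean = dt match {
    case FloatType | DoubleType => true
    case ArrayType(elem)        => containsFloatingPoint(elem)
    case _                      => false
  }

  // Mirror the pattern used by CometSortOrder / CometSortArray:
  // fall back when strict parity is requested and the key orders floats.
  def supportedRangePartitioningDataType(
      dt: DataType,
      strictFloatingPoint: Boolean): Support =
    if (strictFloatingPoint && containsFloatingPoint(dt)) {
      Incompatible(
        "Range partitioning on Float/Double is not bit-for-bit compatible " +
          "with Spark (NaN and -0.0 ordering differ)")
    } else {
      Compatible
    }
}
```

With this shape, `DoubleType` (or an `ArrayType(DoubleType)` key) would report `Incompatible` under `strictFloatingPoint=true`, while non-float keys stay `Compatible` and the config flag has no effect on them.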
