jaylisde opened a new pull request, #12067: URL: https://github.com/apache/gluten/pull/12067

## Summary

Spark 4.1 introduced shuffle checksum end-to-end verification (SPARK-53322), requiring `MapStatus.checksumValue` to be non-zero and `.checksum` files to contain valid per-partition checksums. Gluten's `ColumnarShuffleWriter` was passing an empty checksum array to `writeMetadataFileAndCommit` and omitting the `checksumValue` parameter in `MapStatus`.

**Fix:** After native shuffle write completes, read the data file (still in page cache) and compute per-partition checksums using the same algorithm as Spark's verification logic (`spark.shuffle.checksum.algorithm`, default ADLER32). Pass the checksums to `writeMetadataFileAndCommit` and an aggregated value to `MapStatus`.

This is a pure Scala-layer fix; no C++/JNI changes are required. The data file was just written by the native shuffle writer and remains in page cache, so the sequential read is effectively a memory operation with negligible overhead.

## Changes

- `ColumnarShuffleWriter.scala`: Added `computePartitionChecksums()` method that reads the shuffle data file and computes per-partition checksums using `ShuffleChecksumHelper.getChecksumByAlgorithm()`. Respects `spark.shuffle.checksum.enabled` and `spark.shuffle.checksum.algorithm` configs.
- `VeloxTestSettings.scala`: Enabled `GlutenMapStatusEndToEndSuite` (previously commented out).

## Test

- `GlutenMapStatusEndToEndSuite` passes with default config (ansiFallback=true)
- Verified with `-Dspark.gluten.sql.ansiFallback.enabled=false`: `ColumnarShuffleWriter` produces correct ADLER32 checksums

Closes #11915
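
For illustration only, the per-partition checksum step described under **Fix** might look roughly like the sketch below. It is not the code from this PR: the object name `PartitionChecksumSketch`, the method signature, and the buffer size are assumptions, and it uses `java.util.zip.Adler32` directly in place of `ShuffleChecksumHelper.getChecksumByAlgorithm()` so that it runs without Spark on the classpath. The config check for `spark.shuffle.checksum.enabled` and the aggregated `MapStatus` value are left out.

```scala
import java.io.{BufferedInputStream, EOFException, File, FileInputStream}
import java.nio.file.Files
import java.util.zip.Adler32

// Hypothetical stand-alone sketch; names here are illustrative, not Gluten's.
object PartitionChecksumSketch {

  /** Reads `dataFile` once, front to back, and returns one ADLER32 value per partition.
    * `partitionLengths(i)` is the byte length of partition i in the data file.
    */
  def computePartitionChecksums(dataFile: File, partitionLengths: Array[Long]): Array[Long] = {
    val in = new BufferedInputStream(new FileInputStream(dataFile))
    try {
      val buf = new Array[Byte](8192) // assumed buffer size
      partitionLengths.map { length =>
        val checksum = new Adler32() // fresh checksum per partition
        var remaining = length
        while (remaining > 0) {
          val toRead = math.min(buf.length.toLong, remaining).toInt
          val n = in.read(buf, 0, toRead)
          if (n < 0) throw new EOFException(s"data file shorter than partition lengths: $dataFile")
          checksum.update(buf, 0, n)
          remaining -= n
        }
        checksum.getValue
      }
    } finally {
      in.close()
    }
  }

  def main(args: Array[String]): Unit = {
    // Three fake partitions of 3, 0 and 3 bytes written back to back.
    val file = File.createTempFile("shuffle-sketch", ".data")
    try {
      Files.write(file.toPath, "abcxyz".getBytes("UTF-8"))
      val checksums = computePartitionChecksums(file, Array(3L, 0L, 3L))
      checksums.zipWithIndex.foreach { case (c, i) => println(s"partition $i -> $c") }
    } finally {
      file.delete()
    }
  }
}
```

Because partitions are laid out contiguously in the data file, a single sequential pass with a running byte count per partition is enough; no seeking or index file lookup is needed in this sketch.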
