malinjawi opened a new pull request, #12069:
URL: https://github.com/apache/gluten/pull/12069

   ### What changes were proposed in this pull request?
   
   This patch preserves partition columns in native partition split output only 
when Delta's write contract includes those partition columns in `dataColumns`.
   
   The change:
   - Detects whether Delta expects partition columns in the writer data columns.
   - Passes that contract to the Velox partition splitter.
   - Keeps native stats aggregation scoped to Delta data columns when the 
written batch includes extra partition columns.
   - Adds Delta 4.0 coverage for Iceberg-compatible partitioned native writes 
with stats enabled.
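
The first two bullets can be illustrated with a minimal sketch. It is not the Gluten or Velox implementation: `WriteContract`, `keeps_partition_columns`, and `split_output_columns` are hypothetical names, and the sketch assumes the contract can be read off by checking whether every partition column also appears in `dataColumns`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteContract:
    """Hypothetical stand-in for the column lists a Delta write hands to the writer."""
    data_columns: tuple
    partition_columns: tuple


def keeps_partition_columns(contract):
    """Return True when every partition column is also listed in the data columns."""
    return bool(contract.partition_columns) and all(
        name in contract.data_columns for name in contract.partition_columns
    )


def split_output_columns(batch_columns, contract):
    """Return the columns a splitter would emit for each per-partition batch."""
    if keeps_partition_columns(contract):
        # The write mode expects partition columns in the written data: keep them.
        return list(batch_columns)
    # Otherwise drop partition columns, since they are encoded in the partition path.
    partition = set(contract.partition_columns)
    return [name for name in batch_columns if name not in partition]
```

In this sketch the decision is made once per write from the contract and then applied uniformly to every batch, which mirrors the idea of passing the contract to the splitter rather than re-deriving it per batch.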
   
   This is the second split from #12016. It is stacked on #12016 and should be 
reviewed after that PR merges.
   
   ### Why are the changes needed?
   
   Some Delta write modes keep partition columns in the writer batch. The 
native splitter should preserve those columns only for those modes, while the 
Delta stats tracker must still compute AddFile stats over Delta data columns 
only.
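
As a rough illustration of the stats side, the sketch below aggregates per-file statistics over an explicit list of stats columns and ignores any other column present in the batch. The function name `aggregate_stats` and the dict-of-lists batch layout are assumptions made for the example; only the stats field names (`numRecords`, `minValues`, `maxValues`, `nullCount`) follow the Delta AddFile stats format.

```python
def aggregate_stats(batch, stats_columns):
    """Aggregate AddFile-style stats over stats_columns only.

    batch is a dict mapping column name to a list of values of equal length.
    Columns in the batch that are not in stats_columns are ignored.
    """
    num_records = len(next(iter(batch.values()))) if batch else 0
    stats = {"numRecords": num_records, "minValues": {}, "maxValues": {}, "nullCount": {}}
    for name in stats_columns:
        values = batch[name]
        non_null = [v for v in values if v is not None]
        stats["nullCount"][name] = len(values) - len(non_null)
        if non_null:
            # min/max are only defined when at least one non-null value exists.
            stats["minValues"][name] = min(non_null)
            stats["maxValues"][name] = max(non_null)
    return stats
```

For example, with a batch holding `id`, `value`, and a retained partition column `part`, calling `aggregate_stats(batch, ["id", "value"])` produces no entries for `part`, even though the column is physically present in the written batch.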
   
   ### Does this PR introduce any user-facing change?
   
   No public API change. This improves correctness for native Delta partitioned 
writes.
   
   ### How was this patch tested?
   
   Built locally and ran:
   
   ```bash
   JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-17.jdk/Contents/Home \
   ./dev/run-scala-test.sh --force \
     -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta \
     -pl backends-velox \
     -s org.apache.spark.sql.delta.DeltaNativeWriteSuite \
     -t "native delta Iceberg-compatible partitioned write should collect stats"
   ```
   
   Result: 1 test passed, 0 failures.
   
   Also ran the partitioned optimized write layout regression test on the 
stacked branch, along with Spark 4.0 `backends-velox` test compilation, the C++ 
`gluten`/`velox` native build, and `git diff --check`.
   
   Related issue: #10215
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

