malinjawi opened a new pull request, #12069:
URL: https://github.com/apache/gluten/pull/12069
### What changes were proposed in this pull request?
This patch preserves partition columns in native partition split output only
when Delta's write contract includes those partition columns in `dataColumns`.
This change:
- Detects whether Delta expects partition columns in the writer data columns.
- Passes that contract to the Velox partition splitter.
- Keeps native stats aggregation scoped to Delta data columns when the
written batch includes extra partition columns.
- Adds Delta 4.0 coverage for Iceberg-compatible partitioned native writes
with stats enabled.
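The column contract described above can be illustrated with a minimal sketch, written in Python for brevity. The function names `expects_partition_columns` and `file_columns` are hypothetical and do not correspond to Gluten, Velox, or Delta APIs. The sketch assumes the contract is detected by checking that every partition column appears in `dataColumns`.

```python
from typing import List, Sequence


def expects_partition_columns(data_columns: Sequence[str],
                              partition_columns: Sequence[str]) -> bool:
    """Hypothetical contract check: True only when every partition column
    is also listed among the writer's data columns."""
    data = set(data_columns)
    return bool(partition_columns) and all(p in data for p in partition_columns)


def file_columns(batch_columns: Sequence[str],
                 partition_columns: Sequence[str],
                 keep_partition_columns: bool) -> List[str]:
    """Hypothetical splitter output: the columns written into each data file."""
    if keep_partition_columns:
        # Contract includes partition columns: preserve the full batch.
        return list(batch_columns)
    parts = set(partition_columns)
    # Default layout: partition values are encoded in the directory path,
    # so the partition columns are dropped from the file contents.
    return [c for c in batch_columns if c not in parts]
```

For a batch `["id", "value", "dt"]` partitioned by `dt`, `file_columns` keeps all three columns when the flag is true and returns `["id", "value"]` otherwise. Passing the flag explicitly keeps the splitter independent of how the contract is detected.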
This is the second PR split out of #12016. It is stacked on #12016 and should
be reviewed after that PR merges.
### Why are the changes needed?
Some Delta write modes keep partition columns in the writer batch. The
native splitter should preserve those columns only for those modes, while the
Delta stats tracker must still compute AddFile stats over Delta data columns
only.
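The stats requirement can be illustrated with a similar hypothetical sketch. It assumes rows are plain dicts and reuses the `numRecords`, `minValues`, `maxValues`, and `nullCount` field names from Delta's AddFile stats. `data_column_stats` is an illustrative name, not Gluten's stats tracker.

```python
from typing import Any, Dict, Sequence

Row = Dict[str, Any]


def data_column_stats(rows: Sequence[Row],
                      data_columns: Sequence[str]) -> Dict[str, Any]:
    """Compute per-file stats over data_columns only. Any extra columns in
    the rows (e.g. preserved partition columns) are ignored."""
    stats: Dict[str, Any] = {
        "numRecords": len(rows),
        "minValues": {},
        "maxValues": {},
        "nullCount": {},
    }
    for col in data_columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v is not None]
        stats["nullCount"][col] = len(values) - len(non_null)
        if non_null:
            # Skip min/max entirely when a column is all nulls.
            stats["minValues"][col] = min(non_null)
            stats["maxValues"][col] = max(non_null)
    return stats
```

Even when the written batch carries a preserved `dt` partition column, calling this with `data_columns=["id", "value"]` produces no `dt` entry in any stats map.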
### Does this PR introduce any user-facing change?
No public API change. This improves correctness for native Delta partitioned
writes.
### How was this patch tested?
Built locally and ran:
```bash
JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-17.jdk/Contents/Home \
./dev/run-scala-test.sh --force \
-Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta \
-pl backends-velox \
-s org.apache.spark.sql.delta.DeltaNativeWriteSuite \
-t "native delta Iceberg-compatible partitioned write should collect stats"
```
Result: 1 test passed, 0 failures.
Also ran, on the stacked branch, the partitioned optimized write layout
regression test, the Spark 4.0 `backends-velox` test compilation, the C++
`gluten`/`velox` native build, and `git diff --check`.
Related issue: #10215