malinjawi opened a new pull request, #12198:
URL: https://github.com/apache/gluten/pull/12198

   What changes are proposed in this pull request?
   
   This PR is the next piece of the Delta deletion-vector (DV) scan series and 
is stacked on #12197.
   
   It adds the JVM/Substrait/Velox handoff that consumes the essential Delta DV 
scan info extracted by #12197, materializes serialized DV payloads on the JVM 
side, and passes them to native scan execution.
   
   Main changes:
   
   - add a Delta DV preprocessing rule for the Velox Delta component without 
replacing Delta's `PrepareDeltaScan`
   - reuse `DeltaDeletionVectorScanInfo` from #12197 to extract per-file DV 
metadata and serialized DV bytes from Delta-prepared scan files
   - add Delta local files Substrait nodes/builders carrying `DeltaReadOptions`
   - embed the serialized DV payload in `DeltaReadOptions`, instead of passing 
essential DV data through generic metadata columns
   - add a native `DeltaSplitInfo` path for Delta-specific split metadata
   - wire the handoff through `VeloxIteratorApi`, `VeloxPlanConverter`, 
`WholeStageResultIterator`, and `SubstraitToVeloxPlan`
   - strip Spark's synthetic DV predicate/internal columns only after the 
native split has the payload, so Velox applies the DV filter natively and we 
avoid double filtering
   - add focused handoff test coverage for Spark 3.5 and Spark 4.0
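
   To illustrate the ordering constraint in the list above (attach the DV 
payload to the native split first, strip Spark's synthetic DV column only 
afterwards), here is a minimal standalone sketch. It is not the code in this 
PR: `DvHandoffSketch`, `NativeSplit`, `toSplit`, `stripSyntheticDv`, and the 
column name `synthetic_dv_flag` are all hypothetical names chosen for 
illustration, and plain column-name lists stand in for the real plan nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DvHandoffSketch {
    // Hypothetical placeholder for Spark's synthetic DV column; not the real name.
    static final String SYNTHETIC_DV_COLUMN = "synthetic_dv_flag";

    // Hypothetical stand-in for a native split that carries the serialized DV payload.
    static final class NativeSplit {
        final String path;
        final byte[] dvPayload; // empty when the file has no deletion vector

        NativeSplit(String path, byte[] dvPayload) {
            this.path = path;
            this.dvPayload = dvPayload;
        }
    }

    // Step 1: copy the serialized DV bytes into the split (null means "no DV").
    static NativeSplit toSplit(String path, byte[] serializedDv) {
        byte[] payload = serializedDv == null ? new byte[0] : serializedDv.clone();
        return new NativeSplit(path, payload);
    }

    // Step 2: drop the synthetic column only once splits with payloads exist.
    // With no splits, the columns are returned unchanged so the JVM-side
    // filter would still apply and no deleted rows could leak through.
    static List<String> stripSyntheticDv(List<String> columns, List<NativeSplit> splits) {
        if (splits.isEmpty()) {
            return columns;
        }
        List<String> kept = new ArrayList<>();
        for (String column : columns) {
            if (!column.equals(SYNTHETIC_DV_COLUMN)) {
                kept.add(column);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<NativeSplit> splits =
            Arrays.asList(toSplit("part-0.parquet", new byte[] {1, 2, 3}));
        List<String> columns = Arrays.asList("id", "value", SYNTHETIC_DV_COLUMN);
        // The synthetic column is removed because the split already owns the payload.
        System.out.println(stripSyntheticDv(columns, splits));
    }
}
```

   The point of the sketch is only the ordering: the filter is applied in 
exactly one place, either by the synthetic column on the JVM side or by the 
payload on the native side, never both.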
   
   This PR is intentionally handoff-only:
   
   - the DV scan info extraction utility is reviewed separately in #12197
   - performance and benchmark work is deferred to a follow-up, once the 
shape of the correctness handoff has been reviewed
   
   Issue: #11901
   
   How was this patch tested?
   
   Validation run locally:
   
   - `JAVA_HOME=$(/usr/libexec/java_home -v 17) ./build/mvn test-compile -pl 
backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta 
-DskipTests`
   - `JAVA_HOME=$(/usr/libexec/java_home -v 17) ./build/mvn test-compile -pl 
backends-velox -am 
-Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta 
-DskipTests`
   - `git diff --check`
   - clang-format over touched C++ files with 
`/opt/homebrew/opt/llvm@15/bin/clang-format`
   
   Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: IBM BOB


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

