malinjawi opened a new pull request, #12198: URL: https://github.com/apache/gluten/pull/12198
What changes are proposed in this pull request?

This PR is the next split in the Delta deletion-vector scan stack and is stacked on #12197. It adds the JVM/Substrait/Velox handoff that consumes the essential Delta DV scan info extracted by #12197, materializes serialized DV payloads on the JVM side, and passes them to native scan execution.

Main changes:
- add a Delta DV preprocessing rule for the Velox Delta component without replacing Delta's `PrepareDeltaScan`
- reuse `DeltaDeletionVectorScanInfo` from #12197 to extract per-file DV metadata and serialized DV bytes from Delta-prepared scan files
- add Delta local files Substrait nodes/builders carrying `DeltaReadOptions`
- embed the serialized DV payload in `DeltaReadOptions`, instead of passing essential DV data through generic metadata columns
- add a native `DeltaSplitInfo` path for Delta-specific split metadata
- wire the handoff through `VeloxIteratorApi`, `VeloxPlanConverter`, `WholeStageResultIterator`, and `SubstraitToVeloxPlan`
- strip Spark's synthetic DV predicate/internal columns only after the native split has the payload, so Velox applies the DV filter natively and we avoid double filtering
- add Spark 3.5 and Spark 4.0 focused handoff coverage

This PR is intentionally handoff-only:
- the DV scan info extraction utility is reviewed separately in #12197
- performance/benchmark iteration remains a follow-up after the correctness handoff shape is reviewed

Issue: #11901

How was this patch tested?

Validation run locally:
- `JAVA_HOME=$(/usr/libexec/java_home -v 17) ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
- `JAVA_HOME=$(/usr/libexec/java_home -v 17) ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
- `git diff --check`
- clang-format over touched C++ files with `/opt/homebrew/opt/llvm@15/bin/clang-format`

Was this patch authored or co-authored using generative AI tooling?

Generated-by: IBM BOB

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
