malinjawi opened a new pull request, #12197:
URL: https://github.com/apache/gluten/pull/12197
What changes are proposed in this pull request?
This PR is the next split from the Delta deletion-vector (DV) scan stack,
following the native reader support already merged in #12040 and before the
full JVM scan handoff work from #12131.
It adds a focused Scala utility layer that extracts the essential DV scan
information from Spark/Delta `PartitionedFile` metadata without changing scan
offload behavior yet.
Main changes:
- add `DeltaDeletionVectorScanInfo` for Delta 3.3 and Delta 4.0 source sets
- extract per-file DV scan info from `PartitionedFile` metadata:
  - row-index filter type
  - deletion-vector descriptor and cardinality
  - serialized DV bitmap payload bytes
  - normalized non-DV metadata columns
- keep the utility independent from Substrait, Velox native split
conversion, and scan offload behavior
- add focused Delta 3.3 and Delta 4.0 tests for DV extraction,
keep-all/no-DV extraction, and invalid partial DV metadata
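
For reviewers who want a concrete picture of the extraction described above, here is a minimal, self-contained sketch. It is not the code in this PR: the object name, the case class fields, and every metadata key are hypothetical and chosen for illustration only, and a plain `Map[String, Any]` stands in for the metadata carried by `PartitionedFile`. It shows the three cases the tests target: DV present, keep-all/no-DV, and invalid partial DV metadata.

```scala
// Illustrative sketch only; all names and metadata keys here are assumptions,
// not the API added by this PR.
object DvScanInfoSketch {
  sealed trait RowIndexFilterType
  case object KeepAll extends RowIndexFilterType     // no rows are filtered out
  case object IfContained extends RowIndexFilterType // drop rows listed in the DV

  final case class DvScanInfo(
      filterType: RowIndexFilterType,
      descriptor: Option[String],       // encoded DV descriptor, if present
      cardinality: Long,                // number of rows marked as deleted
      bitmapBytes: Option[Array[Byte]], // serialized DV bitmap payload
      otherMetadata: Map[String, Any])  // remaining non-DV metadata columns

  // Hypothetical metadata keys.
  val FilterTypeKey = "row_index_filter_type"
  val DescriptorKey = "row_index_filter_id_encoded"
  val CardinalityKey = "dv_cardinality"
  val BitmapKey = "dv_bitmap_bytes"
  private val dvKeys = Set(FilterTypeKey, DescriptorKey, CardinalityKey, BitmapKey)

  def extract(metadata: Map[String, Any]): Either[String, DvScanInfo] = {
    // Everything that is not a DV key is passed through unchanged.
    val other = metadata.filter { case (k, _) => !dvKeys.contains(k) }
    val filterType = metadata.get(FilterTypeKey).map(_.toString)
    val descriptor = metadata.get(DescriptorKey).map(_.toString)

    (filterType, descriptor) match {
      case (None, None) =>
        // No DV metadata at all: keep every row.
        Right(DvScanInfo(KeepAll, None, 0L, None, other))
      case (Some("IF_CONTAINED"), Some(d)) =>
        val cardinality = metadata.get(CardinalityKey) match {
          case Some(n: Long) => n
          case _ => 0L
        }
        val bitmap = metadata.get(BitmapKey).collect { case b: Array[Byte] => b }
        Right(DvScanInfo(IfContained, Some(d), cardinality, bitmap, other))
      case (Some(t), Some(_)) =>
        Left(s"unsupported row-index filter type: $t")
      case _ =>
        // Only one of filter type / descriptor is set: reject as partial.
        Left("partial DV metadata: filter type and descriptor must be set together")
    }
  }
}
```

Returning `Either` rather than throwing is just one way to surface the invalid-partial-metadata case; the actual utility may report it differently.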
This PR is intentionally utility-only:
- no Substrait proto changes
- no native/C++ changes
- no Delta scan rule replacement
- no end-to-end scan offload behavior change yet
Those pieces stay in follow-up PRs after this API is reviewed.
How was this patch tested?
Validation run:
- `JAVA_HOME=$(/usr/libexec/java_home -v 17) ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
- `JAVA_HOME=$(/usr/libexec/java_home -v 17) ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
- `git diff --check`
I also attempted to run the focused suite with `dev/run-scala-test.sh`, but
the local runner failed during classpath compilation while switching profiles,
so the suite itself never executed. The module-level Spark 3.5 and Spark 4.0
test-compile checks above pass.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: IBM BOB