malinjawi opened a new pull request, #12197:
URL: https://github.com/apache/gluten/pull/12197

   What changes are proposed in this pull request?
   
   This PR is the next piece split out of the Delta deletion-vector (DV) scan 
stack. It follows the native reader support already merged in #12040 and 
precedes the full JVM scan handoff work from #12131.
   
   It adds a focused Scala utility layer that extracts the essential DV scan 
information from Spark/Delta `PartitionedFile` metadata without changing scan 
offload behavior yet.
   
   Main changes:
   
   - add `DeltaDeletionVectorScanInfo` for Delta 3.3 and Delta 4.0 source sets
   - extract per-file DV scan info from `PartitionedFile` metadata:
     - row-index filter type
     - deletion-vector descriptor and cardinality
     - serialized DV bitmap payload bytes
     - normalized non-DV metadata columns
   - keep the utility independent from Substrait, Velox native split 
conversion, and scan offload behavior
   - add focused Delta 3.3 and Delta 4.0 tests for DV extraction, 
keep-all/no-DV extraction, and invalid partial DV metadata
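
   To make the three tested cases above concrete (DV present, keep-all/no-DV, 
and invalid partial DV metadata), here is a minimal, self-contained sketch of 
that extraction logic. It is not the code from this PR: the type names, the 
metadata keys, and the plain `Map[String, Any]` standing in for the 
`PartitionedFile` metadata are all assumptions made for illustration, and the 
real `DeltaDeletionVectorScanInfo` API may look different.

```scala
// Hypothetical stand-ins only; none of these names are taken from the PR.
sealed trait RowIndexFilter
object RowIndexFilter {
  case object KeepAll extends RowIndexFilter        // no DV attached, keep every row
  case object DropMarkedRows extends RowIndexFilter // DV marks the rows to drop
}

final case class DvScanInfo(
    filter: RowIndexFilter,
    descriptor: Option[String],   // DV descriptor, kept opaque here
    cardinality: Option[Long],    // number of rows marked in the DV
    payload: Option[Array[Byte]], // serialized DV bitmap bytes
    otherMetadata: Map[String, Any]) // remaining non-DV metadata columns

object DvScanInfoSketch {
  // Assumed metadata keys, chosen for this example.
  val FilterTypeKey = "row_index_filter_type"
  val DescriptorKey = "dv_descriptor"
  val CardinalityKey = "dv_cardinality"
  val PayloadKey = "dv_payload"
  private val dvKeys = Set(FilterTypeKey, DescriptorKey, CardinalityKey, PayloadKey)

  def extract(metadata: Map[String, Any]): Either[String, DvScanInfo] = {
    val other = metadata -- dvKeys
    val present = dvKeys.filter(metadata.contains)
    if (present.isEmpty) {
      // No DV metadata at all: every row is kept.
      Right(DvScanInfo(RowIndexFilter.KeepAll, None, None, None, other))
    } else if (present != dvKeys) {
      // Only some DV fields are set: reject instead of guessing.
      val missing = (dvKeys -- present).toSeq.sorted.mkString(", ")
      Left(s"partial DV metadata, missing: $missing")
    } else {
      val fields = (
        metadata(FilterTypeKey),
        metadata(DescriptorKey),
        metadata(CardinalityKey),
        metadata(PayloadKey))
      fields match {
        case ("IF_CONTAINED", d: String, c: Long, p: Array[Byte]) =>
          Right(DvScanInfo(RowIndexFilter.DropMarkedRows, Some(d), Some(c), Some(p), other))
        case _ =>
          Left("unsupported filter type or unexpected DV field types")
      }
    }
  }
}
```

   Returning an `Either` here is a design choice of the sketch: it lets a 
caller tell "no DV" apart from "malformed DV metadata" without relying on 
exceptions. The PR itself may signal invalid metadata differently.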
   
   This PR is intentionally utility-only:
   
   - no Substrait proto changes
   - no native/C++ changes
   - no Delta scan rule replacement
   - no end-to-end scan offload behavior change yet
   
   Those pieces stay in follow-up PRs after this API is reviewed.
   
   How was this patch tested?
   
   Validation run:
   
   - `JAVA_HOME=$(/usr/libexec/java_home -v 17) ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
   - `JAVA_HOME=$(/usr/libexec/java_home -v 17) ./build/mvn test-compile -pl backends-velox -am -Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta -DskipTests`
   - `git diff --check`
   
   I also attempted to run the focused suite with `dev/run-scala-test.sh`, but 
after switching profiles the local runner failed during classpath compilation, 
so the suite itself did not execute locally. The module-level Spark 3.5 and 
Spark 4.0 test-compile checks above pass.
   
   Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: IBM BOB
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

