malinjawi opened a new pull request, #12214:
URL: https://github.com/apache/gluten/pull/12214

   What changes are proposed in this pull request?
   
   This PR is the next split for Delta deletion-vector MoR support. It adds the 
native bitmap primitive needed by later DELETE DV work, without changing DELETE 
routing or enabling native bitmap construction in the command path yet.
   
   Main changes:
   
   - extend `RoaringBitmapArray` for Delta Portable-format deletion-vector 
payloads
   - add bounded deserialization: the payload size is checked with CRoaring's 
portable-deserialize sizing before `readSafe` is called
   - add native `bitmapaggregator` support for Delta row-index aggregation
   - wire the aggregate name through Gluten expression/substrait planning
   - add focused native tests for bitmap serialization/deserialization and 
aggregate behavior
   - add `delta_bitmap_benchmark` with construction, partial-merge, and 
deserialize/probe cases
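
The bounded-deserialization item can be illustrated with a minimal sketch. It does not call CRoaring and does not use the Roaring portable format: the length-prefixed layout and the names `required_size` and `deserialize_bounded` are hypothetical, chosen only to show the "size the payload first, then read" pattern. The native code in this PR may structure the check differently.

```python
import struct

# Hypothetical toy format (NOT the Roaring portable format):
# a little-endian uint32 value count, then that many little-endian uint64 values.
HEADER = struct.Struct("<I")
VALUE = struct.Struct("<Q")


def serialize(values):
    """Serialize distinct row indexes in ascending order using the toy format."""
    ordered = sorted(set(values))
    return HEADER.pack(len(ordered)) + b"".join(VALUE.pack(v) for v in ordered)


def required_size(buf):
    """Return the number of bytes the payload claims to need.

    Returns 0 when even the header is truncated, so the caller can reject
    the buffer without reading past its end.
    """
    if len(buf) < HEADER.size:
        return 0
    (count,) = HEADER.unpack_from(buf, 0)
    return HEADER.size + count * VALUE.size


def deserialize_bounded(buf):
    """Deserialize only after the claimed size is known to fit in the buffer."""
    need = required_size(buf)
    if need == 0 or need > len(buf):
        raise ValueError("truncated or malformed payload")
    count = (need - HEADER.size) // VALUE.size
    return [
        VALUE.unpack_from(buf, HEADER.size + i * VALUE.size)[0]
        for i in range(count)
    ]
```

A truncated buffer, or a header that claims more values than the buffer holds, is rejected before any value is read.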
   
   This PR is intentionally primitive-only:
   
   - no DELETE command routing changes
   - no DML row-index scan planning changes
   - no plain Parquet target scan optimization
   - no native bitmap aggregation enabled as the default DELETE path
   
   Those pieces remain in follow-up split PRs after the primitive and benchmark 
shape are reviewed.
   
   How was this patch tested?
   
   Post-rebase validation on top of current `upstream/main` 
(`33be6fb8bf703ac16eae3c75efa919a97d9cdf5a`):
   
   - `git diff --check upstream/main...HEAD`
   - `env 
JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home 
PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn test-compile -pl 
backends-velox -am -Pjava-17,spark-3.5,backends-velox,hadoop-3.3,spark-ut,delta 
-DskipTests`
   - `env 
JAVA_HOME=/opt/homebrew/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home 
PATH=/opt/homebrew/opt/openjdk@17/bin:$PATH ./build/mvn test-compile -pl 
backends-velox -am 
-Pjava-17,spark-4.0,scala-2.13,backends-velox,hadoop-3.3,spark-ut,delta 
-DskipTests`
   
   Focused standalone native validation from the same diff before the final 
rebase:
   
   - standalone `RoaringBitmapArrayTest`: passed all 9 focused tests
   - Delta JVM compatibility: a JVM-generated sparse-gap portable fixture for 
values `1`, `7`, and `1 << 33` is read by native code; the native compact portable 
payload for the same values is read by a Delta 3.3.2 JVM helper, which reports 
cardinality `3`, passes all expected `contains` checks, and returns last value 
`8589934592`
   - standalone `delta_bitmap_benchmark` construction/merge output: 
`/tmp/delta_bitmap_benchmark_delete_construction.json`
   - standalone `delta_bitmap_benchmark` read/probe output: 
`/tmp/delta_bitmap_benchmark_read_probe.json`
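
The fixture values above can be traced through a toy model of the bitmap array. The sketch assumes the layout used by Delta's JVM `RoaringBitmapArray`, where the high 32 bits of a row index select a bucket and the low 32 bits are stored in that bucket's 32-bit bitmap. Python sets stand in for Roaring bitmaps, and `BitmapArraySketch` and `merge_partials` are hypothetical names, not the native class or aggregate added in this PR.

```python
from collections import defaultdict


def split_row_index(row_index):
    """Split a 64-bit row index into (high 32 bits, low 32 bits)."""
    if not 0 <= row_index < (1 << 64):
        raise ValueError("row index out of unsigned 64-bit range")
    return row_index >> 32, row_index & 0xFFFFFFFF


class BitmapArraySketch:
    """Toy stand-in: one set of low 32-bit values per high-32-bit bucket."""

    def __init__(self):
        self.buckets = defaultdict(set)

    def add(self, row_index):
        high, low = split_row_index(row_index)
        self.buckets[high].add(low)

    def contains(self, row_index):
        high, low = split_row_index(row_index)
        # .get() avoids creating an empty bucket on a miss.
        return low in self.buckets.get(high, ())

    def merge(self, other):
        """Union another sketch into this one, bucket by bucket."""
        for high, lows in other.buckets.items():
            self.buckets[high] |= lows

    def cardinality(self):
        return sum(len(lows) for lows in self.buckets.values())

    def last(self):
        """Largest stored row index; raises ValueError when empty."""
        high = max(self.buckets)
        return (high << 32) | max(self.buckets[high])


def merge_partials(partials):
    """Final aggregation step: union all partial sketches into one."""
    final = BitmapArraySketch()
    for partial in partials:
        final.merge(partial)
    return final
```

With values `1`, `7`, and `1 << 33`, the first two land in bucket `0` and the third in bucket `2` with low value `0`, which is why the fixture is described as sparse-gap: bucket `1` stays empty.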
   
   Benchmark highlights from the standalone run:
   
   - contiguous 1M build+serialize: `7.91 ms`, `132.5M rows/s`
   - sparse 1M build+serialize: `9.99 ms`, `105.0M rows/s`
   - clustered 1M build+serialize: `10.10 ms`, `103.9M rows/s`
   - multi-bucket 256K build+serialize: `2.28 ms`, `114.9M rows/s`
   - sparse 1M merge from 64 partials: `1.12 ms`
   - contiguous round-robin merge from 64 partials: `1.32 ms`
   - sparse deserialize+probe: `487 us` for an 8,192-probe sample
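
The rows/s figures can be cross-checked against the reported times. The check below assumes that `1M` and `256K` denote binary sizes (`2^20` and `2^18` rows); the highlights do not state this, but the reported throughputs match that reading to within rounding and do not match decimal `1,000,000`. The helper names are hypothetical.

```python
def rows_per_second(rows, elapsed_ms):
    """Throughput in rows per second from a row count and a time in milliseconds."""
    return rows / (elapsed_ms / 1000.0)


def relative_gap(rows, elapsed_ms, reported_rows_per_s):
    """Relative difference between computed and reported throughput."""
    computed = rows_per_second(rows, elapsed_ms)
    return abs(computed - reported_rows_per_s) / reported_rows_per_s


ROWS_1M = 1 << 20    # assumption: "1M" means 1,048,576 rows
ROWS_256K = 1 << 18  # assumption: "256K" means 262,144 rows

# (case, rows, reported ms, reported rows/s) copied from the highlights above
REPORTED = [
    ("contiguous 1M", ROWS_1M, 7.91, 132.5e6),
    ("sparse 1M", ROWS_1M, 9.99, 105.0e6),
    ("clustered 1M", ROWS_1M, 10.10, 103.9e6),
    ("multi-bucket 256K", ROWS_256K, 2.28, 114.9e6),
]

GAPS = {name: relative_gap(rows, ms, rps) for name, rows, ms, rps in REPORTED}
```

Under the binary-size assumption every entry in `GAPS` is below 0.1%, which is consistent with the times and rates having been rounded independently.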
   
   Notes:
   
   - Normal local Gluten C++ target validation is currently blocked by local 
Velox/build-tree setup issues, so this draft PR is opened to get the regular 
native CI signal.
   - `clang-format` was not available on this local machine after the final 
rebase; C++ format CI should validate formatting.
   
   Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: IBM BOB
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

