malinjawi opened a new pull request, #11900:
URL: https://github.com/apache/gluten/pull/11900

   ## What changes are proposed in this pull request?
   
   This PR adds the native Delta deletion vector scan foundation for the Velox 
backend in Gluten.
   
   The architecture is based on Delta Lake's [Deletion Vectors High Level Design](https://docs.google.com/document/d/1lv35ZPfioopBbzQ7zT82LOev7qV7x4YNLkMr2-L5E_M/edit?tab=t.0#heading=h.z89r7ifgftsi).
   
   
   The scope is intentionally limited to the read path. It introduces the JVM 
and native plumbing required to read Delta tables with deletion vectors 
natively, while keeping DML and DV write-path work out of scope for a follow-up 
PR.
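
   For readers unfamiliar with deletion vectors, the core of a DV-aware read can be sketched as follows: rows whose file-relative position is marked in the deletion vector are dropped from each batch. This is a minimal illustration and not the code in this PR. `DeletedRows` and `survivingRows` are hypothetical names, and a `std::set` per bucket stands in for a real Roaring bitmap. The split of a 64-bit row index into a high 32-bit bucket key and a low 32-bit value is an assumption modelled on how 64-bit Roaring layouts are commonly organised.

```cpp
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a 64-bit deletion vector. Each bucket is keyed by
// the high 32 bits of the row index and stores the low 32 bits. A real
// implementation would keep a Roaring bitmap per bucket instead of std::set.
class DeletedRows {
 public:
  void add(uint64_t rowIndex) {
    buckets_[static_cast<uint32_t>(rowIndex >> 32)].insert(
        static_cast<uint32_t>(rowIndex));
  }

  bool contains(uint64_t rowIndex) const {
    auto it = buckets_.find(static_cast<uint32_t>(rowIndex >> 32));
    return it != buckets_.end() &&
        it->second.count(static_cast<uint32_t>(rowIndex)) > 0;
  }

 private:
  std::unordered_map<uint32_t, std::set<uint32_t>> buckets_;
};

// Returns the batch-relative positions of the rows that are NOT deleted.
// batchStart is the file-relative index of the first row in the batch.
std::vector<uint32_t> survivingRows(
    const DeletedRows& dv,
    uint64_t batchStart,
    uint32_t batchSize) {
  std::vector<uint32_t> kept;
  for (uint32_t i = 0; i < batchSize; ++i) {
    if (!dv.contains(batchStart + i)) {
      kept.push_back(i);
    }
  }
  return kept;
}
```

   With rows 1 and 3 marked as deleted, `survivingRows(dv, 0, 5)` yields positions 0, 2 and 4.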
   
   The main changes are:
   
   - Add JVM-side Delta scan preparation and metadata normalization for 
DV-aware scans.
   - Add native Delta scan/read components under `cpp/velox/compute/delta`, 
including:
     - Delta connector and datasource plumbing
     - Delta split representation
     - deletion vector reader
     - split reader integration
     - UUID / Base85 utilities for Delta DV handling
   - Add explicit Roaring dependency wiring and a local `RoaringBitmapArray` 
helper needed by the native DV read path and tests.
   - Wire the Delta read path into the Velox backend and scan execution flow.
   - Add focused native tests for:
     - Delta connector
     - deletion vector reader
     - split handling
     - UUID utilities
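
   As background for the UUID / Base85 utilities listed above: to the best of my understanding, Delta stores the UUID of an on-disk deletion vector file as a 20-character Z85 string, which decodes to 16 bytes. The sketch below shows a plain Z85 decoder under that assumption. `decodeZ85` is a hypothetical name and is not the helper added in this PR. The alphabet is the one defined in ZeroMQ RFC 32.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Z85 alphabet as defined in ZeroMQ RFC 32 (85 printable characters).
constexpr std::string_view kZ85Alphabet =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    ".-:+=^!/*?&<>()[]{}@%$#";

// Decodes a Z85 string into bytes. Every 5 input characters form one
// big-endian 32-bit word. Returns std::nullopt on malformed input.
std::optional<std::vector<uint8_t>> decodeZ85(std::string_view text) {
  if (text.size() % 5 != 0) {
    return std::nullopt;
  }
  std::vector<uint8_t> out;
  out.reserve(text.size() / 5 * 4);
  for (std::size_t i = 0; i < text.size(); i += 5) {
    uint64_t value = 0;
    for (std::size_t j = 0; j < 5; ++j) {
      const std::size_t digit = kZ85Alphabet.find(text[i + j]);
      if (digit == std::string_view::npos) {
        return std::nullopt; // character outside the Z85 alphabet
      }
      value = value * 85 + digit;
    }
    if (value > 0xFFFFFFFFULL) {
      return std::nullopt; // group does not fit into 32 bits
    }
    for (int shift = 24; shift >= 0; shift -= 8) {
      out.push_back(static_cast<uint8_t>(value >> shift));
    }
  }
  return out;
}
```

   The RFC 32 test vector `HelloWorld` decodes to the bytes `86 4F D2 6F B5 59 F7 5B`, and a 20-character input produces the 16 bytes of a UUID.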
   
   This PR does not include:
   - row-index finder integration
   - DELETE / UPDATE / MERGE DV write-path changes
   - compaction / utility command support
   
   ## How was this patch tested?
   - Added native tests under `cpp/velox/compute/delta/tests`:
     - `DeltaConnectorTest`
     - `DeltaDeletionVectorReaderTest`
     - `DeltaSplitTest`
     - `DeltaUuidUtilsTest`
   - Locally validated that the new Delta read sources are integrated into the 
native `velox` target and compile through the new Delta translation units.
   - Locally validated that Roaring is discovered explicitly and that the previously unresolved CRoaring symbols resolve once the dependency is wired in.
   
   
   
   ## Was this patch authored or co-authored using generative AI tooling?
   
   Co-authored: IBM Bob
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

