malinjawi opened a new pull request, #11900: URL: https://github.com/apache/gluten/pull/11900
## What changes are proposed in this pull request?

This PR adds the native Delta deletion vector (DV) scan foundation for the Velox backend in Gluten. The architecture is based on Delta Lake's [Deletion Vectors High Level Design](https://docs.google.com/document/d/1lv35ZPfioopBbzQ7zT82LOev7qV7x4YNLkMr2-L5E_M/edit?tab=t.0#heading=h.z89r7ifgftsi).

The scope is intentionally limited to the read path. It introduces the JVM and native plumbing required to read Delta tables with deletion vectors natively; DML and DV write-path work is left for a follow-up PR.

The main changes are:

- Add JVM-side Delta scan preparation and metadata normalization for DV-aware scans.
- Add native Delta scan/read components under `cpp/velox/compute/delta`, including:
  - Delta connector and datasource plumbing
  - Delta split representation
  - deletion vector reader
  - split reader integration
  - UUID / Base85 utilities for Delta DV handling
- Add explicit Roaring dependency wiring and a local `RoaringBitmapArray` helper needed by the native DV read path and tests.
- Wire the Delta read path into the Velox backend and scan execution flow.
- Add focused native tests for:
  - Delta connector
  - deletion vector reader
  - split handling
  - UUID utilities

This PR does not include:

- row-index finder integration
- DELETE / UPDATE / MERGE DV write-path changes
- compaction / utility command support

## How was this patch tested?

- Added native tests under `cpp/velox/compute/delta/tests`:
  - `DeltaConnectorTest`
  - `DeltaDeletionVectorReaderTest`
  - `DeltaSplitTest`
  - `DeltaUuidUtilsTest`
- Locally validated that the new Delta read sources are integrated into the native `velox` target and that the new Delta translation units compile.
- Locally validated explicit Roaring discovery; wiring the dependency eliminated the unresolved CRoaring symbols.

## Was this patch authored or co-authored using generative AI tooling?

Co-authored: IBM Bob
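
For readers unfamiliar with the read path described above: a deletion vector marks row positions in a data file as deleted, and a DV-aware scan drops those rows while reading. The following is a minimal sketch of that filtering step, not code from this PR. `DeletedRows` and `liveRowOffsets` are hypothetical names, and a plain hash set stands in for the Roaring-based `RoaringBitmapArray` helper, whose interface is not shown in this description.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for a 64-bit Roaring bitmap of deleted row positions.
// Positions are file-level row indices, starting at 0.
using DeletedRows = std::unordered_set<uint64_t>;

// Returns the batch-relative offsets of the rows that are NOT deleted.
// `batchStart` is the file-level position of the first row in the batch.
std::vector<uint32_t> liveRowOffsets(
    uint64_t batchStart,
    uint32_t batchSize,
    const DeletedRows& deleted) {
  std::vector<uint32_t> live;
  live.reserve(batchSize);
  for (uint32_t i = 0; i < batchSize; ++i) {
    // Keep the row only if its file-level position is absent from the DV.
    if (deleted.count(batchStart + i) == 0) {
      live.push_back(i);
    }
  }
  return live;
}
```

A reader would typically use such offsets to build a selection over the decoded batch. A real implementation would query a compressed bitmap rather than a hash set, which is presumably why the Roaring dependency is wired in.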
