andygrove opened a new pull request, #3410:
URL: https://github.com/apache/datafusion-comet/pull/3410

   ## Summary
   
   - Eliminates the JVM data round trip for data columns in `native_iceberg_compat` V1 scans
   - Data columns are read directly from the native `BatchContext` via a zero-copy `Arc::clone`
   - Only partition columns (small, constant values) cross the JVM boundary, via Arrow FFI
   - Reduces the data-column path from three copy steps to one: the `currentColumnBatch` JNI export remains, while the `exportBatch` FFI round trip and the `copy_array` deep copy are eliminated
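To show why dropping the `exportBatch`/`copy_array` steps matters, here is a minimal std-only Rust sketch that contrasts a zero-copy `Arc::clone` with a deep copy. It uses `Arc<Vec<i64>>` as a hypothetical stand-in for an Arrow array. The `share` and `deep_copy` helpers are illustrative only and are not part of Comet.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow column: an immutable, reference-counted buffer.
type Column = Arc<Vec<i64>>;

/// Zero-copy: only bumps the reference count; both handles point at the same buffer.
fn share(col: &Column) -> Column {
    Arc::clone(col)
}

/// Deep copy: allocates a new buffer and copies every value (analogous to `copy_array`).
fn deep_copy(col: &Column) -> Column {
    Arc::new(col.as_ref().clone())
}

fn main() {
    let data: Column = Arc::new((0..1_000_000).collect());

    let shared = share(&data);
    let copied = deep_copy(&data);

    // The shared handle aliases the original allocation...
    assert!(Arc::ptr_eq(&data, &shared));
    // ...while the deep copy owns a separate allocation with equal contents.
    assert!(!Arc::ptr_eq(&data, &copied));
    assert_eq!(*data, *copied);

    // Two strong references to the original (data + shared); the copy is independent.
    println!("strong refs to original: {}", Arc::strong_count(&data)); // 2
    println!("strong refs to copy: {}", Arc::strong_count(&copied)); // 1
}
```

The cost of `share` is constant regardless of column size, whereas `deep_copy` scales with the number of values, which is why removing the deep copy from the data-column path is the main saving.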
   
   ### How it works
   
   When `native_batch_passthrough` is enabled in the `Scan` protobuf message (auto-detected for `native_iceberg_compat` `CometScanExec`):
   
   1. `NativeBatchReader.nextBatch()` reads the batch natively and sets a ThreadLocal handle
   2. Rust `ScanExec::get_next_passthrough()` calls `CometBatchIterator.advancePassthrough()` instead of the normal `hasNext()` + `next()` path
   3. Data columns are obtained via `Arc::clone` from `BatchContext.current_batch` (zero-copy)
   4. Only partition columns are imported from the JVM via FFI and deep-copied (they are small constant values)
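The steps above can be sketched roughly as follows. This is a simplified, std-only illustration, not the code in `scan.rs`: `BatchContext` here is a hypothetical struct holding only `current_batch`, columns are `Arc<Vec<i64>>` instead of Arrow arrays, the FFI import is elided, and the output layout (data columns first, then partition columns) is an assumption that mirrors Spark's usual file-scan output.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow column.
type Column = Arc<Vec<i64>>;

/// Hypothetical simplification of the native `BatchContext`: only the most
/// recently decoded batch, as a list of columns.
struct BatchContext {
    current_batch: Vec<Column>,
}

/// Sketch of the passthrough path: data columns are shared with the native
/// reader via `Arc::clone`; partition columns (imported from the JVM in the
/// real code) are deep-copied. Assumes data columns come first in the output.
fn assemble_passthrough(
    ctx: &BatchContext,
    num_data_columns: usize,
    imported_partition_cols: &[Column],
) -> Vec<Column> {
    assert_eq!(
        ctx.current_batch.len(),
        num_data_columns,
        "native batch must supply exactly num_data_columns columns"
    );
    let mut out = Vec::with_capacity(num_data_columns + imported_partition_cols.len());
    // Zero-copy: reuse the native buffers.
    out.extend(ctx.current_batch.iter().map(Arc::clone));
    // Deep copy: cheap here because a partition column repeats one value.
    out.extend(
        imported_partition_cols
            .iter()
            .map(|c| Arc::new(c.as_ref().clone())),
    );
    out
}

fn main() {
    let ctx = BatchContext {
        current_batch: vec![Arc::new(vec![10, 20, 30]), Arc::new(vec![1, 2, 3])],
    };
    // A partition column, e.g. a constant "year" value repeated for every row.
    let partition: Vec<Column> = vec![Arc::new(vec![2024, 2024, 2024])];

    let batch = assemble_passthrough(&ctx, 2, &partition);

    assert_eq!(batch.len(), 3);
    assert!(Arc::ptr_eq(&batch[0], &ctx.current_batch[0]));
    assert!(!Arc::ptr_eq(&batch[2], &partition[0]));
    println!("output columns: {}", batch.len());
}
```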
   
   ### Files changed
   
   - **operator.proto**: Added `native_batch_passthrough` and `num_data_columns` fields to the `Scan` message
   - **NativeBatchReader.java**: Added a `CURRENT_READER_HANDLE` ThreadLocal, set after each `loadNextBatch()`
   - **CometBatchIterator.java**: Added `advancePassthrough()` and `nextPartitionColumnsOnly()` methods
   - **batch_iterator.rs**: JNI method bindings for the new Java methods
   - **scan.rs**: Added `get_next_passthrough()`, which reads data columns from `BatchContext` (zero-copy)
   - **planner.rs**: Passes the new fields to `ScanExec::new()`
   - **CometSink.scala**: Detects `native_iceberg_compat` scans and sets the passthrough fields
   - **mod.rs**: Made `BatchContext` and `get_batch_context` public
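The `CURRENT_READER_HANDLE` ThreadLocal only works if the handle is published and read on the same thread. The rough Rust analogue below, using `thread_local!`, shows the property being relied on. It is an assumption-laden sketch: the real handle lives in Java, and the handle value and the `load_next_batch` / `current_handle` helpers are hypothetical.

```rust
use std::cell::Cell;
use std::thread;

thread_local! {
    // Hypothetical analogue of the Java `CURRENT_READER_HANDLE` ThreadLocal:
    // the handle of the reader that produced the latest batch on this thread.
    // 0 means "no batch loaded".
    static CURRENT_READER_HANDLE: Cell<i64> = Cell::new(0);
}

/// Reader side: after loading a batch, publish its handle for this thread only.
fn load_next_batch(handle: i64) {
    CURRENT_READER_HANDLE.with(|h| h.set(handle));
}

/// Scan side: look up the handle published on the current thread, if any.
fn current_handle() -> Option<i64> {
    let h = CURRENT_READER_HANDLE.with(|h| h.get());
    if h == 0 {
        None
    } else {
        Some(h)
    }
}

fn main() {
    load_next_batch(42);
    assert_eq!(current_handle(), Some(42));

    // Another thread has its own slot and does not see the handle.
    let other = thread::spawn(current_handle).join().unwrap();
    assert_eq!(other, None);
    println!("same thread: {:?}, other thread: {:?}", current_handle(), other);
}
```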
   
   ## Test plan
   
   - [x] `ParquetReadV1Suite` - all 88 tests pass
   - [x] `ParquetReadV2Suite` - all tests pass
   - [x] Partition-specific tests pass (6/6)
   - [ ] Run benchmark to measure performance improvement
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)

