andygrove opened a new pull request, #3410: URL: https://github.com/apache/datafusion-comet/pull/3410
## Summary

- Eliminates the JVM data round trip for data columns in `native_iceberg_compat` V1 scans
- Data columns are read directly from the native `BatchContext` via zero-copy `Arc::clone`
- Only partition columns (small, constant values) cross the JVM boundary via Arrow FFI
- Reduces 3 copy steps to 1 for data columns (the `currentColumnBatch` JNI export remains; the `exportBatch` FFI round trip and the `copy_array` deep copy are eliminated)

### How it works

When `native_batch_passthrough` is enabled in the `Scan` protobuf (auto-detected for `CometScanExec` nodes that use `native_iceberg_compat`):

1. `NativeBatchReader.nextBatch()` reads the batch natively and sets a ThreadLocal handle
2. Rust `ScanExec.get_next_passthrough()` calls `CometBatchIterator.advancePassthrough()` instead of the normal `hasNext()` + `next()` path
3. Data columns are obtained via `Arc::clone` from `BatchContext.current_batch` (zero-copy)
4. Only partition columns are imported from the JVM via FFI and deep-copied (they are small constant values)

### Files changed

- **operator.proto**: Added `native_batch_passthrough` and `num_data_columns` fields to the `Scan` message
- **NativeBatchReader.java**: Added a `CURRENT_READER_HANDLE` ThreadLocal, set after each `loadNextBatch()`
- **CometBatchIterator.java**: Added `advancePassthrough()` and `nextPartitionColumnsOnly()` methods
- **batch_iterator.rs**: JNI method bindings for the new Java methods
- **scan.rs**: Added `get_next_passthrough()`, which reads data columns from `BatchContext` (zero-copy)
- **planner.rs**: Passes the new fields to `ScanExec::new()`
- **CometSink.scala**: Detects `native_iceberg_compat` scans and sets the passthrough fields
- **mod.rs**: Made `BatchContext` and `get_batch_context` public

## Test plan

- [x] `ParquetReadV1Suite`: all 88 tests pass
- [x] `ParquetReadV2Suite`: all tests pass
- [x] Partition-specific tests pass (6/6)
- [ ] Run benchmark to measure performance improvement

🤖 Generated with [Claude Code](https://claude.com/claude-code)
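The passthrough flow (reader publishes a per-thread handle, consumer shares data columns by reference and deep-copies only partition columns) can be modeled in a few lines of std-only Rust. This is a minimal sketch under stated assumptions, not the PR's implementation: Arrow arrays are replaced by a hypothetical `Column = Arc<Vec<i64>>`, the Java `CURRENT_READER_HANDLE` ThreadLocal is modeled with Rust's `thread_local!`, the JNI/FFI boundary is omitted, and `next_batch`/`get_next_passthrough` here are simplified, hypothetical signatures that only borrow the PR's names.

```rust
use std::cell::Cell;
use std::sync::Arc;

// Hypothetical stand-in for an Arrow array: an immutable, reference-counted buffer.
type Column = Arc<Vec<i64>>;

// Hypothetical stand-in for the native `BatchContext`: holds the batch the
// native Parquet reader decoded most recently.
struct BatchContext {
    current_batch: Vec<Column>,
}

thread_local! {
    // Models the Java `CURRENT_READER_HANDLE` ThreadLocal: set by the reader
    // after each batch load, read by the consumer running on the same thread.
    static CURRENT_READER_HANDLE: Cell<Option<usize>> = Cell::new(None);
}

// Reader side (assumed shape): store the natively decoded batch and publish its handle.
fn next_batch(contexts: &mut [BatchContext], handle: usize, data: Vec<Column>) {
    contexts[handle].current_batch = data;
    CURRENT_READER_HANDLE.with(|h| h.set(Some(handle)));
}

// Consumer side (assumed shape): build the output batch.
// - Data columns: `Arc::clone` only bumps a reference count; no buffer is copied.
// - Partition columns: arrive from the "JVM" (plain vectors here) and are deep-copied.
// Returns None if no handle was published or too few data columns are available.
fn get_next_passthrough(
    contexts: &[BatchContext],
    num_data_columns: usize,
    jvm_partition_columns: &[Vec<i64>],
) -> Option<Vec<Column>> {
    let handle = CURRENT_READER_HANDLE.with(|h| h.get())?;
    let data = contexts.get(handle)?.current_batch.get(..num_data_columns)?;
    let mut columns: Vec<Column> = data.iter().map(Arc::clone).collect();
    for col in jvm_partition_columns {
        // Deep copy: a fresh allocation independent of the source buffer.
        columns.push(Arc::new(col.clone()));
    }
    Some(columns)
}

fn main() {
    let mut contexts = vec![BatchContext { current_batch: Vec::new() }];
    next_batch(&mut contexts, 0, vec![Arc::new(vec![1, 2, 3]), Arc::new(vec![10, 20, 30])]);

    // One partition column, already expanded to the batch length on the JVM side.
    let partition = vec![vec![2024, 2024, 2024]];
    let batch = get_next_passthrough(&contexts, 2, &partition).unwrap();

    // Data columns share buffers with the reader's batch (zero-copy)...
    assert!(Arc::ptr_eq(&batch[0], &contexts[0].current_batch[0]));
    // ...while the partition column is an independent copy.
    assert!(!std::ptr::eq(batch[2].as_ptr(), partition[0].as_ptr()));
    println!("{} columns, {} rows", batch.len(), batch[0].len());
}
```

The design point the sketch illustrates is the asymmetry: data columns can be large, so sharing them by reference is where the savings come from, while partition columns are small constants, so paying for a deep copy there keeps the FFI ownership story simple.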
