andygrove commented on PR #3564:
URL: https://github.com/apache/datafusion-comet/pull/3564#issuecomment-3942022715
I have to give Claude Code a shoutout for tracking down this bug. It took
many iterations of debugging.
```
Root cause found and fixed: DataFusion's DataSourceExec always wraps
output with BatchSplitStream, which slices batches larger than batch_size
(default 8192 rows). In GHJ Phase 3:
1. A build batch with 696,344 rows / 22 MB is passed through DataSourceExec
2. BatchSplitStream slices it into 696,344 / 8192 = 85 slices
3. Each slice shares the original Arrow buffers via zero-copy batch.slice()
4. get_record_batch_memory_size() reports the full buffer size (~22 MB)
for each slice
5. collect_left_input calls try_grow(22 MB) 85 times → 1.87 GB phantom
reservation
6. The actual memory is only ~22 MB → 85x over-counting → spurious OOM
Fix: Created context_without_batch_splitting() that produces a TaskContext
with batch_size = usize::MAX, preventing BatchSplitStream from slicing. Applied
to all 3 Phase 3 code paths:
- Fast path (join_partition_recursive via fast path)
- Recursive path (join_partition_recursive)
- Spilled probe path (join_with_spilled_probe)
You can verify by running TPC-DS q72 again. The build batch will now pass
through as a single batch, and collect_left_input will correctly account
for ~22 MB (1 try_grow) instead of 1.87 GB (85 try_grows).
```
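To make the over-counting arithmetic concrete, here is a minimal std-only Rust sketch. `Reservation` is a hypothetical stand-in for DataFusion's `MemoryReservation` / `MemoryPool::try_grow` (it tracks bytes callers *claim*, not bytes actually allocated); the 85-slice and 22 MB figures are taken from the analysis above, and the 1 GB limit is an assumed per-operator budget for illustration.

```rust
// Toy stand-in for a memory reservation against a fixed pool limit.
// It only tracks claimed bytes, like DataFusion's accounting does.
struct Reservation {
    reserved: usize,
    limit: usize,
}

impl Reservation {
    fn new(limit: usize) -> Self {
        Self { reserved: 0, limit }
    }

    // Mirrors try_grow semantics: fail once the claimed total would
    // exceed the limit, otherwise record the growth.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.reserved + bytes > self.limit {
            return Err(format!(
                "spurious OOM: {} + {} exceeds limit {}",
                self.reserved, bytes, self.limit
            ));
        }
        self.reserved += bytes;
        Ok(())
    }
}

fn main() {
    const MB: usize = 1024 * 1024;
    let full_buffer_size = 22 * MB; // size of the shared Arrow buffers
    let num_slices = 85;            // slices reported in the analysis above
    let limit = 1024 * MB;          // assumed 1 GB per-operator budget

    // Bug path: every zero-copy slice reports the FULL buffer size, so
    // Phase 3 tries to reserve 85 x 22 MB even though only ~22 MB of
    // Arrow buffers are actually resident.
    let mut phantom = Reservation::new(limit);
    let mut failed_at = None;
    for i in 0..num_slices {
        if phantom.try_grow(full_buffer_size).is_err() {
            failed_at = Some(i);
            break;
        }
    }
    println!("phantom reservation fails at slice {:?}", failed_at);

    // Fixed path: with batch splitting disabled the batch arrives whole,
    // so there is exactly one try_grow of ~22 MB.
    let mut fixed = Reservation::new(limit);
    assert!(fixed.try_grow(full_buffer_size).is_ok());
    println!("single-batch reservation: {} MB", fixed.reserved / MB);
}
```

Run against a 1 GB budget, the phantom path blows through the limit partway through the 85 slices, while the fixed path reserves a single 22 MB, which is why bumping the effective batch_size to usize::MAX makes the spurious OOM disappear.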
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]