Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

via GitHub Thu, 10 Jul 2025 20:58:10 -0700


2010YOUY01 commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2199484385



##########
datafusion/physical-plan/src/joins/utils.rs:
##########
@@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices(
     probe_indices: UInt32Array,
     filter: &JoinFilter,
     build_side: JoinSide,
+    max_intermediate_size: Option<usize>,
 ) -> Result<(UInt64Array, UInt32Array)> {
     if build_indices.is_empty() && probe_indices.is_empty() {
         return Ok((build_indices, probe_indices));
     };
 
-    let intermediate_batch = build_batch_from_indices(
-        filter.schema(),
-        build_input_buffer,
-        probe_batch,
-        &build_indices,
-        &probe_indices,
-        filter.column_indices(),
-        build_side,
-    )?;
-    let filter_result = filter
-        .expression()
-        .evaluate(&intermediate_batch)?
-        .into_array(intermediate_batch.num_rows())?;
+    let filter_result = if let Some(max_size) = max_intermediate_size {
+        let mut filter_results =
+            Vec::with_capacity(build_indices.len().div_ceil(max_size));
+
+        for i in (0..build_indices.len()).step_by(max_size) {
+            let end = min(build_indices.len(), i + max_size);
+            let len = end - i;
+            let intermediate_batch = build_batch_from_indices(
+                filter.schema(),
+                build_input_buffer,
+                probe_batch,
+                &build_indices.slice(i, len),
+                &probe_indices.slice(i, len),
+                filter.column_indices(),
+                build_side,
+            )?;
+            let filter_result = filter
+                .expression()
+                .evaluate(&intermediate_batch)?
+                .into_array(intermediate_batch.num_rows())?;
+            filter_results.push(filter_result);

Review Comment:
   > I'm not sure I fully understand your suggestions. Could you please 
elaborate?
   > 
   > > What about executing the filter directly on build/probe indices slice 
here and concatenating the indices later?
   > 
   > I'm a bit confused by this. Do you mean we can avoid constructing the 
`intermediate_batch`? It seems this approach would require rewriting the filter 
logic to work directly on index slices
   > 
   > > and using coalescing kernel
   > 
   > I'm also not sure why the `coalesce` kernel should be used here. The 
current function takes `build_indices` and `probe_indices`, builds an 
`intermediate_batch`, executes the filter, and returns the `(build_indices, 
probe_indices)` that passed the filter.
   > 
   > My understanding is that `coalesce` is used to merge multiple 
`RecordBatch`es with a row count less than a target into new `RecordBatch`es 
with a row count greater than the target. How would that apply in this 
situation?
   
   This is an idea for future optimization:
   If we use this interface 
https://docs.rs/arrow-select/55.2.0/src/arrow_select/coalesce.rs.html#189-193 
instead of the current `concat_batches()` approach, it can be 1. use less 
memory 2. faster
   Now the above interface hasn't been implemented with the fast path yet.
   
   You can see the motivations in https://github.com/apache/arrow-rs/issues/6692



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

Reply via email to