Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

via GitHub Sat, 21 Jun 2025 19:41:11 -0700


UBarney commented on PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#issuecomment-2993893069


   > I'll find out why there is a performance improvement
   
   From the flame graph (captured while executing the SQL `select t1.value from 
range(8192) t1 join range(8192) t2 on t1.value + t2.value < t1.value * 
t2.value;`), adjusting the size of the input indices reduced the execution time 
of these two functions:
   
   - `apply_join_filter_to_indices`: execution time went down (sample count 
reduced from 528 million to 241 million).
   - `build_batch_from_indices` (excluding the contribution of 
`apply_join_filter_to_indices`): execution time went down (sample count 
reduced from 79 million to 35 million).
   
   But I still can't explain why these two functions performed better. 😂
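
For readers who want a concrete picture of "limiting the intermediate batch size", here is a minimal sketch in plain Rust. It is not the code from this PR and does not use Arrow or DataFusion types: `chunked_nested_loop_join`, `flush`, and `max_pairs` are hypothetical names made up for illustration. The idea shown is only that candidate index pairs are filtered in bounded chunks rather than materializing the whole cross product first.

```rust
/// Hypothetical sketch (not DataFusion's implementation): evaluate a join
/// filter over the cross product of `left` and `right`, holding at most
/// `max_pairs` candidate index pairs in memory at any time.
fn chunked_nested_loop_join<F>(
    left: &[i64],
    right: &[i64],
    max_pairs: usize,
    filter: F,
) -> Vec<(usize, usize)>
where
    F: Fn(i64, i64) -> bool,
{
    assert!(max_pairs > 0, "max_pairs must be positive");
    let mut matches = Vec::new();
    // Scratch buffer of candidate pairs; flushed whenever it reaches `max_pairs`.
    let mut candidates: Vec<(usize, usize)> = Vec::new();

    for l in 0..left.len() {
        for r in 0..right.len() {
            candidates.push((l, r));
            if candidates.len() == max_pairs {
                flush(&mut candidates, left, right, &filter, &mut matches);
            }
        }
    }
    // Filter whatever is left in the final, possibly smaller, chunk.
    flush(&mut candidates, left, right, &filter, &mut matches);
    matches
}

/// Apply the filter to the buffered candidates, keep the matching pairs,
/// and empty the buffer so it can be reused for the next chunk.
fn flush<F>(
    candidates: &mut Vec<(usize, usize)>,
    left: &[i64],
    right: &[i64],
    filter: &F,
    matches: &mut Vec<(usize, usize)>,
) where
    F: Fn(i64, i64) -> bool,
{
    for &(l, r) in candidates.iter() {
        if filter(left[l], right[r]) {
            matches.push((l, r));
        }
    }
    candidates.clear();
}
```

Calling it with the predicate from the query above, `|a, b| a + b < a * b`, returns the same pairs in the same order for any positive `max_pairs`; only the peak size of the intermediate buffer changes. The sketch illustrates the chunking idea and makes no claim about why the two functions above got faster.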
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

