mbutrovich commented on PR #3703:
URL: https://github.com/apache/datafusion-comet/pull/3703#issuecomment-4070359121

   @andygrove helped me out and ran TPC-H at SF1000. We saw the biggest wins on
TPC-H Q2, Q18, and Q20, where operators (likely joins) emitted many small
batches. I'm not sure yet why those batches weren't being coalesced within a
partition. However, the wins were huge in some of these degenerate cases. For
example, here is the current behavior on the `main` branch, highlighting one
`CometBroadcastHashJoin` in TPC-H Q2:
   
   <img width="661" height="687" alt="Screenshot 2026-03-16 at 4 12 03 PM" src="https://github.com/user-attachments/assets/8afeee9b-9abd-4d6c-90ef-eddb19e8366d" />
   
   Compare that with the behavior on PR #3703:
   
   <img width="718" height="659" alt="Screenshot 2026-03-16 at 4 12 14 PM" src="https://github.com/user-attachments/assets/3fe14a92-18da-4707-a968-5048abc3f1b3" />
   
   I'll look into why we're missing a coalescing opportunity at the output of
the previous stage, since I'm a bit surprised by the behavior here.
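   For context, "coalescing" here means buffering the many small batches an operator emits within a partition and concatenating them until a target row count is reached, so downstream operators process fewer, larger batches. The following is a minimal sketch of that idea only, assuming batches are modeled as plain `Vec<i64>` rows; `Batch`, `coalesce_batches`, and `target_rows` are hypothetical names for illustration, not Comet's or DataFusion's actual code or API.

```rust
/// Hypothetical stand-in for an Arrow record batch: just a vector of row values.
type Batch = Vec<i64>;

/// Concatenate small batches from one partition until each emitted batch
/// holds at least `target_rows` rows. The final batch may be smaller.
fn coalesce_batches<I>(input: I, target_rows: usize) -> Vec<Batch>
where
    I: IntoIterator<Item = Batch>,
{
    let mut output = Vec::new();
    let mut buffer: Batch = Vec::with_capacity(target_rows);
    for batch in input {
        // Empty batches carry per-batch overhead but no rows, so drop them.
        if batch.is_empty() {
            continue;
        }
        buffer.extend(batch);
        // Emit once the buffered rows reach the target size.
        if buffer.len() >= target_rows {
            output.push(std::mem::replace(
                &mut buffer,
                Vec::with_capacity(target_rows),
            ));
        }
    }
    // Flush any remaining rows as a final, possibly undersized batch.
    if !buffer.is_empty() {
        output.push(buffer);
    }
    output
}

fn main() {
    // 1,000 tiny batches of 3 rows each, like a join might emit.
    let small: Vec<Batch> = (0..1000i64).map(|i| vec![i, i + 1, i + 2]).collect();
    let input_count = small.len();
    let coalesced = coalesce_batches(small, 1024);
    println!("{} batches -> {} batches", input_count, coalesced.len());
}
```

   In this sketch an emitted batch can overshoot `target_rows` by up to one input batch; a real implementation might instead slice rows so each output batch is exactly the target size.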
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
