Re: [PR] feat: `ClassicJoin` for PWMJ [datafusion]

via GitHub Wed, 17 Sep 2025 21:27:00 -0700


2010YOUY01 commented on PR #17482:
URL: https://github.com/apache/datafusion/pull/17482#issuecomment-3305376950


   > The performance saw a similar hit in #16660 (the benchmark is in the 
description). I think I can tune when to use this join based on the incoming 
size in a follow up, for now the config will restrain this join to keep it 
purely experimental
   
   That bench doesn't include sort time, PWMJ should be faster than NLJ even it 
includes the sorting overhead (n*log(n) v.s. n^2).
   
   I think the main motivation for adding this executor is its performance 
advantage, so we probably shouldn’t merge an initial PR without first getting 
it to a good performance level. (Also, since there aren’t many merge conflicts 
to resolve for this PR, I don’t think there’s any rush.)
   
   I can help diagnose it later.
   
   > How did you get the incorrect result? I'm testing query 12 and it doesnt 
optimize into a piecewisemergejoin
   
   `set datafusion.execution.target_partitions = 1;` This config should get 
PWMJ triggered
   
   Moreover we should get sqlite extended test passed later, either through 
configuring `target_partition` to 1 or enable parallel execution for it (BTW 
why are't we support large `target_partitions` now? it seen not requiring lots 
of change, only the stream side have to be round-robin repartitioned)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: `ClassicJoin` for PWMJ [datafusion]

Reply via email to