2010YOUY01 commented on PR #17482: URL: https://github.com/apache/datafusion/pull/17482#issuecomment-3305376950
> The performance saw a similar hit in #16660 (the benchmark is in the description). I think I can tune when to use this join based on the incoming size in a follow up, for now the config will restrain this join to keep it purely experimental That bench doesn't include sort time, PWMJ should be faster than NLJ even it includes the sorting overhead (n*log(n) v.s. n^2). I think the main motivation for adding this executor is its performance advantage, so we probably shouldn’t merge an initial PR without first getting it to a good performance level. (Also, since there aren’t many merge conflicts to resolve for this PR, I don’t think there’s any rush.) I can help diagnose it later. > How did you get the incorrect result? I'm testing query 12 and it doesnt optimize into a piecewisemergejoin `set datafusion.execution.target_partitions = 1;` This config should get PWMJ triggered Moreover we should get sqlite extended test passed later, either through configuring `target_partition` to 1 or enable parallel execution for it (BTW why are't we support large `target_partitions` now? it seen not requiring lots of change, only the stream side have to be round-robin repartitioned) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
