Dandandan commented on PR #19761: URL: https://github.com/apache/datafusion/pull/19761#issuecomment-3842389940
> I guess the point is that this change may essentially be undoing https://github.com/apache/datafusion/pull/17452 and that the improvements really come from that? @LiaCastaneda and I have been talking about reworking https://github.com/apache/datafusion/pull/17452 so that the dynamic filter is updated as partitions complete, making it less necessary to wait. But yeah I think maybe we can make a branch that removes that synchronization point and see what the benchmark numbers look like?

Yes, I think buffering the probe side is faster partly because it re-introduces parallelism that https://github.com/apache/datafusion/pull/17452 removed for `Partitioned` hash join.

I think we need to either remove it or change the approach to reintroduce the parallelism, and then check TPC-H (not the in-memory variant, as that will mostly use `CollectLeft`) and TPC-DS performance.

@alamb FYI I think perhaps we need to consider benchmarking version against version (51 against 52) to catch potential regressions like this, and probably add TPC-H and/or TPC-DS (the non-in-memory variants) to the default benchmarking suite.
