Rachelint commented on issue #11680: URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2328012643
> > It seems the CPU cost of RepartitionExec and CoalesceBatchesExec is not the bottleneck for Q32
>
> The single-aggregate performance benefits that @jayzhan211 demonstrated in #11762 and #11777 are on ClickBench Q17/Q18 rather than Q32.
>
> As of today, I see that Q32 performance is comparable to that of DuckDB on an M3 Mac.
>
> ```
> # DuckDB Q32:
> 0.5369797919993289
> 0.44854350000969134
> 0.41927954100538045
>
> # DataFusion main(780cccb52)
> 0.620
> 0.400
> 0.409
> ```
>
> But for Q17, we are still behind:
>
> ```
> # DuckDB
> 0.5953990409907419
> 0.5309897500119405
> 0.5242392499931157
>
> # DataFusion main(780cccb52)
> 1.145
> 1.072
> 1.082
> ```
>
> We would probably need to consolidate Aggregate (Partial and Final) and Repartition into a single place in order to adaptively choose the aggregate mode/algorithm based on runtime statistics.

I see the Q32 improvement in the later PR #11792, and I guess the reason performance improved may be similar to the partial skipping? Maybe Q17/Q18 improved for a different reason than Q32?

I agree that we should probably adopt a similar mechanism to select the merging mode dynamically, like `duckdb`.
