Re: [I] Improve Aggregate with Limit [datafusion]

via GitHub Wed, 15 Jan 2025 15:13:05 -0800


ctsk commented on issue #13729:
URL: https://github.com/apache/datafusion/issues/13729#issuecomment-2594124886


   I checked out your second point and was able to replicate the difference in 
speed between the two queries. I don't yet see how a TopK operator would help 
the non-ordered query.
   
   When comparing the query plans, it seems like the "limit" is not propagated 
to AggregateExec:
   
   Query with OrderBy:
   
                                                                                
                                                                                
                                                                                
             
   | SortPreservingMergeExec: [min(hits.AdvEngineID)@1 ASC NULLS LAST], 
fetch=10                                                                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          
   |   SortExec: TopK(fetch=10), expr=[min(hits.AdvEngineID)@1 ASC NULLS LAST], 
preserve_partitioning=[true]                                                    
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          
   |     AggregateExec: mode=FinalPartitioned, gby=[UserID@0 as UserID], 
aggr=[min(hits.AdvEngineID)], lim=[10]                                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                             
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          
   |       CoalesceBatchesExec: target_batch_size=8192                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          
   |         RepartitionExec: partitioning=Hash([UserID@0], 8), 
input_partitions=8                                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          
   |           AggregateExec: mode=Partial, gby=[UserID@0 as UserID], 
aggr=[min(hits.AdvEngineID)], lim=[10]                                          
                                                                                
                                                                                
                                                                                
                                                                   
   
   
   Query without OrderBy:
   | GlobalLimitExec: skip=0, fetch=10                                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          
   |   CoalescePartitionsExec                                                   
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          
   |     AggregateExec: mode=FinalPartitioned, gby=[UserID@0 as UserID], 
aggr=[min(hits.AdvEngineID)]                                                    
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                             
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          
   |       CoalesceBatchesExec: target_batch_size=8192                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          
   |         RepartitionExec: partitioning=Hash([UserID@0], 8), 
input_partitions=8                                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                          
   |           AggregateExec: mode=Partial, gby=[UserID@0 as UserID], 
aggr=[min(hits.AdvEngineID)] 
   
   
   After hackily adding the limit to the aggregation, the query was faster 
without the orderby (1.4 seconds vs 1.55 seconds).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [I] Improve Aggregate with Limit [datafusion]

Reply via email to