Re: [I] Possible data corruption in "Skipping partial aggregation" change [datafusion]

via GitHub Wed, 07 Aug 2024 05:52:52 -0700


andygrove commented on issue #11850:
URL: https://github.com/apache/datafusion/issues/11850#issuecomment-2273402515


   @alamb Sure, here is one of the query stages after we have translated it to 
a DataFusion plan. Note that we join the outputs of two partial aggregates and 
apply the final aggregate only after the join. If either input to the join 
contains duplicate rows, the join generates extra rows in its output.
   
   Perhaps we'll need to start thinking about having a physical optimizer phase 
in Comet so that we can leverage the "skip partial aggregates" feature in some 
cases.
   
   ```
    ProjectionExec: expr=[sum@0 as col_0, sum@1 as col_1, sum@2 as col_2]
     AggregateExec: mode=Final, gby=[], aggr=[sum, sum, sum]
       AggregateExec: mode=Partial, gby=[], aggr=[sum, sum, sum]
         ProjectionExec: expr=[col_0@0 as col_0, col_0@2 as col_1]
           SortMergeJoin: join_type=Full, on=[(col_0@0, col_0@0), (col_1@1, col_1@1)]
             SortExec: expr=[col_0@0 ASC,col_1@1 ASC], preserve_partitioning=[false]
               CopyExec
                 ProjectionExec: expr=[col_0@0 as col_0, col_1@1 as col_1]
                   AggregateExec: mode=Partial, gby=[col_0@0 as col_0, col_1@1 as col_1], aggr=[]
                     ScanExec: schema=[col_0: Int32, col_1: Int32]
             SortExec: expr=[col_0@0 ASC,col_1@1 ASC], preserve_partitioning=[false]
               CopyExec
                 ProjectionExec: expr=[col_0@0 as col_0, col_1@1 as col_1]
                   AggregateExec: mode=Partial, gby=[col_0@0 as col_0, col_1@1 as col_1], aggr=[]
                     ScanExec: schema=[col_0: Int32, col_1: Int32]
   ```
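
To make the row-multiplication effect concrete, here is a toy model in plain Python. It is not DataFusion or Comet code: `partial_group_by` and `full_join_row_count` are hypothetical helpers, and the `skip` flag only assumes that a skipped partial aggregate passes its input rows through without grouping them.

```python
from collections import Counter


def partial_group_by(rows, skip=False):
    """Toy stand-in for a Partial aggregate with group-by keys and aggr=[].

    skip=False: each distinct key is emitted once.
    skip=True:  rows pass through ungrouped, so duplicates survive
                (an assumption about the "skip partial aggregation" path).
    """
    if skip:
        return list(rows)
    return list(dict.fromkeys(rows))  # de-duplicate, keeping first-seen order


def full_join_row_count(left, right):
    """Count output rows of a full outer equi-join on the whole row (no NULL keys)."""
    l, r = Counter(left), Counter(right)
    total = 0
    for key in l.keys() | r.keys():
        if l[key] and r[key]:
            total += l[key] * r[key]  # every left copy pairs with every right copy
        else:
            total += l[key] + r[key]  # unmatched rows are emitted once each
    return total


left = [(1, 10), (1, 10), (2, 20)]
right = [(1, 10), (1, 10), (3, 30)]

deduped = full_join_row_count(partial_group_by(left), partial_group_by(right))
skipped = full_join_row_count(
    partial_group_by(left, skip=True), partial_group_by(right, skip=True)
)
print(deduped, skipped)  # prints: 3 6
```

With de-duplicated inputs the join yields one row per distinct key (3 rows). When the duplicate `(1, 10)` rows survive on both sides, that key alone contributes 2 x 2 = 4 rows, so the join output grows to 6 rows before the final aggregate ever runs.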


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


