Re: [I] Fusing partial aggregation with repartition [datafusion]

via GitHub Tue, 24 Sep 2024 19:59:22 -0700


Rachelint commented on issue #12596:
URL: https://github.com/apache/datafusion/issues/12596#issuecomment-2372803765


   > > Introduce a partitioned hashtable in partial aggregation, and partition 
the data before inserting it into the hashtable.
   > > Then we push the data into final aggregation partition by partition, 
rather than splitting it again in repartition and merging it again in coalesce.
   > 
   > I'm not clear on how this proposal works. Could you please explain why it 
provides performance benefits compared to partial aggregation, exchange, and 
final aggregation? Is the proposal aimed explicitly at accelerating high 
cardinality aggregation, or is it intended to enhance aggregation performance?
   
   I think it enhances aggregation performance generally?
   
   - Currently, we can think of `GroupValues` and `GroupAccumulator` as using a 
single `Vec` to manage intermediate states in `partial aggr`.
   - After finishing the work in `partial aggr`, we pass the `batch` to 
`exchange`, which recomputes the `hashes` of the `batch`. The `hashes` were 
already computed in `GroupValues`, so this recomputation is `the first 
avoidable CPU cost`.
   - Then we split the `batch` into multiple `batches` according to the 
`partition numbers` computed from the `hashes`. The split has to create 
multiple new `batches` to hold the values from the source `batch` and copy the 
data into them, and that is `the second avoidable CPU cost`.
   - Finally, before passing data to the `final aggr` of a partition, we first 
need to copy that partition's small split `batches` into `coalesce` until the 
buffer is large enough (usually the default batch size, 8192), and that is 
`the third avoidable CPU cost`.
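
To make the three costs concrete, here is a minimal sketch of the current flow using only the Rust standard library. It is not DataFusion code: `Batch`, `hash_key`, `partial_aggregate`, `repartition`, and `coalesce` are hypothetical stand-ins for the real operators, and a "batch" is simplified to a `Vec` of `(group key, partial sum)` rows.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical stand-in for a record batch: (group key, partial sum) rows.
type Batch = Vec<(u64, i64)>;

// Hypothetical helper: hash a single group key.
fn hash_key(key: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

// Partial aggregation with a single table. The table hashes every key
// internally, but those hashes are not kept for later operators.
fn partial_aggregate(input: &[(u64, i64)]) -> Batch {
    let mut states: HashMap<u64, i64> = HashMap::new();
    for &(key, value) in input {
        *states.entry(key).or_insert(0) += value;
    }
    states.into_iter().collect()
}

// Exchange: every row is hashed again (cost 1) and copied into a newly
// created per-partition batch (cost 2).
fn repartition(batch: &Batch, num_partitions: usize) -> Vec<Batch> {
    let mut out = vec![Batch::new(); num_partitions];
    for &(key, state) in batch {
        let partition = (hash_key(key) % num_partitions as u64) as usize;
        out[partition].push((key, state));
    }
    out
}

// Coalesce: the small batches of one partition are copied into a buffer
// until it holds at least `target` rows (cost 3).
fn coalesce(small_batches: Vec<Batch>, target: usize) -> Vec<Batch> {
    let mut out = Vec::new();
    let mut buffer = Batch::new();
    for batch in small_batches {
        buffer.extend_from_slice(&batch);
        if buffer.len() >= target {
            out.push(std::mem::take(&mut buffer));
        }
    }
    if !buffer.is_empty() {
        out.push(buffer);
    }
    out
}
```

In this sketch each row is hashed once inside `partial_aggregate`, hashed again in `repartition`, and copied in both `repartition` and `coalesce` before the final aggregation sees it.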
   
   With a partitioned approach in `GroupValues` and `GroupAccumulator`:
   - We can naturally reuse the `hashes` already computed in `GroupValues` 
when calculating the `partition numbers` of the `batches`.
   - We store the intermediate states in `partial aggr` partition by partition. 
When we submit them to `final aggr`, we just submit them partition by 
partition, rather than splitting first and merging afterwards.
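
A rough sketch of the partitioned idea, again with only the standard library and hypothetical names (`PartitionedStates`, `update`, `emit`, `hash_key`), might look like the following. One simplification to note: `std::collections::HashMap` hashes the key again internally, so the reuse of the hash for the table lookup itself is not shown here; I assume a real implementation would use the hash computed in `GroupValues` for both the lookup and the partition number.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Hypothetical helper: hash a single group key.
fn hash_key(key: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

// Hypothetical partitioned state for partial aggregation:
// one table of (group key -> partial sum) per output partition.
struct PartitionedStates {
    partitions: Vec<HashMap<u64, i64>>,
}

impl PartitionedStates {
    fn new(num_partitions: usize) -> Self {
        Self {
            partitions: vec![HashMap::new(); num_partitions],
        }
    }

    // The hash is computed once per input row and decides the partition.
    // (Assumption: a real implementation would also use this same hash
    // for the table lookup instead of hashing again.)
    fn update(&mut self, key: u64, value: i64) {
        let hash = hash_key(key);
        let partition = (hash % self.partitions.len() as u64) as usize;
        *self.partitions[partition].entry(key).or_insert(0) += value;
    }

    // States are emitted partition by partition, so no split in the
    // exchange and no merge in coalesce is needed afterwards.
    fn emit(self) -> Vec<Vec<(u64, i64)>> {
        self.partitions
            .into_iter()
            .map(|table| table.into_iter().collect())
            .collect()
    }
}
```

Each element returned by `emit` already belongs to exactly one final aggregation partition, which is the point of the proposal: the partition number is decided while aggregating, not afterwards.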


