Rachelint commented on issue #20773:
URL: https://github.com/apache/datafusion/issues/20773#issuecomment-4017282421
My concerns are:
- The paper describes how to keep the hashtable small and
cache-efficient in the high cardinality case.
- The situation in `datafusion` seems to be:
  - In the `high cardinality` case, `datafusion` uses the `skip aggregation`
approach to skip the `partial aggr` entirely, and skipping may be the
better approach (`doris` uses a similar approach).
  - In the `low cardinality` case, the hashtable is usually small enough
without extra processing?
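To illustrate the `high cardinality` point, here is a minimal sketch (in Rust) of a skip-aggregation probe. The names and threshold here are hypothetical, not `datafusion`'s actual config keys; the idea is just that after observing some probe rows, the partial aggregation is skipped when the ratio of distinct groups to input rows is high:

```rust
/// Hypothetical sketch of a skip-partial-aggregation probe: after observing
/// `probe_rows` input rows that produced `unique_groups` distinct groups,
/// skip the partial aggregation when the groups-to-rows ratio exceeds a
/// threshold (names are illustrative, not datafusion's real config keys).
fn should_skip_partial_agg(probe_rows: usize, unique_groups: usize, ratio_threshold: f64) -> bool {
    // High ratio => aggregation reduces keys very little, so the partial
    // pass mostly burns CPU and cache for nothing.
    probe_rows > 0 && (unique_groups as f64 / probe_rows as f64) >= ratio_threshold
}
```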
> The description in DuckDB blog sounds to me very similar/the same as the
paper, "spills when ht becomes full" means there is some threshold on the
hashtable, after which it will repartition/spill to the local hash tables.
The method in [Leis et
al](https://15721.courses.cs.cmu.edu/spring2016/papers/p743-leis.pdf) may be
closer to current `duckdb` (the approach is mentioned together with their
external aggregation in https://duckdb.org/2024/03/29/external-aggregation),
and it differs from `duckdb 0.7`:
- Each `partial aggr` thread keeps only one small, fixed-size hashtable, and
only the group values are partitioned.
- When the hashtable is full, flush the partitioned group values.
- Clear and reuse the hashtable.
- In `final aggr`, what current `duckdb` does is similar to `datafusion`: it
always performs a partition-wise parallel merge (`duckdb 0.7` performs a
single-threaded merge when groups are few).
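The first three bullets could be sketched roughly like this, as a toy single-threaded version (not duckdb's or the paper's actual implementation; the capacity and partition count are illustrative, chosen tiny so the table would stay cache-resident):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

const NUM_PARTITIONS: usize = 4; // radix partitions, illustrative
const HT_CAPACITY: usize = 8;    // tiny fixed cap so the table stays hot in cache

/// Toy sketch of the partial-aggregation loop: one small fixed-capacity
/// hashtable; when it fills up, its contents are flushed into per-partition
/// buffers (partitioned by group hash) and the table is cleared and reused.
struct PartialAggregator {
    ht: HashMap<u64, i64>,            // group key -> running sum
    partitions: Vec<Vec<(u64, i64)>>, // flushed (key, partial sum) pairs
}

impl PartialAggregator {
    fn new() -> Self {
        Self { ht: HashMap::new(), partitions: vec![Vec::new(); NUM_PARTITIONS] }
    }

    fn insert(&mut self, key: u64, value: i64) {
        // A new key would grow a full table: flush partitioned groups first,
        // then reuse the (now empty) table.
        if !self.ht.contains_key(&key) && self.ht.len() >= HT_CAPACITY {
            self.flush();
        }
        *self.ht.entry(key).or_insert(0) += value;
    }

    fn flush(&mut self) {
        for (key, sum) in self.ht.drain() {
            let mut h = DefaultHasher::new();
            key.hash(&mut h);
            let p = (h.finish() as usize) % NUM_PARTITIONS;
            self.partitions[p].push((key, sum));
        }
    }
}
```

Note that a key may appear in several flushed chunks of the same partition, so the `final aggr` still has to merge duplicates; the point is only that the partial-side table never outgrows the cache.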
As I see it, `making partial aggr cache-friendly` as mentioned in the paper
(and in current `duckdb`) is actually `making the hashtable in partial aggr
cache-friendly`, and both the paper and current `duckdb` use the reset +
rebuild method to achieve it even in the `high cardinality` case.
`Duckdb 0.7` ([method in the
blog](https://duckdb.org/2022/03/07/aggregate-hashtable#parallel-aggregation))
seems to only describe how the partitioned hashtable supports parallel merge
in `final aggr`. And a partitioned hashtable actually helps little with
`cache-friendliness`, based on some experiments I did before.
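For the partition-wise parallel merge in `final aggr`, the idea is that because every partial-aggregation thread partitions its flushed groups with the same hash function, partition `i` from all threads can be merged independently. A toy sketch (one OS thread per partition, purely illustrative; real engines would use a task scheduler):

```rust
use std::collections::HashMap;
use std::thread;

/// Toy sketch of a partition-wise parallel merge: `per_thread[t][p]` holds
/// the (key, partial sum) pairs that partial-aggregation thread `t` flushed
/// into partition `p`. Each partition is merged on its own thread.
/// Assumes at least one partial-aggregation thread.
fn final_merge(per_thread: Vec<Vec<Vec<(u64, i64)>>>) -> Vec<HashMap<u64, i64>> {
    let num_partitions = per_thread[0].len();
    let mut handles = Vec::new();
    for p in 0..num_partitions {
        // Gather partition p from every partial-aggregation thread.
        let chunks: Vec<Vec<(u64, i64)>> =
            per_thread.iter().map(|t| t[p].clone()).collect();
        handles.push(thread::spawn(move || {
            let mut merged = HashMap::new();
            for chunk in chunks {
                for (k, v) in chunk {
                    *merged.entry(k).or_insert(0) += v; // merge duplicate keys
                }
            }
            merged
        }));
    }
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```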
> The problem with spilling to repartition / final aggregation
(https://github.com/apache/datafusion/issues/6937#issuecomment-1681310199) is
that it will reduce keys less, and thus cause much more cross communication /
repartition overhead and cost in the upstream aggregation.
Agreed that the "flush" is too expensive here, which may lead to no obvious
improvement.
This may not be a good case for measuring whether the method in [Leis et
al](https://15721.courses.cs.cmu.edu/spring2016/papers/p743-leis.pdf) can also
work well in `datafusion`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]