[PR] feat: add AggregateMode::PartialReduce for tree-reduce aggregation [datafusion]

via GitHub Mon, 26 Jan 2026 22:36:46 -0800


njsmith opened a new pull request, #20019:
URL: https://github.com/apache/datafusion/pull/20019


   DataFusion's current `AggregateMode` enum has five variants covering three 
of the four cells in the input/output matrix:
   
                         | Input: raw data                | Input: partial state
   Output: final values  | `Single` / `SinglePartitioned` | `Final` / `FinalPartitioned`
   Output: partial state | `Partial`                      | ???
   
   This PR adds `AggregateMode::PartialReduce` to fill in the missing cell: it 
takes partially-reduced values as input, and reduces them further, but without 
finalizing.
   
   This is useful because it's the key component needed to implement 
distributed tree-reduction (as seen in e.g. the Scuba or Honeycomb papers): a 
set of worker nodes each perform multithreaded `Partial` aggregations, feed 
those into a `PartialReduce` to reduce all of this node's values into a single 
row, and then a head node collects the outputs from all nodes' `PartialReduce` 
to feed into a `Final` reduction.
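
   To make the data flow concrete, here is a minimal sketch of the three stages for a grouped AVG, written in plain Rust without any DataFusion types. Everything in it (`PartialState`, `partial`, `partial_reduce`, `finalize`, `tree_reduce`) is a hypothetical name chosen for illustration, and the nodes and partitions are modeled as nested vectors rather than real workers or threads.

```rust
use std::collections::BTreeMap;

/// Toy partial state for AVG: (running sum, row count) per group key.
/// Hypothetical type for illustration; not a DataFusion structure.
pub type PartialState = BTreeMap<String, (f64, u64)>;

/// `Partial` stage: raw rows in, partial state out.
pub fn partial(rows: &[(&str, f64)]) -> PartialState {
    let mut state = PartialState::new();
    for &(key, value) in rows {
        let entry = state.entry(key.to_string()).or_insert((0.0, 0));
        entry.0 += value;
        entry.1 += 1;
    }
    state
}

/// `PartialReduce` stage: partial states in, one merged partial state out.
/// Nothing is finalized here, so the output can be merged again later.
pub fn partial_reduce(states: &[PartialState]) -> PartialState {
    let mut merged = PartialState::new();
    for state in states {
        for (key, &(sum, count)) in state {
            let entry = merged.entry(key.clone()).or_insert((0.0, 0));
            entry.0 += sum;
            entry.1 += count;
        }
    }
    merged
}

/// `Final` stage: partial states in, final averages out.
pub fn finalize(states: &[PartialState]) -> BTreeMap<String, f64> {
    partial_reduce(states)
        .into_iter()
        .map(|(key, (sum, count))| (key, sum / count as f64))
        .collect()
}

/// Tree reduction: each node runs `Partial` per partition, then one
/// `PartialReduce`; the head node runs `Final` over the per-node states.
pub fn tree_reduce(nodes: &[Vec<Vec<(&str, f64)>>]) -> BTreeMap<String, f64> {
    let per_node: Vec<PartialState> = nodes
        .iter()
        .map(|partitions| {
            let partials: Vec<PartialState> =
                partitions.iter().map(|p| partial(p)).collect();
            partial_reduce(&partials)
        })
        .collect();
    finalize(&per_node)
}
```

   The point of the sketch is that `partial_reduce` has the same input and output type, which is what lets it be inserted between `Partial` and `Final`. Averaging the per-node averages instead would weight nodes incorrectly whenever their row counts differ.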
   
   This PR can be reviewed commit by commit. The first commit is a pure 
refactor/simplification: in most places where we matched on `AggregateMode`, we 
were really just checking which row of the above table we were in, or else 
which column. So now we have `is_first_stage` (tells you which column) and 
`is_last_stage` (tells you which row), and we use them everywhere.
   
   The second commit adds `PartialReduce`, and is pretty small because 
`is_first_stage`/`is_last_stage` do most of the heavy lifting. It also adds a 
test demonstrating a minimal Partial -> PartialReduce -> Final tree-reduction.

