alamb opened a new issue, #8793: URL: https://github.com/apache/arrow-datafusion/issues/8793
### Is your feature request related to a problem or challenge? Many of DataFusion's built in accumlators use the [`GroupsAccumulator`](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.GroupsAccumulator.html) interface for a significant performance improvement See the blog post XXX for more details However, it is not currently possible to use [`GroupsAccumulator`] with user defined Aggregate functions , which is less than ideal for 2 reasons: 1. User's can't take advantage of the faster interface 2. We can't port built in Aggregate functions to use the same API https://github.com/apache/arrow-datafusion/issues/8708 ### Describe the solution you'd like I would like to extend the `AggregateUDFImpl` trait added by @guojidan in https://github.com/apache/arrow-datafusion/pull/8733 with a new method that returns a GroupsAccumulator 1. Add new methods to `AggregateUDFImpl` 2. Wire in the the call into the impl for `AggregateExpr` https://github.com/apache/arrow-datafusion/blob/dd58c5a7b0069e3bade997abad9ec6bed8c8a1c3/datafusion/physical-plan/src/udaf.rs#L75 3. Update `advanced_udaf.rs` example to create a `GroupsAccumulator` ## Adding new methods to `AggregateUDFImpl`: Following the model of [`PartitionEvaluator`](https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.PartitionEvaluator.html) and [groups_accumulator_supported](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.AggregateExpr.html#method.groups_accumulator_supported) ```rust pub trait AggregateUDFImpl { ... /// If the aggregate expression has a specialized /// [`GroupsAccumulator`] implementation. If this returns true, /// `[Self::create_groups_accumulator`] will be called. fn groups_accumulator_supported(&self) -> bool { false } /// Return a specialized [`GroupsAccumulator`] that manages state /// for all groups. /// /// For maximum performance, a [`GroupsAccumulator`] should be /// implemented in addition to [`Accumulator`]. fn create_groups_accumulator(&self) -> Result<Box<dyn GroupsAccumulator>> { not_impl_err!("GroupsAccumulator hasn't been implemented for {self:?} yet") } } ``` ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
