alamb opened a new issue, #8793:
URL: https://github.com/apache/arrow-datafusion/issues/8793

   ### Is your feature request related to a problem or challenge?
   
   Many of DataFusion's built in accumlators use the 
[`GroupsAccumulator`](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.GroupsAccumulator.html)
 interface for a significant performance improvement
   
   See the blog post XXX for more details
   
   However, it is not currently possible to use [`GroupsAccumulator`] with user 
defined Aggregate functions , which is less than ideal for 2 reasons:
   1. User's can't take advantage of the faster interface
   2. We can't port built in Aggregate functions to use the same API 
https://github.com/apache/arrow-datafusion/issues/8708 
   
   
   
   
   ### Describe the solution you'd like
   
   I would like to extend the `AggregateUDFImpl` trait added by @guojidan  in 
https://github.com/apache/arrow-datafusion/pull/8733 with a new method that 
returns a GroupsAccumulator
   
   1. Add new methods to `AggregateUDFImpl`
   2. Wire in the the call into the impl for `AggregateExpr` 
https://github.com/apache/arrow-datafusion/blob/dd58c5a7b0069e3bade997abad9ec6bed8c8a1c3/datafusion/physical-plan/src/udaf.rs#L75
   3. Update `advanced_udaf.rs` example to create a `GroupsAccumulator`
   
   ## Adding new methods to `AggregateUDFImpl`:
   
   Following the model of 
[`PartitionEvaluator`](https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.PartitionEvaluator.html)
 and 
[groups_accumulator_supported](https://docs.rs/datafusion/latest/datafusion/physical_expr/trait.AggregateExpr.html#method.groups_accumulator_supported)
   
   ```rust
   pub trait AggregateUDFImpl {
   ...
       /// If the aggregate expression has a specialized
       /// [`GroupsAccumulator`] implementation. If this returns true,
       /// `[Self::create_groups_accumulator`] will be called.
       fn groups_accumulator_supported(&self) -> bool {
           false
       }
   
      /// Return a specialized [`GroupsAccumulator`] that manages state
       /// for all groups.
       ///
       /// For maximum performance, a [`GroupsAccumulator`] should be
       /// implemented in addition to [`Accumulator`].
       fn create_groups_accumulator(&self) -> Result<Box<dyn 
GroupsAccumulator>> {
           not_impl_err!("GroupsAccumulator hasn't been implemented for 
{self:?} yet")
       }
   
   }
   ```
   
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to