realno commented on pull request #1525: URL: https://github.com/apache/arrow-datafusion/pull/1525#issuecomment-1008337842
> > I want to discuss the aggregator interface a bit further. The update function doesn't seem to provide much value with the current architecture, it seems under all circumstances batch_update will perform better. Would it be a reasonable choice to just keep batch_update (and remove update)? This way we can get rid a lot of code dealing with record level ScalarValue arithmetic. IMO it only makes sense to differentiate the two if update can provide additional value such as lower latency in streaming context. > > I another other rationale for `update` is to lower the barrier to entry for new users to write `Accumulator`s (otherwise you have to understand Arrow arrays and compute kernels, etc to write one), as well as being backwards compatible with the original interface > > Perhaps we could add some comments to `update` clarifying that for aggregators should prefer to implement `batch_update` and `update` is present only for backwards compatibility / ease of use? I see, that makes sense. It is a good idea to add some comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
