alamb commented on pull request #1525:
URL: https://github.com/apache/arrow-datafusion/pull/1525#issuecomment-1008307171


   > I want to discuss the aggregator interface a bit further. The `update` 
function doesn't seem to provide much value with the current architecture; it 
seems that under all circumstances `batch_update` will perform better. Would it 
be a reasonable choice to just keep `batch_update` (and remove `update`)? This 
way we could get rid of a lot of code dealing with record-level `ScalarValue` 
arithmetic. IMO it only makes sense to differentiate the two if `update` can 
provide additional value, such as lower latency in a streaming context.
   
   Another rationale for `update` is that it lowers the barrier to entry for 
new users writing `Accumulator`s (otherwise you have to understand Arrow 
arrays, compute kernels, etc. to write one), and it keeps backwards 
compatibility with the original interface.
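   To make the tradeoff concrete, here is a minimal, self-contained sketch. It 
is *not* the actual DataFusion `Accumulator` trait; the trait, the `ScalarValue` 
stand-in, and the method signatures are simplified for illustration. It shows 
how a row-level `update` stays easy to write, while `batch_update` sees a whole 
column at a time and can avoid per-row overhead:

```rust
/// Simplified stand-in for DataFusion's `ScalarValue` (illustration only).
#[derive(Debug, Clone)]
enum ScalarValue {
    Int64(Option<i64>),
}

/// Simplified stand-in for the aggregator interface with both styles.
trait Accumulator {
    /// Row-level update: easy to implement, but called once per input row.
    fn update(&mut self, values: &[ScalarValue]) -> Result<(), String>;

    /// Batch-level update: sees a whole column at a time, so it can use tight
    /// loops (or Arrow compute kernels in the real implementation). This
    /// hypothetical default falls back to the row-level path.
    fn batch_update(&mut self, columns: &[Vec<ScalarValue>]) -> Result<(), String> {
        let num_rows = columns.first().map(|c| c.len()).unwrap_or(0);
        for row in 0..num_rows {
            // Materializing one ScalarValue per row is exactly the overhead
            // that a hand-written batch_update avoids.
            let row_values: Vec<ScalarValue> =
                columns.iter().map(|col| col[row].clone()).collect();
            self.update(&row_values)?;
        }
        Ok(())
    }
}

/// A SUM accumulator that only implements the easy row-level method.
struct SumAccumulator {
    sum: i64,
}

impl Accumulator for SumAccumulator {
    fn update(&mut self, values: &[ScalarValue]) -> Result<(), String> {
        if let Some(ScalarValue::Int64(Some(v))) = values.first() {
            self.sum += *v;
        }
        Ok(())
    }
    // A performance-minded implementation would also override `batch_update`
    // to sum the column directly instead of going row by row.
}

fn main() {
    let mut acc = SumAccumulator { sum: 0 };
    let column: Vec<ScalarValue> = (1..=4).map(|v| ScalarValue::Int64(Some(v))).collect();
    acc.batch_update(&[column]).unwrap();
    println!("sum = {}", acc.sum); // prints "sum = 10"
}
```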
   
   Perhaps we could add some comments to `update` clarifying that aggregators 
should prefer to implement `batch_update`, and that `update` is present only 
for backwards compatibility / ease of use?
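   For example (hypothetical wording against the simplified sketch above, not 
the actual source), the note could read something like:

```rust
/// Updates the accumulator's state from a single row of input values.
///
/// Note: implementations should prefer `batch_update`, which operates on
/// whole columns and is generally much faster; `update` is provided for
/// backwards compatibility and as an easier entry point for new
/// `Accumulator` authors.
fn update(&mut self, values: &[ScalarValue]) -> Result<(), String>;
```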


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

