RNKuhns opened a new issue, #15103:
URL: https://github.com/apache/arrow/issues/15103

   ### Describe the enhancement requested
   
   Arrow’s compute functions currently include several aggregate statistics 
(mean, sum, variance, etc.).
   
   It would be great to offer weighted versions of several of these (mean, sum, 
count, variance, and standard deviation in the short run; quantile in the 
longer run) and expose them to pyarrow, R, etc. This would allow the 
computation to be pushed down to Arrow and Arrow datasets. Currently, an R 
user has to either collect the data first and apply something like 
weighted.mean from R's stats package, or code up custom logic from the 
functions available in Arrow to arrive at a similar result. Neither is ideal 
for a relatively routine aggregation (weighted statistics). Native support 
would also make it much easier to calculate weighted statistics when working 
with distributed Arrow datasets in Python.
   
   Note that this is not covered directly by the UDF API, which doesn’t 
support aggregate functions.
   
   The added functionality could be provided through functions with a 
signature like: `weighted_mean(x: arrow array, weights: arrow array) -> scalar`
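
   As a rough reference for what such functions might compute, the 
pure-Python sketch below follows the same (x, weights) -> scalar shape. The 
function names and the choice of a population-style variance normalized by 
the sum of the weights are assumptions for illustration; the issue does not 
specify which variance definition Arrow should adopt.

```python
import math
from typing import Sequence


def _check(x: Sequence[float], weights: Sequence[float]) -> float:
    """Validate inputs and return the total weight."""
    if len(x) != len(weights):
        raise ValueError("x and weights must have the same length")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return total


def weighted_sum(x: Sequence[float], weights: Sequence[float]) -> float:
    _check(x, weights)
    return sum(xi * wi for xi, wi in zip(x, weights))


def weighted_mean(x: Sequence[float], weights: Sequence[float]) -> float:
    total = _check(x, weights)
    return weighted_sum(x, weights) / total


def weighted_variance(x: Sequence[float], weights: Sequence[float]) -> float:
    # Assumed definition: sum(w_i * (x_i - mean)^2) / sum(w_i)
    total = _check(x, weights)
    mean = weighted_mean(x, weights)
    return sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, weights)) / total


def weighted_stddev(x: Sequence[float], weights: Sequence[float]) -> float:
    return math.sqrt(weighted_variance(x, weights))


x = [1.0, 2.0, 4.0]
w = [1.0, 1.0, 2.0]
mean = weighted_mean(x, w)
variance = weighted_variance(x, w)
```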
   
   
   ### Component(s)
   
   C++, Python, R

