gitmodimo opened a new pull request, #47553:
URL: https://github.com/apache/arrow/pull/47553
### Rationale for this change
The t-digest algorithm allows multiple centroid sets to be merged, which enables an
efficient map-reduce implementation. This PR lets users perform the reduce step
outside of the aggregator, which in turn enables incremental t-digest computation.
An option to select a different scale function (scaler) is also added.
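The scale function controls how large a centroid may grow at each quantile. The minimal Python sketch below shows the two scale functions from Dunning's t-digest paper, k0 (linear) and k1 (arcsine), and the size test a merging t-digest applies to a candidate cluster. Which scalers the PR actually exposes is not stated here, so `k0`, `k1` and `may_merge` are illustrative assumptions, not Arrow's API.

```python
import math


def k0(q: float, delta: float) -> float:
    """Linear scale function (assumed): uniform cluster sizes across quantiles."""
    return delta * q / 2.0


def k1(q: float, delta: float) -> float:
    """Arcsine scale function (assumed): smaller clusters near q = 0 and q = 1."""
    return delta / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)


def may_merge(q_left: float, q_right: float, delta: float, k) -> bool:
    """A cluster covering quantiles [q_left, q_right] is acceptable when it
    spans at most one unit of the chosen scale function."""
    return k(q_right, delta) - k(q_left, delta) <= 1.0
```

With k1, a quantile width that is acceptable in the middle of the distribution is rejected at the tails, which is what keeps extreme quantiles accurate; k0 treats both positions the same.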
### What changes are included in this PR?
This change introduces 3 new aggregate functions:
- `tdigest_map`: functionally the same as `tdigest`, but instead of quantiles it
  outputs the centroids vector as a `fixed_size_list` of length `delta` (with nullable
  values), where each element is a `struct{double mean; double weight}`.
- `tdigest_reduce`: takes a vector of centroids vectors and outputs the merged
  centroids vector.
- `tdigest_quantile`: takes a centroids vector and calculates quantiles the same way
  `tdigest` does.
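The map / reduce / quantile split can be illustrated with a simplified merging t-digest in plain Python. This is a sketch under stated assumptions, not Arrow's kernel code: the function names only mirror the PR's kernels, centroids are plain `(mean, weight)` tuples in a variable-length list (rather than a null-padded `fixed_size_list` of length `delta`), the k1 scale function drives compression, and quantiles are linearly interpolated between centroid means.

```python
import math
from typing import List, Tuple

# (mean, weight); mirrors the struct{double mean; double weight} element type
Centroid = Tuple[float, float]


def _k1(q: float, delta: float) -> float:
    # Arcsine scale function (an assumption; the PR's default scaler may differ).
    return delta / (2.0 * math.pi) * math.asin(2.0 * q - 1.0)


def _compress(centroids: List[Centroid], delta: float) -> List[Centroid]:
    """Greedily merge mean-sorted centroids while each merged cluster spans
    at most one unit of the k1 scale function."""
    cs = sorted(centroids)
    total = sum(w for _, w in cs)
    if total == 0:
        return []
    out: List[Centroid] = []
    cur_mean, cur_w = cs[0]
    w_before = 0.0  # weight of all clusters already emitted to `out`
    for mean, w in cs[1:]:
        q_left = w_before / total
        q_right = min((w_before + cur_w + w) / total, 1.0)
        if _k1(q_right, delta) - _k1(q_left, delta) <= 1.0:
            # Absorb the next centroid, updating the weighted mean in place.
            cur_w += w
            cur_mean += (mean - cur_mean) * w / cur_w
        else:
            out.append((cur_mean, cur_w))
            w_before += cur_w
            cur_mean, cur_w = mean, w
    out.append((cur_mean, cur_w))
    return out


def tdigest_map(values: List[float], delta: float = 100.0) -> List[Centroid]:
    """Map step: summarize one batch of raw values as a list of centroids."""
    return _compress([(float(v), 1.0) for v in values], delta)


def tdigest_reduce(digests: List[List[Centroid]], delta: float = 100.0) -> List[Centroid]:
    """Reduce step: merge any number of centroid lists into one."""
    return _compress([c for d in digests for c in d], delta)


def tdigest_quantile(centroids: List[Centroid], q: float) -> float:
    """Estimate quantile q by interpolating between centroid means, treating
    each mean as located at the midpoint of its centroid's weight."""
    if not centroids:
        raise ValueError("empty digest")
    total = sum(w for _, w in centroids)
    target = q * total
    cum = 0.0
    prev = None  # (mean, midpoint) of the previous centroid
    for mean, w in centroids:
        mid = cum + w / 2.0
        if target <= mid:
            if prev is None:
                return mean
            prev_mean, prev_mid = prev
            return prev_mean + (target - prev_mid) / (mid - prev_mid) * (mean - prev_mean)
        prev = (mean, mid)
        cum += w
    return centroids[-1][0]
```

Because the reduce step accepts any number of partial digests, including a previously reduced one, new data can be folded in incrementally, e.g. `tdigest_reduce([old, tdigest_map(new_batch)])`, without revisiting the raw values.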
### Are these changes tested?
Yes
### Are there any user-facing changes?
No
--
This is an automated message from the Apache Git Service.