ianmcook commented on a change in pull request #9875:
URL: https://github.com/apache/arrow/pull/9875#discussion_r607349708
##########
File path: r/R/compute.R
##########
@@ -80,6 +80,36 @@ collect_arrays_from_dots <- function(dots) {
ChunkedArray$create(!!!arrays)
}
+#' @export
+quantile.ArrowDatum <- function(x,
+ probs = seq(0, 1, 0.25),
+ na.rm = FALSE,
+ interpolation = c("linear", "lower", "higher", "nearest", "midpoint"),
+ ...) {
Review comment:
In 188ceea3fbc037c3944ac936e9492f9f388f3c95, I added an error if the
user specifies a non-default value for `type`.
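For readers skimming the thread, a check of this kind could look roughly like the sketch below. This is not the code from the commit: `check_quantile_type()` and its error message are hypothetical names and wording, assumed here only to illustrate the behavior.

```r
# Hypothetical stand-in for the check described above (not the committed code).
# It accepts only the R default, type = 7, and errors on anything else.
check_quantile_type <- function(type = 7) {
  if (!identical(as.numeric(type), 7)) {
    stop(
      "Argument `type` is not supported; use `interpolation` instead.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# The default passes silently.
ok <- check_quantile_type()

# A non-default value raises an error; capture its message for inspection.
res <- tryCatch(
  check_quantile_type(type = 1),
  error = function(e) conditionMessage(e)
)
```

Failing loudly seems preferable to silently ignoring `type`, since the results could otherwise differ from base R without any warning.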
I read through the `quantile` docs and the associated Rob Hyndman paper, but
most of the sample quantile types described there are quite different from any
of the options implemented in Arrow, so that did not help much. Doing
quantitative comparisons was more fruitful. Here's what I found:
- The R default `type = 7` corresponds most closely to the Arrow default
`interpolation = "linear"` (which is very good)
- R's `type = 1` seems to correspond closely to Arrow's `interpolation =
"lower"`
- None of the other R `type` options and Arrow `interpolation` options
exhibit any close correspondence
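A comparison along these lines can be sketched in base R without Arrow installed. `interp_quantile()` below is a hypothetical helper that I am assuming mirrors the five `interpolation` options by working from the fractional position `(n - 1) * p` in the sorted data. It is not the Arrow implementation, the tie-breaking for `"nearest"` is an arbitrary choice, and the input vector is made-up data.

```r
# Hand-rolled stand-in for the five interpolation modes. Their semantics here
# are an assumption based on the option names, not a call into Arrow.
interp_quantile <- function(x, probs, interpolation = "linear") {
  x <- sort(x)
  pos <- (length(x) - 1) * probs + 1  # fractional 1-based position
  lo <- x[floor(pos)]                 # order statistic at or below pos
  hi <- x[ceiling(pos)]               # order statistic at or above pos
  frac <- pos - floor(pos)
  switch(interpolation,
    linear   = lo + (hi - lo) * frac,
    lower    = lo,
    higher   = hi,
    nearest  = ifelse(frac <= 0.5, lo, hi),  # ties go to the lower value here
    midpoint = (lo + hi) / 2,
    stop("unknown interpolation: ", interpolation)
  )
}

# Made-up data; with n = 10 the default probs land between order statistics.
x <- c(2.1, 3.4, 1.9, 8.0, 5.5, 4.2, 7.3, 6.8, 0.5, 9.7)
probs <- seq(0, 1, 0.25)
modes <- c("linear", "lower", "higher", "nearest", "midpoint")

# Largest absolute difference between each R `type` (rows) and each mode (columns).
max_abs_diff <- sapply(modes, function(m) {
  sapply(1:9, function(t) {
    r_q <- quantile(x, probs, type = t, names = FALSE)
    max(abs(r_q - interp_quantile(x, probs, m)))
  })
})
rownames(max_abs_diff) <- paste0("type", 1:9)
print(round(max_abs_diff, 3))
```

Under these assumptions, the `type7` / `linear` cell should be zero up to floating-point rounding. The other cells depend on the data and on `probs`, so a single vector like this one is only suggestive.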
Based on these findings, and considering how few users will likely attempt
to compute quantiles with Arrow using anything but the default type, I don't
think there's any immediate action to take here, and I don't really think any
follow-up is needed. It seems like a long shot that we would ever implement any
of these other quantile algorithms in the C++ library, but I'll open a Jira for
that if you think it's worth it.