[ https://issues.apache.org/jira/browse/ARROW-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461541#comment-17461541 ]
Dewey Dunnington commented on ARROW-14209:
--
The {{n_distinct()}} binding is currently defined here:
https://github.com/apache/arrow/blob/6e20c6b9d7131af41f2e979529d06e507c731373/r/R/dplyr-functions.R#L1091-L1097
Reprex:
{code:R}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

record_batch(
  a = c(1, 1, 2, 2, 1, NA, NA),
  b = c("a", "b", "c", "c", "a", "b", NA)
) %>%
  summarise(
    distinct_vals_with_na = n_distinct(a, b, na.rm = FALSE),
    distinct_vals = n_distinct(a, b, na.rm = TRUE)
  )
#> Warning: Error : In n_distinct(a, b, na.rm = FALSE), Multiple arguments to
#> n_distinct() not supported in Arrow; pulling data into R
#> # A tibble: 1 × 2
#>   distinct_vals_with_na distinct_vals
#>                   <int>         <int>
#> 1                     5             3
{code}
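For reference, the two counts in the reprex can be reproduced with plain dplyr on an in-memory tibble, which makes the expected semantics explicit: count the distinct (a, b) rows, optionally dropping rows where either column is NA first. This is only a sketch of the behaviour a multi-argument binding would need to match, not the Arrow implementation; the names {{df}}, {{n_with_na}} and {{n_without_na}} are introduced here for illustration.

{code:R}
library(dplyr, warn.conflicts = FALSE)

# Same data as the reprex, as a plain tibble (no Arrow involved)
df <- tibble(
  a = c(1, 1, 2, 2, 1, NA, NA),
  b = c("a", "b", "c", "c", "a", "b", NA)
)

# na.rm = FALSE: every distinct (a, b) combination counts,
# including combinations that contain NA
n_with_na <- df %>%
  distinct(a, b) %>%
  nrow()
# 5, matching distinct_vals_with_na above

# na.rm = TRUE: drop rows where either column is NA, then count
n_without_na <- df %>%
  filter(!is.na(a), !is.na(b)) %>%
  distinct(a, b) %>%
  nrow()
# 3, matching distinct_vals above
{code}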
> [R] Allow multiple arguments to n_distinct()
>
>
> Key: ARROW-14209
> URL: https://issues.apache.org/jira/browse/ARROW-14209
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Ian Cook
> Priority: Major
> Fix For: 7.0.0
>
>
> ARROW-13620 and ARROW-14036 added support for the {{n_distinct()}} function
> in the dplyr verb {{summarise()}} but only with a single argument. Add
> support for multiple arguments to {{n_distinct()}}. This should return the
> number of unique combinations of values in the specified columns/expressions.
> See the comment about this here:
> [https://github.com/apache/arrow/pull/11257#discussion_r720873549]