[jira] [Commented] (ARROW-14209) [R] Allow multiple arguments to n_distinct()

2022-09-28 Thread Todd Farmer (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610650#comment-17610650
 ] 

Todd Farmer commented on ARROW-14209:
-------------------------------------

This issue was last updated over 90 days ago, which may be an indication it is 
no longer being actively worked. To better reflect the current state, the issue 
is being unassigned per [project policy|https://arrow.apache.org/docs/dev/developers/bug_reports.html#issue-assignment].
Please feel free to re-take assignment of the issue if it is being actively 
worked, or if you plan to start that work soon.

> [R] Allow multiple arguments to n_distinct()
> 
>
> Key: ARROW-14209
> URL: https://issues.apache.org/jira/browse/ARROW-14209
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Ian Cook
> Assignee: Dragoș Moldovan-Grünfeld
> Priority: Major
>
> ARROW-13620 and ARROW-14036 added support for the {{n_distinct()}} function 
> in the dplyr verb {{summarise()}} but only with a single argument. Add 
> support for multiple arguments to {{n_distinct()}}. This should return the 
> number of unique combinations of values in the specified columns/expressions.
> See the comment about this here: 
> [https://github.com/apache/arrow/pull/11257#discussion_r720873549]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-14209) [R] Allow multiple arguments to n_distinct()

2021-12-17 Thread Dewey Dunnington (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17461541#comment-17461541
 ] 

Dewey Dunnington commented on ARROW-14209:
------------------------------------------

The {{n_distinct()}} binding is currently defined here: 
https://github.com/apache/arrow/blob/6e20c6b9d7131af41f2e979529d06e507c731373/r/R/dplyr-functions.R#L1091-L1097

Reprex:

{code:R}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

record_batch(
  a = c(1, 1, 2, 2, 1, NA, NA),
  b = c("a", "b", "c", "c", "a", "b", NA)
) %>%
  summarise(
    distinct_vals_with_na = n_distinct(a, b, na.rm = FALSE),
    distinct_vals = n_distinct(a, b, na.rm = TRUE)
  )
#> Warning: Error : In n_distinct(a, b, na.rm = FALSE), Multiple arguments to
#> n_distinct() not supported in Arrow; pulling data into R
#> # A tibble: 1 × 2
#>   distinct_vals_with_na distinct_vals
#>                   <int>         <int>
#> 1                     5             3
{code}
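
For reference, the counts in the reprex can be reproduced in base R without Arrow or dplyr. This is only a sketch of the expected semantics (the number of unique combinations of values across the columns), not a proposal for how the Arrow binding should be implemented. It assumes that {{na.rm = TRUE}} drops any row with a missing value in at least one column, which is consistent with the output above. The names {{df}}, {{n_with_na}} and {{n_without_na}} are made up for this illustration.

```r
# Same data as the reprex, held in a plain data.frame
df <- data.frame(
  a = c(1, 1, 2, 2, 1, NA, NA),
  b = c("a", "b", "c", "c", "a", "b", NA)
)

# Count unique (a, b) combinations; NA is treated as a value of its own here
n_with_na <- nrow(unique(df))

# Assumed na.rm = TRUE behaviour: drop rows with any NA, then count combinations
n_without_na <- nrow(unique(df[complete.cases(df), ]))

n_with_na     # 5
n_without_na  # 3
```

Any Arrow-native implementation would presumably need to return the same two numbers for this input.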


> [R] Allow multiple arguments to n_distinct()
> 
>
> Key: ARROW-14209
> URL: https://issues.apache.org/jira/browse/ARROW-14209
> Project: Apache Arrow
> Issue Type: Improvement
> Components: R
> Reporter: Ian Cook
> Priority: Major
> Fix For: 7.0.0
>
>
> ARROW-13620 and ARROW-14036 added support for the {{n_distinct()}} function 
> in the dplyr verb {{summarise()}} but only with a single argument. Add 
> support for multiple arguments to {{n_distinct()}}. This should return the 
> number of unique combinations of values in the specified columns/expressions.
> See the comment about this here: 
> [https://github.com/apache/arrow/pull/11257#discussion_r720873549]


