[ https://issues.apache.org/jira/browse/ARROW-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-13434:
-----------------------------------
    Labels: pull-request-available  (was: )

> [R] group_by() with an unnammed expression
> ------------------------------------------
>
>                 Key: ARROW-13434
>                 URL: https://issues.apache.org/jira/browse/ARROW-13434
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Jonathan Keane
>            Assignee: Jonathan Keane
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> With dplyr, when we group_by with an unnamed expression, a column is added to
> the dataframe that has the result of the expression.
> {code}
> > example_data %>%
> +   group_by(int < 4) %>% collect()
> # A tibble: 10 x 8
> # Groups:   int < 4 [3]
>      int   dbl  dbl2 lgl   false chr   fct   `int < 4`
>    <int> <dbl> <dbl> <lgl> <lgl> <chr> <fct> <lgl>
>  1     1   1.1     5 TRUE  FALSE a     a     TRUE
>  2     2   2.1     5 NA    FALSE b     b     TRUE
>  3     3   3.1     5 TRUE  FALSE c     c     TRUE
>  4    NA   4.1     5 FALSE FALSE d     d     NA
>  5     5   5.1     5 TRUE  FALSE e     NA    FALSE
>  6     6   6.1     5 NA    FALSE NA    NA    FALSE
>  7     7   7.1     5 NA    FALSE g     g     FALSE
>  8     8   8.1     5 FALSE FALSE h     h     FALSE
>  9     9    NA     5 FALSE FALSE i     i     FALSE
> 10    10  10.1     5 NA    FALSE j     j     FALSE
> {code}
> Arrow doesn't do this, however, because we (currently) only add columns when
> the expression is named.
> {code}
> > Table$create(example_data) %>%
> +   group_by(int < 4) %>% collect()
> Error: Invalid: No match for FieldRef.Name(int < 4) in int: int32
> dbl: double
> dbl2: double
> lgl: bool
> false: bool
> chr: string
> fct: dictionary<values=string, indices=int8, ordered=0>
> {code}
> This isn't a big deal right now since grouped aggregations aren't (quite)
> here yet, but once we start having support for that, we will have people
> using examples like this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
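
For reference, the dplyr behavior described in the issue can be sketched as below. This is only an illustration, not the fix proposed in the linked pull request: the small `example_data` here is a hypothetical stand-in for the issue's dataset (two columns only), and the sketch uses plain dplyr without arrow. It shows that grouping by an unnamed expression appears to behave like adding a column named after the expression text and then grouping by that column.

```r
library(dplyr)

# Hypothetical stand-in for the issue's `example_data` (two columns only).
example_data <- tibble(
  int = c(1L, 2L, 3L, NA, 5L),
  dbl = c(1.1, 2.1, 3.1, 4.1, 5.1)
)

# Unnamed expression: dplyr adds a column named after the expression text.
unnamed <- example_data %>%
  group_by(int < 4)

# Explicit form: compute the column first, then group by it.
explicit <- example_data %>%
  mutate(`int < 4` = int < 4) %>%
  group_by(`int < 4`)

# Both forms end up with the same grouping variable and column names.
print(group_vars(unnamed))
print(identical(names(unnamed), names(explicit)))
```

Going by the issue text, Arrow at this point only adds a column when the expression is named, so a named form such as `group_by(int_lt_4 = int < 4)` (where `int_lt_4` is a made-up name) would presumably be the path that already works there; that is an inference from the description above and was not verified against arrow.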