[ 
https://issues.apache.org/jira/browse/ARROW-13434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17385520#comment-17385520
 ] 

Jonathan Keane commented on ARROW-13434:
----------------------------------------

Oh dear, that is a hole! I'll rephrase the title of the issue to match that.

> [R] group_by() with an expression
> ---------------------------------
>
>                 Key: ARROW-13434
>                 URL: https://issues.apache.org/jira/browse/ARROW-13434
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Jonathan Keane
>            Priority: Major
>
> With dplyr, when we group_by with an expression, a column is added to the 
> dataframe that has the result of the expression.
> {code}
> > example_data %>% 
> +   group_by(int < 4) %>% collect()
> # A tibble: 10 x 8
> # Groups:   int < 4 [3]
>      int   dbl  dbl2 lgl   false chr   fct   `int < 4`
>    <int> <dbl> <dbl> <lgl> <lgl> <chr> <fct> <lgl>    
>  1     1   1.1     5 TRUE  FALSE a     a     TRUE     
>  2     2   2.1     5 NA    FALSE b     b     TRUE     
>  3     3   3.1     5 TRUE  FALSE c     c     TRUE     
>  4    NA   4.1     5 FALSE FALSE d     d     NA       
>  5     5   5.1     5 TRUE  FALSE e     NA    FALSE    
>  6     6   6.1     5 NA    FALSE NA    NA    FALSE    
>  7     7   7.1     5 NA    FALSE g     g     FALSE    
>  8     8   8.1     5 FALSE FALSE h     h     FALSE    
>  9     9  NA       5 FALSE FALSE i     i     FALSE    
> 10    10  10.1     5 NA    FALSE j     j     FALSE    
> {code}
> Arrow doesn't do this, however:
> {code}
> > Table$create(example_data) %>% 
> +   group_by(int < 4) %>% collect()
>  Error: Invalid: No match for FieldRef.Name(int < 4) in int: int32
> dbl: double
> dbl2: double
> lgl: bool
> false: bool
> chr: string
> fct: dictionary<values=string, indices=int8, ordered=0> 
> {code}
> This isn't a big deal right now since grouped aggregations aren't (quite) 
> here yet, but once we start having support for that, we will have people 
> using examples like this. This might actually be something we need/want to do 
> in C++ instead of in the R client.
> The workaround is relatively simple: add the expression in a mutate, then 
> group_by that.
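
The workaround described above can be sketched roughly as follows. This is a minimal illustration and not taken from the issue itself: the small `example_data` tibble is a hypothetical stand-in for the real one, and the column name `int_lt_4` is an arbitrary choice. It assumes an arrow release whose dplyr bindings support mutate() on a Table.

```r
library(dplyr)
library(arrow)

# Hypothetical stand-in for the example_data used in the issue
example_data <- tibble::tibble(
  int = c(1L, 2L, 3L, NA, 5L),
  dbl = c(1.1, 2.1, 3.1, 4.1, 5.1)
)

# Materialize the expression as a named column first,
# then group by that column instead of by the raw expression
result <- Table$create(example_data) %>%
  mutate(int_lt_4 = int < 4) %>%
  group_by(int_lt_4) %>%
  collect()
```

Unlike dplyr's automatic `int < 4` column, the grouping column here carries whatever name was given in the mutate() call.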



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
