[jira] [Commented] (ARROW-17738) [R] dplyr::compute should convert from grouped arrow_dplyr_query to arrow Table

2022-09-17 Thread SHIMA Tatsuya (Jira)


[ https://issues.apache.org/jira/browse/ARROW-17738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606158#comment-17606158 ]

SHIMA Tatsuya commented on ARROW-17738:
---

I think it is confusing for users that compute() does not return a Table as 
intended when groups remain after summarise() or a similar verb is executed.

{code:r}
mtcars |> arrow::arrow_table() |> dplyr::group_by(vs, am) |> 
dplyr::summarise(wt = mean(wt)) |> dplyr::compute()
#> Table (query)
#> vs: double
#> am: double
#> wt: double
#>
#> * Grouped by vs
#> See $.data for the source Arrow object
{code}

> [R] dplyr::compute should convert from grouped arrow_dplyr_query to arrow 
> Table
> ---
>
> Key: ARROW-17738
> URL: https://issues.apache.org/jira/browse/ARROW-17738
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 9.0.0
> Reporter: SHIMA Tatsuya
> Assignee: SHIMA Tatsuya
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> It is expected that {{dplyr::compute()}} will perform the calculation on the 
> arrow dplyr query and convert it to a Table, but it does not seem to work 
> correctly for grouped arrow dplyr queries and does not result in a Table.
> {code:r}
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::compute() |> 
> class()
> #> [1] "arrow_dplyr_query"
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::ungroup() |> 
> dplyr::compute() |> class()
> #> [1] "Table"        "ArrowTabular" "ArrowObject"  "R6"
> {code}
> {{as_arrow_table()}} works fine.
> {code:r}
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> class()
> #> [1] "arrow_dplyr_query"
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::compute() |> 
> class()
> #> [1] "arrow_dplyr_query"
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> 
> dplyr::collect(FALSE) |> class()
> #> [1] "arrow_dplyr_query"
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> 
> arrow::as_arrow_table() |> class()
> #> [1] "Table"        "ArrowTabular" "ArrowObject"  "R6"
> {code}
> It seems to be converted back to an arrow dplyr query in the following lines.
> [https://github.com/apache/arrow/blob/7cfdfbb0d5472f8f8893398b51042a3ca1dd0adf/r/R/dplyr-collect.R#L73-L75]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (ARROW-17738) [R] dplyr::compute should convert from grouped arrow_dplyr_query to arrow Table

2022-09-17 Thread Neal Richardson (Jira)


[ https://issues.apache.org/jira/browse/ARROW-17738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606157#comment-17606157 ]

Neal Richardson commented on ARROW-17738:
---

They are evaluated and converted to Tables, but then if there are groups, 
group_by is called on the Table, which results in an arrow_dplyr_query object 
containing the Table. So, yes, this was intentional. Do you have a use case 
where this is a problem?
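
For reference, a minimal sketch of the two routes to a plain Table that the report itself shows, assuming arrow 9.0.0 with dplyr attached. The names grouped, tbl_ungrouped and tbl_converted are illustrative only and do not come from the issue.

{code:r}
library(arrow)
library(dplyr, warn.conflicts = FALSE)

# Grouping an Arrow Table yields an arrow_dplyr_query, not a Table.
grouped <- mtcars |> arrow_table() |> group_by(cyl)

# Route 1: drop the groups before compute(), so there is nothing to re-apply
# after the query has been evaluated.
tbl_ungrouped <- grouped |> ungroup() |> compute()

# Route 2: convert explicitly; per the report this returns a Table even when
# the query is grouped.
tbl_converted <- grouped |> as_arrow_table()

class(tbl_ungrouped)
class(tbl_converted)
{code}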



