[jira] [Commented] (ARROW-17738) [R] dplyr::compute should convert from grouped arrow_dplyr_query to arrow Table
[ https://issues.apache.org/jira/browse/ARROW-17738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606158#comment-17606158 ]

SHIMA Tatsuya commented on ARROW-17738:
---

I think it is confusing for users when compute() does not return a Table as intended because a grouping is still left after summarise() or a similar verb has been executed.

{code:r}
mtcars |>
  arrow::arrow_table() |>
  dplyr::group_by(vs, am) |>
  dplyr::summarise(wt = mean(wt)) |>
  dplyr::compute()
#> Table (query)
#> vs: double
#> am: double
#> wt: double
#>
#> * Grouped by vs
#> See $.data for the source Arrow object
{code}

> [R] dplyr::compute should convert from grouped arrow_dplyr_query to arrow Table
> ---
>
>                 Key: ARROW-17738
>                 URL: https://issues.apache.org/jira/browse/ARROW-17738
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 9.0.0
>            Reporter: SHIMA Tatsuya
>            Assignee: SHIMA Tatsuya
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> {{dplyr::compute()}} is expected to evaluate an arrow dplyr query and
> convert the result to a Table, but for grouped arrow dplyr queries it does
> not return a Table.
> {code:r}
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::compute() |> class()
> #> [1] "arrow_dplyr_query"
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::ungroup() |> dplyr::compute() |> class()
> #> [1] "Table" "ArrowTabular" "ArrowObject" "R6"
> {code}
> {{as_arrow_table()}} works fine.
> {code:r}
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> class()
> #> [1] "arrow_dplyr_query"
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::compute() |> class()
> #> [1] "arrow_dplyr_query"
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> dplyr::collect(FALSE) |> class()
> #> [1] "arrow_dplyr_query"
> mtcars |> arrow::arrow_table() |> dplyr::group_by(cyl) |> arrow::as_arrow_table() |> class()
> #> [1] "Table" "ArrowTabular" "ArrowObject" "R6"
> {code}
> The result seems to be converted back to an arrow dplyr query at the
> following lines:
> [https://github.com/apache/arrow/blob/7cfdfbb0d5472f8f8893398b51042a3ca1dd0adf/r/R/dplyr-collect.R#L73-L75]

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
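For readers who need a Table today, the observation in the issue that {{ungroup()}} followed by {{compute()}} yields a Table can be turned into a small workaround. The following is only a sketch under that assumption; {{compute_as_table()}} is a hypothetical helper name and is not part of arrow or dplyr.

```r
library(arrow)
library(dplyr)

# Hypothetical helper (not part of arrow or dplyr): drop any remaining
# grouping before evaluating the query, so that compute() has no groups
# to re-apply to the resulting Table.
compute_as_table <- function(.data) {
  .data |>
    ungroup() |>
    compute()
}

tbl <- arrow_table(mtcars) |>
  group_by(vs, am) |>
  summarise(wt = mean(wt)) |>
  compute_as_table()

# Check whether the result is an arrow Table rather than a query object.
inherits(tbl, "Table")
```

Note that this discards the grouping. If the grouping has to be kept, {{arrow::as_arrow_table()}} is the other route the issue reports as returning a Table.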
[jira] [Commented] (ARROW-17738) [R] dplyr::compute should convert from grouped arrow_dplyr_query to arrow Table
[ https://issues.apache.org/jira/browse/ARROW-17738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606157#comment-17606157 ]

Neal Richardson commented on ARROW-17738:
---

They are evaluated and converted to Tables, but then if there are groups, group_by is called on the Table, which results in an arrow_dplyr_query object containing the Table. So, yes, this was intentional. Do you have a use case where this is a problem?
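The behaviour described in this comment can be inspected directly. The sketch below assumes the 9.0.0 behaviour reported in the issue, where the result of {{compute()}} on a grouped query is an arrow_dplyr_query wrapping an already evaluated Table; other arrow versions may return something different, which is why the last step is guarded.

```r
library(arrow)
library(dplyr)

grouped_query <- arrow_table(mtcars) |> group_by(cyl)
result <- compute(grouped_query)

# Per the issue report, arrow 9.0.0 gives "arrow_dplyr_query" here;
# other versions may differ.
class(result)

# Grouping columns carried by the result.
group_vars(result)

# If the result is a query wrapper, the evaluated data sits in its $.data
# field (the printed query output points to "$.data" as the source object).
if (inherits(result, "arrow_dplyr_query")) {
  class(result$.data)
}
```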