[ https://issues.apache.org/jira/browse/ARROW-13860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17409102#comment-17409102 ]
Ian Cook edited comment on ARROW-13860 at 9/2/21, 8:12 PM:
-----------------------------------------------------------

Thanks for the report! I dug into this and observed that it happens because
{code:java}
write_parquet(x, ...)
{code}
calls
{code:java}
x <- Table$create()
{code}
which changes {{x}} into an {{arrow_dplyr_query}} because {{x}} has groups. Then it calls
{code:java}
is_writable_table(x)
{code}
which triggers an error because {{x}} does not inherit from {{data.frame}} or {{ArrowTabular}}.

In version 4.0.1 of the arrow package, this did not trigger an error because the {{is_writable_table(x)}} function did not exist. It was introduced in #10387:
[https://github.com/apache/arrow/commit/2e3a25e5f1329929e0fdb88ecc76bf404a5ccf57#diff-f6235d4767fc4a7ee1bb726d816b9742ef0bc07503dceb678fd3bc55ee15454b]

But I am confused: before ARROW-11769, I thought groups were lost when a grouped R data.frame was converted to a {{Table}}. So how is it that in the example above, the groups were seemingly written to the Parquet file and read back in? Didn't we always call {{Table$create()}} on the input to {{write_parquet()}}, so shouldn't the groups have been lost?

cc [~jonkeane] [~thisisnic]

> [R] arrow 5.0.0 write_parquet throws error writing grouped data.frame
> ---------------------------------------------------------------------
>
>                 Key: ARROW-13860
>                 URL: https://issues.apache.org/jira/browse/ARROW-13860
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>         Environment: macOS 11.1 Big Sur
>            Reporter: Hideaki Hayashi
>            Priority: Major
>
> arrow 5.0.0 write_parquet throws error writing grouped data.frame.
> Here is how to reproduce it.
> {{library(dplyr)}}
> {{arrow::write_parquet(mtcars %>% group_by(am),"/tmp/mtcars_test.parquet")}}
> {{# Error: x must be an object of class 'data.frame', 'RecordBatch', or 'Table', not 'arrow_dplyr_query'.}}
>
> With arrow 4.0.1, this used to work fine.
> {{library(dplyr)}}
> {{arrow::write_parquet(mtcars %>% group_by(am),"/tmp/mtcars_test.parquet")}}
> {{x <- arrow::read_parquet("/tmp/mtcars_test.parquet")}}
> {{x}}
> {{# A tibble: 32 x 11}}
> {{# Groups: am [2]}}
> {{# mpg cyl disp hp drat wt qsec vs am gear carb}}
> {{# * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>}}
> {{# 1 21 6 160 110 3.9 2.62 16.5 0 1 4 4}}
> {{# 2 21 6 160 110 3.9 2.88 17.0 0 1 4 4}}
> {{# 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1}}
> {{# 4 21.4 6 258 110 3.08 3.22 19.4 1 0 3 1}}
> {{# 5 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2}}
> {{# 6 18.1 6 225 105 2.76 3.46 20.2 1 0 3 1}}
> {{# 7 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4}}
> {{# …}}