[ https://issues.apache.org/jira/browse/ARROW-15627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Will Jones updated ARROW-15627:
-------------------------------
    Priority: Minor  (was: Major)

> [R] Support unify_schemas for union datasets
> --------------------------------------------
>
>                 Key: ARROW-15627
>                 URL: https://issues.apache.org/jira/browse/ARROW-15627
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 7.0.0
>            Reporter: Will Jones
>            Priority: Minor
>              Labels: dataset
>             Fix For: 8.0.0
>
>
> This also came out of the discussion on [https://github.com/apache/arrow/issues/12371].
> You can unify schemas between different Parquet files, but it seems you
> can't union together two (or more) datasets that have different schemas. This
> is odd, because we do compute the unified schema on [this line|https://github.com/apache/arrow/blob/ba0814e60a451525dd5492b68059aad8a4bdaf4f/r/R/dataset.R#L189],
> only to later assert that all the schemas are the same.
> {code:R}
> library(arrow)
> library(dplyr)
> df1 <- arrow_table(x = array(c(1, 2, 3)),
>                    y = array(c("a", "b", "c")))
> df2 <- arrow_table(x = array(c(4, 5)),
>                    z = array(c("d", "e")))
> df1 %>% write_dataset("example1", format = "parquet")
> df2 %>% write_dataset("example2", format = "parquet")
> ds1 <- open_dataset("example1", format = "parquet")
> ds2 <- open_dataset("example2", format = "parquet")
> # These don't work
> ds <- c(ds1, ds2) # c() does the same thing as the open_dataset(list()) call below
> ds <- open_dataset(list(ds1, ds2)) # This fails due to the schema mismatch
> ds <- open_dataset(c("example1", "example2"), format = "parquet", unify_schemas = TRUE)
> # This works
> ds <- open_dataset(c("example2/part-0.parquet", "example1/part-0.parquet"),
>                    format = "parquet", unify_schemas = TRUE)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
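A possible workaround until combining `Dataset` objects supports schema unification is sketched below. This is only a sketch under stated assumptions. It relies on the `$files` field of a `FileSystemDataset`. It also relies on the file-path form of `open_dataset(..., unify_schemas = TRUE)`, which the report says works. The temporary-directory layout and the names `base` and `all_files` are illustrative. The sketch has not been verified against every Arrow release.

```r
library(arrow)

# Illustrative setup: two small Parquet datasets whose schemas differ
# (x/y vs. x/z), mirroring the issue, written under a temporary directory.
base <- tempfile("unify-example-")
dir.create(base)
dir1 <- file.path(base, "example1")
dir2 <- file.path(base, "example2")
write_dataset(arrow_table(x = c(1, 2, 3), y = c("a", "b", "c")), dir1, format = "parquet")
write_dataset(arrow_table(x = c(4, 5), z = c("d", "e")), dir2, format = "parquet")

ds1 <- open_dataset(dir1, format = "parquet")
ds2 <- open_dataset(dir2, format = "parquet")

# Workaround (assumption: both are FileSystemDatasets): rather than combining
# the Dataset objects, collect the files behind each one and reopen them
# together, asking Arrow to unify the per-file schemas.
all_files <- c(ds1$files, ds2$files)
ds <- open_dataset(all_files, format = "parquet", unify_schemas = TRUE)

# The unified schema should contain x, y and z.
print(ds$schema)
```

This works at the file level, so it only applies when both datasets are backed by files on the same filesystem.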