Nicola Crane created ARROW-14190:
------------------------------------

             Summary: [R] Should unify_schemas() allow change of type?
                 Key: ARROW-14190
                 URL: https://issues.apache.org/jira/browse/ARROW-14190
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Nicola Crane

Should {{unify_schemas()}} be able to do schema evolution?

If files whose schemas have different (but compatible) types are combined using {{open_dataset()}}, this works, whereas combining the same schemas via {{unify_schemas()}} results in an error.

See the discussion here: https://github.com/apache/arrow-cookbook/pull/67#discussion_r714847220

{code:r}
library(dplyr)
library(arrow)

# Set up schemas
schema1 = schema(speed = int32(), dist = int32())
schema2 = schema(speed = float64(), dist = float64())

# Try to combine schemas via `unify_schemas()` - results in an error
unify_schemas(schema1, schema2)
## Error: Invalid: Unable to merge: Field speed has incompatible types: int32 vs double
## /home/nic2/arrow/cpp/src/arrow/type.cc:1609  fields_[i]->MergeWith(field)
## /home/nic2/arrow/cpp/src/arrow/type.cc:1672  AddField(field)
## /home/nic2/arrow/cpp/src/arrow/type.cc:1743  builder.AddSchema(schema)

# Create datasets with different schemas and read in via `open_dataset()`
cars1 <- Table$create(slice(cars, 1:25), schema = schema1)
cars2 <- Table$create(slice(cars, 26:50), schema = schema2)

td <- tempfile()
dir.create(td)
write_parquet(cars1, paste0(td, "/cars1.parquet"))
write_parquet(cars2, paste0(td, "/cars2.parquet"))

new_dataset <- open_dataset(td)
new_dataset$schema
# Schema
# speed: int32
# dist: int32
#
# See $metadata for additional Schema metadata
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)