[ https://issues.apache.org/jira/browse/ARROW-14190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423277#comment-17423277 ]
Neal Richardson commented on ARROW-14190:
-----------------------------------------

open_dataset() isn't (by default) trying to unify schemas; it just takes the schema of the first file it finds, which is why you see int32 as the types. If you unified those schemas, I'd expect the fields to be promoted to float64. You could pass unify_schemas = TRUE to open_dataset(), and you would probably get the error.

> [R] Should unify_schemas() allow change of type?
> ------------------------------------------------
>
>                 Key: ARROW-14190
>                 URL: https://issues.apache.org/jira/browse/ARROW-14190
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Nicola Crane
>            Priority: Major
>
> Should {{unify_schemas()}} be able to do schema evolution? If schemas with
> different (but compatible) types are combined using {{open_dataset()}}, this
> works, whereas if done via {{unify_schemas()}}, it results in an error.
> See discussion here:
> https://github.com/apache/arrow-cookbook/pull/67#discussion_r714847220
> {code:r}
> library(dplyr)
> library(arrow)
> # Set up schemas
> schema1 = schema(speed = int32(), dist = int32())
> schema2 = schema(speed = float64(), dist = float64())
> # Try to combine schemas via `unify_schemas()` - results in an error
> unify_schemas(schema1, schema2)
> ## Error: Invalid: Unable to merge: Field speed has incompatible types: int32 vs double
> ## /home/nic2/arrow/cpp/src/arrow/type.cc:1609  fields_[i]->MergeWith(field)
> ## /home/nic2/arrow/cpp/src/arrow/type.cc:1672  AddField(field)
> ## /home/nic2/arrow/cpp/src/arrow/type.cc:1743  builder.AddSchema(schema)
> # Create datasets with different schemas and read in via `open_dataset()`
> cars1 <- Table$create(slice(cars, 1:25), schema = schema1)
> cars2 <- Table$create(slice(cars, 26:50), schema = schema2)
> td <- tempfile()
> dir.create(td)
> write_parquet(cars1, paste0(td, "/cars1.parquet"))
> write_parquet(cars2, paste0(td, "/cars2.parquet"))
> new_dataset <- open_dataset(td)
> new_dataset$schema
> # Schema
> # speed: int32
> # dist: int32
> #
> # See $metadata for additional Schema metadata
> {code}
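A minimal sketch of the behaviour described in the comment, assuming a recent release of the arrow R package. By default `open_dataset()` inspects only the first file, so the int32 schema wins. Passing `unify_schemas = TRUE` makes it scan every file, and that is expected (not verified here) to raise the same "incompatible types" error as `unify_schemas()`. The `schema = schema2` call is an assumed workaround that was not proposed in this thread: it declares the float64 schema up front so the int32 file would be cast when the data is scanned. The variable names `ds_default`, `unified` and `ds_float` are illustrative only.

```r
library(arrow)

# Recreate the two-file dataset from the issue: one Parquet file written with
# int32 columns and one written with float64 columns.
td <- tempfile()
dir.create(td)
schema1 <- schema(speed = int32(), dist = int32())
schema2 <- schema(speed = float64(), dist = float64())
write_parquet(Table$create(cars[1:25, ], schema = schema1),
              file.path(td, "cars1.parquet"))
write_parquet(Table$create(cars[26:50, ], schema = schema2),
              file.path(td, "cars2.parquet"))

# Default discovery: only the first file's schema is inspected.
ds_default <- open_dataset(td)
print(ds_default$schema)

# Scan all files and try to merge their schemas. This is expected to fail
# with an "incompatible types" error, so capture the condition instead of
# stopping.
unified <- tryCatch(open_dataset(td, unify_schemas = TRUE),
                    error = function(e) e)
print(inherits(unified, "error"))

# Assumed workaround: declare the target schema explicitly. The dataset then
# reports float64 columns, and the int32 file is cast when it is scanned.
ds_float <- open_dataset(td, schema = schema2)
print(ds_float$schema)
```

Supplying the schema explicitly avoids relying on which file happens to be discovered first. Whether every pair of types can be cast this way at scan time depends on the Arrow version, so check the result with `collect()` on real data.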