[jira] [Commented] (ARROW-14190) [R] Should unify_schemas() allow change of type?

Neal Richardson (Jira) Fri, 01 Oct 2021 06:04:05 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-14190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423277#comment-17423277
 ]


Neal Richardson commented on ARROW-14190:
-----------------------------------------

By default, open_dataset() doesn't try to unify schemas; it just takes the 
first one it finds, which is why you see int32 as the types. If those schemas 
were unified, I'd expect the types to be promoted to float64. You could pass 
unify_schemas = TRUE to it, and you would probably get the same error 
that unify_schemas() gives.
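
As a rough sketch of that suggestion (not verified against any particular arrow 
release), the snippet below writes two small Parquet files whose columns differ 
only in type, then opens the directory with and without unify_schemas = TRUE. 
The file names and values are made up for illustration, and the tryCatch() is 
there because the unified call may or may not error depending on the version.

```r
library(arrow)

# Two Parquet files with the same column names but different types
# (R integers become int32, R doubles become float64).
# File names and values are hypothetical, chosen only for this sketch.
td <- tempfile()
dir.create(td)
write_parquet(
  Table$create(speed = 1:3, dist = 4:6),
  file.path(td, "cars1.parquet")
)
write_parquet(
  Table$create(speed = c(1.5, 2.5), dist = c(3.5, 4.5)),
  file.path(td, "cars2.parquet")
)

# Default behaviour: no unification, the schema comes from a single file.
default_schema <- open_dataset(td)$schema

# Opt in to unification. This is wrapped in tryCatch() because it may raise
# the same "Unable to merge" error reported for unify_schemas().
unified <- tryCatch(
  open_dataset(td, unify_schemas = TRUE)$schema,
  error = function(e) conditionMessage(e)
)

print(default_schema)
print(unified)
```

If `unified` comes back as a character string, that string is the error message 
from the failed merge; if it comes back as a Schema, the types were merged.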

> [R] Should unify_schemas() allow change of type?
> ------------------------------------------------
>
>                 Key: ARROW-14190
>                 URL: https://issues.apache.org/jira/browse/ARROW-14190
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Nicola Crane
>            Priority: Major
>
> Should {{unify_schemas()}} be able to do schema evolution?  If schemas with 
> different (but compatible) types are combined using {{open_dataset()}}, this 
> works, whereas if done via {{unify_schemas()}}, it results in an error.
> See discussion here: 
> https://github.com/apache/arrow-cookbook/pull/67#discussion_r714847220
> {code:r}
> library(dplyr)
> library(arrow)
> # Set up schemas
> schema1 = schema(speed = int32(), dist = int32())
> schema2 = schema(speed = float64(), dist = float64())
> # Try to combine schemas via `unify_schemas()` - results in an error
> unify_schemas(schema1, schema2)
> ## Error: Invalid: Unable to merge: Field speed has incompatible types: int32 vs double
> ## /home/nic2/arrow/cpp/src/arrow/type.cc:1609  fields_[i]->MergeWith(field)
> ## /home/nic2/arrow/cpp/src/arrow/type.cc:1672  AddField(field)
> ## /home/nic2/arrow/cpp/src/arrow/type.cc:1743  builder.AddSchema(schema)
> # Create datasets with different schemas and read in via `open_dataset()`
> cars1 <- Table$create(slice(cars, 1:25), schema = schema1)
> cars2 <- Table$create(slice(cars, 26:50), schema = schema2)
> td <- tempfile()
> dir.create(td)
> write_parquet(cars1, paste0(td, "/cars1.parquet"))
> write_parquet(cars2, paste0(td, "/cars2.parquet"))
> new_dataset <- open_dataset(td) 
> new_dataset$schema
> # Schema
> # speed: int32
> # dist: int32
> # 
> # See $metadata for additional Schema metadata
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
