Nicola Crane created ARROW-17784:
------------------------------------

             Summary: [C++] Opening a dataset where partitioning variable is in the dataset should error differently
                 Key: ARROW-17784
                 URL: https://issues.apache.org/jira/browse/ARROW-17784
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++
            Reporter: Nicola Crane

The error message given when the name of a partitioning field matches a field already in the dataset is a bit misleading. Can we catch this earlier and give a different error message?

{code:r}
library(dplyr)
library(arrow)

tf <- tempfile()
dir.create(tf)
write_dataset(mtcars, tf, partitioning = "cyl", hive_style = FALSE)

# The schema fed into `partitioning` should refer to `cyl` and not `wt`,
# but the error message doesn't refer to the duplication here
open_dataset(tf, partitioning = schema(wt = int64())) %>% collect()
#> Error in `open_dataset()`:
#> ! Invalid: Unable to merge: Field wt has incompatible types: double vs int64
#> /home/nic2/arrow/cpp/src/arrow/type.cc:1692  fields_[i]->MergeWith(field)
#> /home/nic2/arrow/cpp/src/arrow/type.cc:1755  AddField(field)
#> /home/nic2/arrow/cpp/src/arrow/type.cc:1826  builder.AddSchema(schema)
#> /home/nic2/arrow/cpp/src/arrow/dataset/discovery.cc:262  Inspect(options.inspect_options)
{code}
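
As a rough illustration of what "catching this earlier" might look like, here is a minimal standalone sketch of a name-overlap check that could run before the schemas are merged. It uses only the C++ standard library and plain field names. `OverlappingFieldNames` and `DescribeOverlap` are hypothetical helpers, not existing Arrow functions, and the message wording is only a suggestion.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper (not part of Arrow): returns the partitioning field
// names that also appear among the field names found in the data files.
std::vector<std::string> OverlappingFieldNames(
    const std::vector<std::string>& file_fields,
    const std::vector<std::string>& partition_fields) {
  std::unordered_set<std::string> in_files(file_fields.begin(),
                                           file_fields.end());
  std::vector<std::string> overlap;
  for (const std::string& name : partition_fields) {
    if (in_files.count(name) > 0) {
      overlap.push_back(name);
    }
  }
  return overlap;
}

// Hypothetical helper (not part of Arrow): builds a message that names the
// duplicated fields, or returns an empty string when there is no overlap.
std::string DescribeOverlap(const std::vector<std::string>& overlap) {
  if (overlap.empty()) {
    return "";
  }
  std::string msg = "Partitioning field(s) also present in the data files: ";
  for (std::size_t i = 0; i < overlap.size(); ++i) {
    if (i > 0) {
      msg += ", ";
    }
    msg += overlap[i];
  }
  return msg;
}
```

With the example above, the files contain `wt` (among others) and the partitioning schema also names `wt`, so a check along these lines would report `wt` as duplicated instead of surfacing the type-merge failure.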