Sam Albers created ARROW-14324:
----------------------------------

             Summary: Inconsistent application of type in Datasets via the schema
                 Key: ARROW-14324
                 URL: https://issues.apache.org/jira/browse/ARROW-14324
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 5.0.0
            Reporter: Sam Albers


It looks like at least {{filter}} does not handle a column type that was specified via {{schema}} in {{open_dataset}}. Reprex:
{code:java}
options("max.print" = 5)
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

## Set up the data
tf <- tempfile()
dir.create(tf)
write_dataset(quakes, tf)

## Works as expected
open_dataset(tf) %>%
  filter(stations == 41) %>%
  collect()
#>      lat   long depth mag stations
#> 1 -20.42 181.62   562 4.8       41
#>  [ reached 'max' / getOption("max.print") -- omitted 11 rows ]

## errors as expected
open_dataset(tf) %>%
  filter(stations == "41") %>%
  collect()
#> Error: NotImplemented: Function equal has no kernel matching input types (array[int32], scalar[string])

## Ok let's change a column type
tf_reg <- open_dataset(tf)$schema
tf_reg$stations <- string()

## ok returns a character
open_dataset(tf, schema = tf_reg) %>%
  pull(stations) %>%
  typeof()
#> [1] "character"

## So if `stations` is character I think this should work?
open_dataset(tf, schema = tf_reg) %>%
  filter(stations == as.character("41")) %>%
  collect()
#> Error: Filter expression not supported for Arrow Datasets: stations == as.character("41")
#> Call collect() first to pull data into R.

## previous behaviour no longer works
open_dataset(tf, schema = tf_reg) %>%
  filter(stations == 41) %>%
  collect()
#> Error: NotImplemented: Function equal has no kernel matching input types (array[string], scalar[double])
{code}