[ https://issues.apache.org/jira/browse/ARROW-14908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17492259#comment-17492259 ]
Weston Pace commented on ARROW-14908: ------------------------------------- Yes, we would have an I/O thread per scanner in that case. > [R] join on dataset crashes on Windows > -------------------------------------- > > Key: ARROW-14908 > URL: https://issues.apache.org/jira/browse/ARROW-14908 > Project: Apache Arrow > Issue Type: Bug > Components: R > Affects Versions: 6.0.0 > Environment: R version 4.0.4 > Reporter: Fabio Machado > Assignee: Will Jones > Priority: Critical > Labels: pull-request-available > Fix For: 7.0.1, 8.0.0 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > {code:java} > library(tidyverse) > library(arrow) > car_info <- rownames_to_column(mtcars, "car_info") > cars_arrow_table <- arrow_table(car_info) > other_mtcars_data <- select(car_info, 1) %>% > mutate(main_color = sample( c("red", "blue", "white", "black"), size = n(), > replace = TRUE)) %>% > arrow::arrow_table() > temp <- tempdir() > par_temp <- paste0(temp, "\\parquet") > car_info %>% arrow::write_dataset(par_temp) > cars_arrow <- arrow::open_dataset(par_temp) > # using arrow tables works > ------------------------------------------------------ > cars_arrow_table %>% left_join(other_mtcars_data) %>% count(main_color) %>% > collect() > # using open dataset crashes R > ------------------------------------------------------------------ > other_mtcars_data %>% > left_join(cars_arrow) %>% > count(main_color) %>% > collect() > #other variation also crash > cars_arrow %>% > left_join(other_mtcars_data) %>% > count(main_color) %>% > collect() > cars_arrow %>% > left_join(other_mtcars_data) %>% > group_by(main_color) %>% > summarise(n = n()) %>% > collect() > #compute also crashes > cars_arrow %>% > left_join(other_mtcars_data) %>% > count(main_color) %>% > compute() > # workaround with duckdb > ------------------------------------------------------ > ##this works > cars_duck <- to_duckdb(cars_arrow, auto_disconnect = TRUE) > other_cars_duck <- to_duckdb(other_mtcars_data, auto_disconnect = TRUE) > > cars_duck %>% > left_join(other_cars_duck) %>% > count(main_color) %>% > collect() > ##this doesn't (don't know if expected to work actually) > cars_arrow %>% > left_join(other_mtcars_data) %>% > to_duckdb() {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)