[ https://issues.apache.org/jira/browse/ARROW-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285072#comment-17285072 ]
Claymore Marshall commented on ARROW-11579:
-------------------------------------------

Noting that I was able to use set_cpu_count(2) without hanging, but I have now noticed the hanging behaviour again on bulk reads of small feather data sets. I need to use set_cpu_count(1) to avoid this.

> [R] read_feather hanging on Windows
> -----------------------------------
>
>                 Key: ARROW-11579
>                 URL: https://issues.apache.org/jira/browse/ARROW-11579
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 3.0.0
>         Environment: windows 10, R 4.0.0, arrow 3.0.0
>            Reporter: Claymore Marshall
>            Assignee: Ian Cook
>            Priority: Major
>
> On Windows 10, reading large feather objects in R seems to lead to hanging on
> a repeat read.
>
> This issue has been reproduced on 3 different Windows machines, all running
> Win 10 and R 4.0.0 (or later).
> read_feather does not hang if using version = 1, or using uncompressed with
> version 2.
> This issue does not happen in tests on Linux (Ubuntu 20.04 at least).
>
> Example:
>
> library(arrow)
> m <- data.frame(x = rnorm(7e6), y = rnorm(5), b = rnorm(5), n = rnorm(5), c = c("a", "n"))
> # does not hang with uncompressed, but does with lz4 and zstd
> write_feather(m, "test.feather4", version = 2, compression = "lz4")
> for (j in 1:50){
>   # hangs after an unpredictable number of reads, just on Windows though
>   y <- read_feather("test.feather4")
>   print(paste0("feather read ", j, "..."))
> }
>
> Interestingly, a workaround is to use read_feather but read just one column
> at a time. This has not hung so far.
>
> e.g. y returns the full data frame, and this doesn't hang on repeated reads:
>
> y <- lapply(cols, function(col) {
>   read_feather("test.feather4", col_select = all_of(col))
> })

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
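
For reference, the thread-count workaround mentioned in the comment could be sketched roughly as below. This is only an illustration, not a confirmed fix: the temporary file path, the smaller data frame, and the number of reads are assumptions made to keep the example quick, and it assumes the lz4 codec is available in the installed arrow build.

```r
library(arrow)

# Remember the current size of Arrow's CPU thread pool so it can be restored.
old_threads <- cpu_count()

# Workaround reported in the comment: limit Arrow to a single thread.
set_cpu_count(1)

# Hypothetical test file (the report used "test.feather4" and 7e6 rows).
path <- tempfile(fileext = ".feather")
m <- data.frame(x = rnorm(1e5), c = rep(c("a", "n"), length.out = 1e5))
write_feather(m, path, version = 2, compression = "lz4")

# Repeated reads of the compressed file, as in the original reproduction.
for (j in 1:5) {
  y <- read_feather(path)
  print(paste0("feather read ", j, "..."))
}

# Restore the previous thread count once the reads are done.
set_cpu_count(old_threads)
```

Restoring the thread count afterwards keeps the single-thread setting from slowing down unrelated Arrow work in the same session.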