[ https://issues.apache.org/jira/browse/ARROW-17444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17580712#comment-17580712 ]
Jacob Wujciak-Jens commented on ARROW-17444:
--------------------------------------------

I can reproduce it. The issue seems to be caused specifically by collect. If you don't collect, the error does not happen.

> [R] Windows Only: Cannot delete file previously accesed with open_dataset
> -------------------------------------------------------------------------
>
>                 Key: ARROW-17444
>                 URL: https://issues.apache.org/jira/browse/ARROW-17444
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 8.0.0, 9.0.0, 8.0.1
>         Environment: Windows 10
> R 4.2.1
> RStudio 22.07.1
> Arrow 9.0 (fails on arrow 8 as well)
>            Reporter: Riaz Arbi
>            Priority: Major
>
> Hello,
> I encountered this issue because it breaks my tests when I run
> {code:java}
> rhub::check_for_cran(){code}
> Because of this, I know it only affects Windows; all other OS checks pass.
>
> If you write files to a directory using arrow's
> {code:java}
> write_*{code}
> functions, and then
> {code:java}
> collect(open_dataset(directory)){code}
>
> you cannot delete a file in the directory; you get an error.
> This is best demonstrated in a reprex:
>
> {code:java}
> # setup ------------------------------------------------------------------------
> library(arrow)
> library(dplyr) # for %>% and collect()
> local_prefix <- tempfile()
> df <- data.frame(a = 1:5, b = letters[1:5])
>
> # works fine -------------------------------------------------------------------
> fs <- LocalFileSystem$create()
> fs$CreateDir(local_prefix)
> fsdir <- fs$cd(local_prefix)
> write_parquet(df, fsdir$path("1.parquet"))
> #open_dataset(local_prefix) %>% collect()
> fsdir$DeleteFile("1.parquet")
> unlink(local_prefix, recursive = TRUE)
>
> # doesn't work -----------------------------------------------------------------
> fs <- LocalFileSystem$create()
> fs$CreateDir(local_prefix)
> fsdir <- fs$cd(local_prefix)
> write_parquet(df, fsdir$path("1.parquet"))
> open_dataset(local_prefix) %>% collect() # <-- ERROR IS CAUSED BY THIS
> fsdir$DeleteFile("1.parquet") # <-- HERE IS WHERE YOU GET AN ERROR
> unlink(local_prefix, recursive = TRUE)
> {code}
>
> Here is the error I keep getting:
>
> {code:java}
> Error: IOError: Cannot delete file 'C:/Users/riaz/AppData/Local/Temp/Rtmp8qUlcx/file233c22f923d0/1.parquet'.
> Detail: [Windows error 32] The process cannot access the file because it is being used by another process.
> {code}
>
> Note that
> * I do not create an object from the `open_dataset` function. I simply call it.
> * I also call `collect` in order to pull the data, so I cannot see why the connection to the file should still exist after `collect` is called.
> * As mentioned above, all other OSes don't exhibit this behaviour.
> * My environment pane looks identical in both instances.
> * I do not need to restart R to delete the file. I can simply clear all objects from the workspace (rm(list = ls())) and then it works fine.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)