[ https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15853923#comment-15853923 ]
ASF GitHub Bot commented on FLINK-5659: --------------------------------------- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/3219#discussion_r99576850 --- Diff: flink-core/src/main/java/org/apache/flink/util/FileUtils.java --- @@ -148,14 +158,49 @@ public static void deleteDirectory(File directory) throws IOException { return; } - // delete the directory. this fails if the directory is not empty, meaning - // if new files got concurrently created. we want to fail then. - try { - Files.delete(directory.toPath()); - } - catch (NoSuchFileException ignored) { - // if someone else deleted this concurrently, we don't mind - // the result is the same for us, after all + java.nio.file.Path directoryPath = directory.toPath(); + if (OperatingSystem.isWindows()) { + // delete the directory. this fails if the directory is not empty, meaning + // if new files got concurrently created. we want to fail then. + try { + Files.delete(directoryPath); + } catch (NoSuchFileException ignored) { + // if someone else deleted this concurrently, we don't mind + // the result is the same for us, after all + } catch (AccessDeniedException e) { + // This may occur on Windows if another process is currently + // deleting the file, since the file must be opened in order + // to delete it. We double check here to make sure the file + // was actually deleted by another process. Note that this + // isn't a perfect solution, but it's better than nothing. + if (Files.exists(directoryPath)) { + throw e; + } + } catch (DirectoryNotEmptyException e) { + // This may occur on Windows for some reason even for empty + // directories. Apparently there's a timing/visibility + // issue when concurrently deleting the contents of a directory + // and afterwards deleting the directory itself. + try { + Thread.sleep(50); --- End diff -- It only happens when multiple threads are involved; running the test with 1 thread works like charm. This whole thing is just strange. I've made some debugging and what happened is that multiple threads delete the same file, and verify the deletion using `Files.exists(filename)`. All of them pass. A few seconds later we hit the `DirectoryNotEmptyException`, check the contents again and what do you know, the file *which we just verified to be deleted* still exists. > FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows > -------------------------------------------------------------- > > Key: FLINK-5659 > URL: https://issues.apache.org/jira/browse/FLINK-5659 > Project: Flink > Issue Type: Bug > Components: Core, Local Runtime > Affects Versions: 1.2.0, 1.3.0 > Reporter: Chesnay Schepler > Assignee: Chesnay Schepler > Priority: Trivial > > The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on > Windows. > First you will run into AccessDeniedExceptions since one thread tried to > delete a file while another thread was already doing that, for which the file > has to be opened. > Once you resolve those exceptions (by catching them double checking whether > the file still exists), you run into DirectoryNotEmptyExceptions since there > is some wacky timing/visibility issue when deleting files concurrently. -- This message was sent by Atlassian JIRA (v6.3.15#6346)