[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
[ https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050158#comment-16050158 ] ASF GitHub Bot commented on FLINK-5659: --- Github user zentol closed the pull request at: https://github.com/apache/flink/pull/3219 > FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows > -- > > Key: FLINK-5659 > URL: https://issues.apache.org/jira/browse/FLINK-5659 > Project: Flink > Issue Type: Bug > Components: Core, Local Runtime >Affects Versions: 1.2.0, 1.3.0, 1.4.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Trivial > > The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on > Windows. > First you will run into AccessDeniedExceptions since one thread tried to > delete a file while another thread was already doing that, for which the file > has to be opened. > Once you resolve those exceptions (by catching them double checking whether > the file still exists), you run into DirectoryNotEmptyExceptions since there > is some wacky timing/visibility issue when deleting files concurrently. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
[ https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853923#comment-15853923 ] ASF GitHub Bot commented on FLINK-5659: --- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/3219#discussion_r99576850 --- Diff: flink-core/src/main/java/org/apache/flink/util/FileUtils.java --- @@ -148,14 +158,49 @@ public static void deleteDirectory(File directory) throws IOException { return; } - // delete the directory. this fails if the directory is not empty, meaning - // if new files got concurrently created. we want to fail then. - try { - Files.delete(directory.toPath()); - } - catch (NoSuchFileException ignored) { - // if someone else deleted this concurrently, we don't mind - // the result is the same for us, after all + java.nio.file.Path directoryPath = directory.toPath(); + if (OperatingSystem.isWindows()) { + // delete the directory. this fails if the directory is not empty, meaning + // if new files got concurrently created. we want to fail then. + try { + Files.delete(directoryPath); + } catch (NoSuchFileException ignored) { + // if someone else deleted this concurrently, we don't mind + // the result is the same for us, after all + } catch (AccessDeniedException e) { + // This may occur on Windows if another process is currently + // deleting the file, since the file must be opened in order + // to delete it. We double check here to make sure the file + // was actually deleted by another process. Note that this + // isn't a perfect solution, but it's better than nothing. + if (Files.exists(directoryPath)) { + throw e; + } + } catch (DirectoryNotEmptyException e) { + // This may occur on Windows for some reason even for empty + // directories. Apparently there's a timing/visibility + // issue when concurrently deleting the contents of a directory + // and afterwards deleting the directory itself. + try { + Thread.sleep(50); --- End diff -- It only happens when multiple threads are involved; running the test with 1 thread works like charm. This whole thing is just strange. I've made some debugging and what happened is that multiple threads delete the same file, and verify the deletion using `Files.exists(filename)`. All of them pass. A few seconds later we hit the `DirectoryNotEmptyException`, check the contents again and what do you know, the file *which we just verified to be deleted* still exists. > FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows > -- > > Key: FLINK-5659 > URL: https://issues.apache.org/jira/browse/FLINK-5659 > Project: Flink > Issue Type: Bug > Components: Core, Local Runtime >Affects Versions: 1.2.0, 1.3.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Trivial > > The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on > Windows. > First you will run into AccessDeniedExceptions since one thread tried to > delete a file while another thread was already doing that, for which the file > has to be opened. > Once you resolve those exceptions (by catching them double checking whether > the file still exists), you run into DirectoryNotEmptyExceptions since there > is some wacky timing/visibility issue when deleting files concurrently. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
[ https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853873#comment-15853873 ] ASF GitHub Bot commented on FLINK-5659: --- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/3219#discussion_r99568502 --- Diff: flink-core/src/main/java/org/apache/flink/util/FileUtils.java --- @@ -116,6 +118,14 @@ else if (file.exists()) { } catch (NoSuchFileException e) { // if the file is already gone (concurrently), we don't mind + } catch (AccessDeniedException e) { --- End diff -- Is it ok to special case this inside the catch block? (I.e for windows check again, otherwise rethrow? > FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows > -- > > Key: FLINK-5659 > URL: https://issues.apache.org/jira/browse/FLINK-5659 > Project: Flink > Issue Type: Bug > Components: Core, Local Runtime >Affects Versions: 1.2.0, 1.3.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Trivial > > The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on > Windows. > First you will run into AccessDeniedExceptions since one thread tried to > delete a file while another thread was already doing that, for which the file > has to be opened. > Once you resolve those exceptions (by catching them double checking whether > the file still exists), you run into DirectoryNotEmptyExceptions since there > is some wacky timing/visibility issue when deleting files concurrently. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
[ https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839964#comment-15839964 ] ASF GitHub Bot commented on FLINK-5659: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3219 I think the semantics for Unix currently work well, I'd like to avoid the sleep there... > FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows > -- > > Key: FLINK-5659 > URL: https://issues.apache.org/jira/browse/FLINK-5659 > Project: Flink > Issue Type: Bug >Affects Versions: 1.2.0, 1.3.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Trivial > > The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on > Windows. > First you will run into AccessDeniedExceptions since one thread tried to > delete a file while another thread was already doing that, for which the file > has to be opened. > Once you resolve those exceptions (by catching them double checking whether > the file still exists), you run into DirectoryNotEmptyExceptions since there > is some wacky timing/visibility issue when deleting files concurrently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
[ https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839893#comment-15839893 ] ASF GitHub Bot commented on FLINK-5659: --- Github user zentol commented on the issue: https://github.com/apache/flink/pull/3219 That could work. Are you worried about the added sleep in case of an `DirectoryNotEmptyException`? > FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows > -- > > Key: FLINK-5659 > URL: https://issues.apache.org/jira/browse/FLINK-5659 > Project: Flink > Issue Type: Bug >Affects Versions: 1.2.0, 1.3.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Trivial > > The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on > Windows. > First you will run into AccessDeniedExceptions since one thread tried to > delete a file while another thread was already doing that, for which the file > has to be opened. > Once you resolve those exceptions (by catching them double checking whether > the file still exists), you run into DirectoryNotEmptyExceptions since there > is some wacky timing/visibility issue when deleting files concurrently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
[ https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839839#comment-15839839 ] ASF GitHub Bot commented on FLINK-5659: --- Github user StephanEwen commented on the issue: https://github.com/apache/flink/pull/3219 Uff, okay, I see the problem. Windows FileSystem semantics are not as easy to handle as Unix. How about we special case the method for Windows? Use `OperatingSystem.isWindows()` and in that case, do it different (like you suggested), but keep the Unix code path as it is? > FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows > -- > > Key: FLINK-5659 > URL: https://issues.apache.org/jira/browse/FLINK-5659 > Project: Flink > Issue Type: Bug >Affects Versions: 1.2.0, 1.3.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Trivial > > The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on > Windows. > First you will run into AccessDeniedExceptions since one thread tried to > delete a file while another thread was already doing that, for which the file > has to be opened. > Once you resolve those exceptions (by catching them double checking whether > the file still exists), you run into DirectoryNotEmptyExceptions since there > is some wacky timing/visibility issue when deleting files concurrently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
[ https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839808#comment-15839808 ] ASF GitHub Bot commented on FLINK-5659: --- GitHub user zentol opened a pull request: https://github.com/apache/flink/pull/3219 [FLINK-5659] Harden FileBaseUtils#deleteFileOrDirectory on WIndows This PR hardens `FileBaseUtils#deleteFileOrDirectory` when the method is called concurrently on Windows. `AccessDeniedException`s can be thrown when attempting to delete a file while a deletion is in progress. In order to delete a file it has to be opened, and as we all know you can't (easily) delete open files on Windows. For this case we catch the exception, double-check whether the file is deleted by now, and throw the exception if it isn't, assuming a lack of permission prevented the deletion. `DirectoryNotEmptyException`s can be thrown for directories for which a delete call for every file has succeeded, since the OS may be slow in updating the actual FileSystem. For this case we wait for a bit (50 milliseconds, yeah it's that slow) and check whether the directory is empty by now. If it isn't we throw the exception assuming a concurrent creation of a new file in the given directory. Note that neither changes are perfect; they are not guaranteed to prevent the issue. However, it no longer categorically fails for me , but only once every few hundred runs. You can merge this pull request into a Git repository by running: $ git pull https://github.com/zentol/flink 5659_fileutils_win Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/3219.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3219 commit f372899c7cd8f5102421b986be1f1b87e32cf15f Author: zentolDate: 2017-01-26T14:47:11Z [FLINK-5659] Harden FileBaseUtils#deleteFileOrDirectory on WIndows > FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows > -- > > Key: FLINK-5659 > URL: https://issues.apache.org/jira/browse/FLINK-5659 > Project: Flink > Issue Type: Bug >Affects Versions: 1.2.0, 1.3.0 >Reporter: Chesnay Schepler >Assignee: Chesnay Schepler >Priority: Trivial > > The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on > Windows. > First you will run into AccessDeniedExceptions since one thread tried to > delete a file while another thread was already doing that, for which the file > has to be opened. > Once you resolve those exceptions (by catching them double checking whether > the file still exists), you run into DirectoryNotEmptyExceptions since there > is some wacky timing/visibility issue when deleting files concurrently. -- This message was sent by Atlassian JIRA (v6.3.4#6332)