[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows

2017-06-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16050158#comment-16050158
 ] 

ASF GitHub Bot commented on FLINK-5659:
---

Github user zentol closed the pull request at:

https://github.com/apache/flink/pull/3219


> FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
> --
>
> Key: FLINK-5659
> URL: https://issues.apache.org/jira/browse/FLINK-5659
> Project: Flink
>  Issue Type: Bug
>  Components: Core, Local Runtime
>Affects Versions: 1.2.0, 1.3.0, 1.4.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Trivial
>
> The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on 
> Windows.
> First you will run into AccessDeniedExceptions since one thread tried to 
> delete a file while another thread was already doing that, for which the file 
> has to be opened.
> Once you resolve those exceptions (by catching them double checking whether 
> the file still exists), you run into DirectoryNotEmptyExceptions since there 
> is some wacky timing/visibility issue when deleting files concurrently.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows

2017-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853923#comment-15853923
 ] 

ASF GitHub Bot commented on FLINK-5659:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/3219#discussion_r99576850
  
--- Diff: flink-core/src/main/java/org/apache/flink/util/FileUtils.java ---
@@ -148,14 +158,49 @@ public static void deleteDirectory(File directory) 
throws IOException {
return;
}
 
-   // delete the directory. this fails if the directory is 
not empty, meaning
-   // if new files got concurrently created. we want to 
fail then.
-   try {
-   Files.delete(directory.toPath());
-   }
-   catch (NoSuchFileException ignored) {
-   // if someone else deleted this concurrently, 
we don't mind
-   // the result is the same for us, after all
+   java.nio.file.Path directoryPath = directory.toPath();
+   if (OperatingSystem.isWindows()) {
+   // delete the directory. this fails if the 
directory is not empty, meaning
+   // if new files got concurrently created. we 
want to fail then.
+   try {
+   Files.delete(directoryPath);
+   } catch (NoSuchFileException ignored) {
+   // if someone else deleted this 
concurrently, we don't mind
+   // the result is the same for us, after 
all
+   } catch (AccessDeniedException e) {
+   // This may occur on Windows if another 
process is currently
+   // deleting the file, since the file 
must be opened in order
+   // to delete it. We double check here 
to make sure the file
+   // was actually deleted by another 
process. Note that this
+   // isn't a perfect solution, but it's 
better than nothing.
+   if (Files.exists(directoryPath)) {
+   throw e;
+   }
+   } catch (DirectoryNotEmptyException e) {
+   // This may occur on Windows for some 
reason even for empty
+   // directories. Apparently there's a 
timing/visibility
+   // issue when concurrently deleting the 
contents of a directory 
+   // and afterwards deleting the 
directory itself.
+   try {
+   Thread.sleep(50);
--- End diff --

It only happens when multiple threads are involved; running the test with 1 
thread works like charm.

This whole thing is just strange. I've made some debugging and what 
happened is that multiple threads delete the same file, and verify the deletion 
using `Files.exists(filename)`. All of them pass. A few seconds later we hit 
the `DirectoryNotEmptyException`, check the contents again and what do you 
know, the file *which we just verified to be deleted*  still exists.


> FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
> --
>
> Key: FLINK-5659
> URL: https://issues.apache.org/jira/browse/FLINK-5659
> Project: Flink
>  Issue Type: Bug
>  Components: Core, Local Runtime
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Trivial
>
> The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on 
> Windows.
> First you will run into AccessDeniedExceptions since one thread tried to 
> delete a file while another thread was already doing that, for which the file 
> has to be opened.
> Once you resolve those exceptions (by catching them double checking whether 
> the file still exists), you run into DirectoryNotEmptyExceptions since there 
> is some wacky timing/visibility issue when deleting files concurrently.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows

2017-02-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15853873#comment-15853873
 ] 

ASF GitHub Bot commented on FLINK-5659:
---

Github user zentol commented on a diff in the pull request:

https://github.com/apache/flink/pull/3219#discussion_r99568502
  
--- Diff: flink-core/src/main/java/org/apache/flink/util/FileUtils.java ---
@@ -116,6 +118,14 @@ else if (file.exists()) {
}
catch (NoSuchFileException e) {
// if the file is already gone (concurrently), 
we don't mind
+   } catch (AccessDeniedException e) {
--- End diff --

Is it ok to special case this inside the catch block? (I.e for windows 
check again, otherwise rethrow?


> FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
> --
>
> Key: FLINK-5659
> URL: https://issues.apache.org/jira/browse/FLINK-5659
> Project: Flink
>  Issue Type: Bug
>  Components: Core, Local Runtime
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Trivial
>
> The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on 
> Windows.
> First you will run into AccessDeniedExceptions since one thread tried to 
> delete a file while another thread was already doing that, for which the file 
> has to be opened.
> Once you resolve those exceptions (by catching them double checking whether 
> the file still exists), you run into DirectoryNotEmptyExceptions since there 
> is some wacky timing/visibility issue when deleting files concurrently.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839964#comment-15839964
 ] 

ASF GitHub Bot commented on FLINK-5659:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3219
  
I think the semantics for Unix currently work well, I'd like to avoid the 
sleep there...


> FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
> --
>
> Key: FLINK-5659
> URL: https://issues.apache.org/jira/browse/FLINK-5659
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Trivial
>
> The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on 
> Windows.
> First you will run into AccessDeniedExceptions since one thread tried to 
> delete a file while another thread was already doing that, for which the file 
> has to be opened.
> Once you resolve those exceptions (by catching them double checking whether 
> the file still exists), you run into DirectoryNotEmptyExceptions since there 
> is some wacky timing/visibility issue when deleting files concurrently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839893#comment-15839893
 ] 

ASF GitHub Bot commented on FLINK-5659:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/3219
  
That could work. Are you worried about the added sleep in case of an 
`DirectoryNotEmptyException`?



> FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
> --
>
> Key: FLINK-5659
> URL: https://issues.apache.org/jira/browse/FLINK-5659
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Trivial
>
> The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on 
> Windows.
> First you will run into AccessDeniedExceptions since one thread tried to 
> delete a file while another thread was already doing that, for which the file 
> has to be opened.
> Once you resolve those exceptions (by catching them double checking whether 
> the file still exists), you run into DirectoryNotEmptyExceptions since there 
> is some wacky timing/visibility issue when deleting files concurrently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839839#comment-15839839
 ] 

ASF GitHub Bot commented on FLINK-5659:
---

Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/3219
  
Uff, okay, I see the problem. Windows FileSystem semantics are not as easy 
to handle as Unix.

How about we special case the method for Windows? Use 
`OperatingSystem.isWindows()` and in that case, do it different (like you 
suggested), but keep the Unix code path as it is? 


> FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
> --
>
> Key: FLINK-5659
> URL: https://issues.apache.org/jira/browse/FLINK-5659
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Trivial
>
> The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on 
> Windows.
> First you will run into AccessDeniedExceptions since one thread tried to 
> delete a file while another thread was already doing that, for which the file 
> has to be opened.
> Once you resolve those exceptions (by catching them double checking whether 
> the file still exists), you run into DirectoryNotEmptyExceptions since there 
> is some wacky timing/visibility issue when deleting files concurrently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-5659) FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows

2017-01-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-5659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15839808#comment-15839808
 ] 

ASF GitHub Bot commented on FLINK-5659:
---

GitHub user zentol opened a pull request:

https://github.com/apache/flink/pull/3219

[FLINK-5659] Harden FileBaseUtils#deleteFileOrDirectory on WIndows

This PR hardens `FileBaseUtils#deleteFileOrDirectory` when the method is 
called concurrently on Windows.

`AccessDeniedException`s can be thrown when attempting to delete a file 
while a deletion is in progress. In order to delete a file it has to be opened, 
and as we all know you can't (easily) delete open files on Windows.
For this case we catch the exception, double-check whether the file is 
deleted by now, and throw the exception if it isn't, assuming a lack of 
permission prevented the deletion.

`DirectoryNotEmptyException`s can be thrown for directories for which a 
delete call for every file has succeeded, since the OS may be slow in updating 
the actual FileSystem.
For this case we wait for a bit (50 milliseconds, yeah it's that slow) and 
check whether the directory is empty by now. If it isn't we throw the exception 
assuming a concurrent creation of a new file in the given directory.

Note that neither changes are perfect; they are not guaranteed to prevent 
the issue. However, it no longer categorically fails for me , but only once 
every few hundred runs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/zentol/flink 5659_fileutils_win

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/3219.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #3219


commit f372899c7cd8f5102421b986be1f1b87e32cf15f
Author: zentol 
Date:   2017-01-26T14:47:11Z

[FLINK-5659] Harden FileBaseUtils#deleteFileOrDirectory on WIndows




> FileBaseUtils#deleteFileOrDirectory not thread-safe on Windows
> --
>
> Key: FLINK-5659
> URL: https://issues.apache.org/jira/browse/FLINK-5659
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Chesnay Schepler
>Assignee: Chesnay Schepler
>Priority: Trivial
>
> The {code}FileBaseUtils#deleteFileOrDirectory{code} is not thread-safe on 
> Windows.
> First you will run into AccessDeniedExceptions since one thread tried to 
> delete a file while another thread was already doing that, for which the file 
> has to be opened.
> Once you resolve those exceptions (by catching them double checking whether 
> the file still exists), you run into DirectoryNotEmptyExceptions since there 
> is some wacky timing/visibility issue when deleting files concurrently.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)