[GitHub] flink issue #2970: [FLINK-5300] Add more gentle file deletion procedure

2016-12-09 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2970
  
I like the idea.
I am wondering how expensive getting the array of `FileStatus` for all 
files in the directory is. HDFS in Hadoop 2 has the option to get a 
`ContentSummary` that has the number of files in a directory. I assume that 
this is more lightweight.

We could extend Flink's FileSystem class to also offer something like that 
and then use that method.

If we decide to not do that, it would be good to put the repeated logic for 
"delete if empty" into a utility function.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2970: [FLINK-5300] Add more gentle file deletion procedure

2016-12-09 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2970
  
I like the idea of not listing the status for all contained files. However, 
I've looked at the implementation of Hadoop's `FileSystem#getContentSummary` 
and `FileSystem#listLocatedStatus` and both implementations call 
`FileSystem#listStatus`. Thus, unless this changes in the future, we wouldn't 
win a lot by calling the `getContentSummary` instead (actually we would have 
the overhead of aggregating the different `FileStatus` objects).

Therefore, I'll refactor the code and add a 
`FileUtils#deleteDirectoryIfEmpty` method which will encapsulate the logic. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2970: [FLINK-5300] Add more gentle file deletion procedure

2016-12-09 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2970
  
I've update this PR @StephanEwen. Unfortunately, I couldn't use Hadoop's 
`FileSystem#getContentSummary` because it will first request the status for the 
given path, then list all files and directories if the path is a directory. For 
each file it will aggregate the `FileStatus` and then recursively descend into 
each directory. Thus, I think that this method is not faster.

I've refactored the code to contain a method `FileUtils#deletePathIfEmpty` 
to delete the path if it does not contain any files/directories.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2970: [FLINK-5300] Add more gentle file deletion procedure

2016-12-12 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2970
  
Rebasing on the latest master. @StephanEwen since I couldn't find a more 
efficient way to list the directory contents (wrt Hadoop FS) than `listStatus`, 
I think we can merge this PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2970: [FLINK-5300] Add more gentle file deletion procedure

2016-12-12 Thread StephanEwen
Github user StephanEwen commented on the issue:

https://github.com/apache/flink/pull/2970
  
Looks good to me. I would actually suggest to add two tests, one in 
`flink-core` based on the local file system, and one in `flink-fs-tests`, based 
on HDFS.
That way we make sure that there are no "unexpected behaviors", like some 
default file status always included (`.` or `..` or whatever).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink issue #2970: [FLINK-5300] Add more gentle file deletion procedure

2016-12-12 Thread tillrohrmann
Github user tillrohrmann commented on the issue:

https://github.com/apache/flink/pull/2970
  
True. Will add the tests and then merge the PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---