[GitHub] [hadoop] steveloughran commented on issue #1601: HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD.
steveloughran commented on issue #1601: HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD.
URL: https://github.com/apache/hadoop/pull/1601#issuecomment-541796750

thx -merged

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards, Apache Git Services

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
[GitHub] [hadoop] steveloughran commented on issue #1601: HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD.
steveloughran commented on issue #1601: HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD.
URL: https://github.com/apache/hadoop/pull/1601#issuecomment-541059565

Updated the docs. The only place we don't do HEAD and dir marker is in create().

Now, can you create a Path with a trailing / ? I was about to say no, but remembered https://issues.apache.org/jira/browse/HADOOP-15430 : one of the constructors of Path does let you get away with it, which is something that already breaks S3Guard.
[GitHub] [hadoop] steveloughran commented on issue #1601: HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD.
steveloughran commented on issue #1601: HADOOP-16635. S3A innerGetFileStatus scans for directories-only still does a HEAD.
URL: https://github.com/apache/hadoop/pull/1601#issuecomment-540571728

Sid, thanks for the comments; will review/update the patch.

Interesting point about the double list. This code path is how it's always been, presumably descended from the s3n code. LIST is slower, costs more and is much more prone to eventual consistency, which are all good arguments for HEAD first.

I actually plan to tune some of the calls which always seem to get used on directory walks (listStatus, listFiles, listLocatedStatus) to do the subtree list first, and only go for the HEAD calls if they don't find any children. This is to reduce the cost of treewalks where the bias is towards populated directories.
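
To illustrate the trade-off between the two probe orders discussed in the comment above, here is a minimal sketch in plain Java. It is not the S3A code: `ProbeOrderSketch`, `ToyStore`, `headFirst` and `listFirst` are hypothetical names, and the in-memory store only simulates HEAD and LIST requests so they can be counted. It assumes a directory is represented either by a `path/` marker key or by any key under the `path/` prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of probe ordering; not the S3A implementation. */
public class ProbeOrderSketch {

    enum Kind { FILE, DIRECTORY, NOT_FOUND }

    /** Toy in-memory object store that records every simulated request. */
    static final class ToyStore {
        private final TreeSet<String> keys = new TreeSet<>();
        final List<String> requests = new ArrayList<>();

        void put(String key) {
            keys.add(key);
        }

        /** Simulated HEAD: true if an object exists at exactly this key. */
        boolean head(String key) {
            requests.add("HEAD " + key);
            return keys.contains(key);
        }

        /** Simulated LIST: true if any key starts with the prefix (marker included). */
        boolean listHasChildren(String prefix) {
            requests.add("LIST " + prefix);
            String next = keys.ceiling(prefix);
            return next != null && next.startsWith(prefix);
        }
    }

    /** HEAD first: cheapest when the path is usually a file. */
    static Kind headFirst(ToyStore store, String path) {
        if (store.head(path)) {
            return Kind.FILE;
        }
        if (store.head(path + "/")) {
            return Kind.DIRECTORY; // empty-directory marker
        }
        if (store.listHasChildren(path + "/")) {
            return Kind.DIRECTORY; // directory implied by its children
        }
        return Kind.NOT_FOUND;
    }

    /** LIST first: cheapest when the path is usually a populated directory. */
    static Kind listFirst(ToyStore store, String path) {
        if (store.listHasChildren(path + "/")) {
            return Kind.DIRECTORY;
        }
        // The LIST above would also have matched a "path/" marker,
        // so only the file probe remains.
        if (store.head(path)) {
            return Kind.FILE;
        }
        return Kind.NOT_FOUND;
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        store.put("data/part-0000");

        Kind a = headFirst(store, "data");
        System.out.println("HEAD first: " + a + " after " + store.requests);

        store.requests.clear();
        Kind b = listFirst(store, "data");
        System.out.println("LIST first: " + b + " after " + store.requests);
    }
}
```

In this toy model a populated directory is resolved with a single LIST when listing first, while the HEAD-first order issues two HEAD requests before it lists. The sketch only counts requests; it does not model the latency, cost or consistency differences between HEAD and LIST mentioned above.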