[ https://issues.apache.org/jira/browse/HADOOP-13208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15425537#comment-15425537 ]
Hudson commented on HADOOP-13208:
---------------------------------

SUCCESS: Integrated in Jenkins build Hadoop-trunk-Commit #10294 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/10294/])
HADOOP-13208. S3A listFiles(recursive=true) to do a bulk listObjects (cnauroth: rev 822d661b8fcc42bec6eea958d9fd02ef1aaa4b6c)
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/S3ATestUtils.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/contract/s3a/TestS3AContractGetFileStatus.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java
* (add) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Listing.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AInstrumentation.java
* (edit) hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Statistic.java
* (edit) hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/scale/TestS3ADirectoryPerformance.java


> S3A listFiles(recursive=true) to do a bulk listObjects instead of walking the
> pseudo-tree of directories
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-13208
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13208
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.8.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
>             Fix For: 2.9.0
>
>         Attachments: HADOOP-13208-branch-2-001.patch,
> HADOOP-13208-branch-2-007.patch, HADOOP-13208-branch-2-008.patch,
> HADOOP-13208-branch-2-009.patch, HADOOP-13208-branch-2-010.patch,
> HADOOP-13208-branch-2-011.patch, HADOOP-13208-branch-2-012.patch,
> HADOOP-13208-branch-2-017.patch, HADOOP-13208-branch-2-018.patch,
> HADOOP-13208-branch-2-019.patch, HADOOP-13208-branch-2-020.patch,
> HADOOP-13208-branch-2-021.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> A major cost in split calculation against object stores turns out to be listing
> the directory tree itself. That's because against S3, it takes S3A two HEADs
> and two LISTs to list the contents of any directory path (2 HEADs + 1 LIST for
> getFileStatus(), then a second LIST to query the contents).
> Listing a directory could be improved slightly by combining the final two
> listings. However, a listing of a directory tree will still be
> O(directories). In contrast, a recursive {{listFiles()}} operation should be
> implementable by a bulk listing of all descendant paths: one LIST operation
> per thousand descendants.
> As the result of this call is an iterator, the ongoing listing can be
> implemented within the iterator itself.
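The iterator-based bulk listing described in the issue can be sketched in Java. This is a minimal, self-contained sketch, not the code committed in `Listing.java`: `FakeStore`, `Page`, `ListingIterator` and `demo` are hypothetical names, and the in-memory store only imitates S3's behaviour of returning at most 1,000 keys per LIST request and resuming after a marker key. The real `FileSystem.listFiles(Path, boolean)` returns a `RemoteIterator<LocatedFileStatus>`; a plain `java.util.Iterator<String>` is used here to keep the example dependency-free.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.TreeSet;

public class BulkListingSketch {
    /** One page of a flat listing; nextMarker is null on the last page. (Hypothetical type.) */
    static final class Page {
        final List<String> keys;
        final String nextMarker;

        Page(List<String> keys, String nextMarker) {
            this.keys = keys;
            this.nextMarker = nextMarker;
        }
    }

    /** Hypothetical in-memory stand-in for a flat object store with sorted keys. */
    static final class FakeStore {
        static final int PAGE_SIZE = 1000; // imitates S3's 1,000-keys-per-LIST limit
        final TreeSet<String> keys = new TreeSet<>();
        int listCalls = 0;

        /** One LIST request: up to PAGE_SIZE keys under prefix, resuming after marker. */
        Page list(String prefix, String marker) {
            listCalls++;
            List<String> out = new ArrayList<>();
            Iterable<String> from = (marker == null)
                    ? keys.tailSet(prefix, true)
                    : keys.tailSet(marker, false);
            for (String k : from) {
                if (!k.startsWith(prefix)) {
                    break; // keys are sorted, so no later key can match the prefix
                }
                if (out.size() == PAGE_SIZE) {
                    // more matching keys remain: caller resumes after the last key returned
                    return new Page(out, out.get(out.size() - 1));
                }
                out.add(k);
            }
            return new Page(out, null);
        }
    }

    /** Iterator that issues the next LIST only when the current page is used up. */
    static final class ListingIterator implements Iterator<String> {
        private final FakeStore store;
        private final String prefix;
        private Iterator<String> current;
        private String marker; // null once the store has returned its final page

        ListingIterator(FakeStore store, String prefix) {
            this.store = store;
            this.prefix = prefix;
            Page first = store.list(prefix, null);
            this.current = first.keys.iterator();
            this.marker = first.nextMarker;
        }

        @Override
        public boolean hasNext() {
            while (!current.hasNext() && marker != null) {
                Page page = store.list(prefix, marker);
                current = page.keys.iterator();
                marker = page.nextMarker;
            }
            return current.hasNext();
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return current.next();
        }
    }

    /** Builds dirs x filesPerDir keys under data/, lists them all, returns {keysSeen, listCalls}. */
    public static int[] demo(int dirs, int filesPerDir) {
        FakeStore store = new FakeStore();
        for (int d = 0; d < dirs; d++) {
            for (int f = 0; f < filesPerDir; f++) {
                store.keys.add(String.format("data/dir%03d/file%04d", d, f));
            }
        }
        int seen = 0;
        Iterator<String> it = new ListingIterator(store, "data/");
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        return new int[] {seen, store.listCalls};
    }

    public static void main(String[] args) {
        int[] r = demo(10, 250);
        System.out.println("listed " + r[0] + " keys with " + r[1] + " LIST calls");
    }
}
```

For 10 directories of 250 files each, the sketch prints `listed 2500 keys with 3 LIST calls`. By the issue's own estimate of two HEADs and two LISTs per directory, walking the same tree (the 10 directories plus `data/` itself) would take about 44 requests, a cost that grows with the number of directories rather than with the number of files. Fetching pages inside `hasNext()` is what lets the caller start consuming results before the whole listing has been retrieved.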