[ https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693525#comment-15693525 ]
ASF GitHub Bot commented on HADOOP-13600:
-----------------------------------------

GitHub user steveloughran opened a pull request:

    https://github.com/apache/hadoop/pull/167

    HADOOP-13600 starting on parallel rename, still designing code for max parallelism. Even listing and delete calls should be in parallel threads.

    Really only need to be collecting at the same rate as copies, which is implicitly defined by the rate of keys added to a delete queue

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/hadoop s3/HADOOOP-13600-rename

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/167.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #167

----
commit 00a0b79481cced4def8734f1aadfb94ef315d737
Author: Steve Loughran <ste...@apache.org>
Date:   2016-11-10T10:26:34Z

    HADOOP-13600 starting on parallel rename, still designing code for max parallelism. Even listing and delete calls should be in parallel threads.

    Indeed: listing could consider doing a pre-emptive call to grab all of the list, though for a bucket with a few million files this would be too expensive.

    Really only need to be collecting at the same rate as copies, which is implicitly defined by the rate of keys added to a delete queue

    Change-Id: I906a1a15f3a7567cbff1999236549627859319a5

----

> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
>                 Key: HADOOP-13600
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13600
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> Currently a directory rename does a one-by-one copy, making the request
> O(files * data).
> If the copy operations were launched in parallel, the
> duration of the copy may be reducible to the duration of the longest copy.
> For a directory with many files, this will be significant.
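
The idea in the issue description (launch the per-file copies in parallel, then delete the sources) can be sketched roughly as below. This is only an illustration and not the code in the pull request: an in-memory ConcurrentMap stands in for the S3 bucket, and the class name ParallelRenameSketch, the method rename() and the thread-count parameter are all hypothetical. Keys are added to the delete queue only as their copies complete, which is one possible reading of "collecting at the same rate as copies".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch only: a ConcurrentMap of key -> data stands in for an S3 bucket. */
final class ParallelRenameSketch {

    /**
     * Copies every key under srcPrefix to destPrefix using a thread pool,
     * then deletes the source keys. Returns the number of keys renamed.
     */
    static int rename(ConcurrentMap<String, String> store,
                      String srcPrefix, String destPrefix, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // "Listing": snapshot the keys under the source prefix.
            List<String> sources = new ArrayList<>();
            for (String key : store.keySet()) {
                if (key.startsWith(srcPrefix)) {
                    sources.add(key);
                }
            }

            // Submit one copy task per key; the tasks run concurrently.
            List<CompletableFuture<String>> copies = new ArrayList<>();
            for (String src : sources) {
                copies.add(CompletableFuture.supplyAsync(() -> {
                    String dest = destPrefix + src.substring(srcPrefix.length());
                    store.put(dest, store.get(src)); // stand-in for a server-side copy
                    return src;                      // key is now safe to delete
                }, pool));
            }

            // join() rethrows any copy failure, so no source is deleted
            // unless every copy has succeeded.
            List<String> deleteQueue = new ArrayList<>();
            for (CompletableFuture<String> copy : copies) {
                deleteQueue.add(copy.join());
            }

            // Stand-in for a bulk delete of the queued source keys.
            for (String key : deleteQueue) {
                store.remove(key);
            }
            return deleteQueue.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

With a store holding "src/a" and "src/b", a call such as ParallelRenameSketch.rename(store, "src/", "dest/", 4) leaves "dest/a" and "dest/b" in place of the originals. In this sketch the listing and the deletes still run on the calling thread; the pull request text suggests those should eventually be parallelised too.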