[jira] [Commented] (HADOOP-13600) S3a rename() to copy files in a directory in parallel

ASF GitHub Bot (JIRA) Thu, 24 Nov 2016 07:22:14 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15693525#comment-15693525 ]


ASF GitHub Bot commented on HADOOP-13600:
-----------------------------------------

GitHub user steveloughran opened a pull request:

    https://github.com/apache/hadoop/pull/167

    HADOOP-13600 

    Starting on parallel rename; still designing the code for maximum
    parallelism. Even the listing and delete calls should run in parallel
    threads. The listing really only needs to collect entries at the same rate
    as the copies, which is implicitly defined by the rate at which keys are
    added to a delete queue.
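
    The pipeline described above could be sketched roughly as follows. This is
    an illustration only, not the code in the pull request: the `Store`
    interface, the `rename` helper and its `threads` and `batchSize`
    parameters are all hypothetical names, and no real S3 or S3A calls are
    made. Copies run on a thread pool, each finished copy adds its source key
    to a delete queue, and the caller drains that queue in batches, so deletes
    proceed at the rate the copies complete.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

public class ParallelRenameSketch {

  /** Hypothetical stand-in for an object store client; NOT the S3A API. */
  public interface Store {
    void copy(String srcKey, String destKey);
    void deleteBatch(List<String> keys);
  }

  /**
   * Copies every key under srcPrefix to destPrefix in parallel, deleting
   * source keys in batches as their copies complete.
   * Returns the number of source keys deleted.
   */
  public static int rename(Store store, List<String> srcKeys, String srcPrefix,
      String destPrefix, int threads, int batchSize) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    // Optional.empty() is the end-of-stream marker for the delete queue.
    BlockingQueue<Optional<String>> deleteQueue = new LinkedBlockingQueue<>();
    try {
      List<CompletableFuture<Void>> copies = new ArrayList<>();
      for (String key : srcKeys) {
        String dest = destPrefix + key.substring(srcPrefix.length());
        copies.add(CompletableFuture.runAsync(() -> {
          store.copy(key, dest);
          // A key is queued for deletion only after its copy has succeeded.
          deleteQueue.add(Optional.of(key));
        }, pool));
      }
      CompletableFuture<Void> all =
          CompletableFuture.allOf(copies.toArray(new CompletableFuture[0]));
      // Once every copy has finished (or failed), signal end of stream.
      all.whenComplete((v, t) -> deleteQueue.add(Optional.empty()));

      // Drain the queue on the calling thread while copies are still running.
      int deleted = 0;
      List<String> batch = new ArrayList<>();
      for (Optional<String> next = deleteQueue.take(); next.isPresent();
          next = deleteQueue.take()) {
        batch.add(next.get());
        if (batch.size() >= batchSize) {
          store.deleteBatch(new ArrayList<>(batch));
          deleted += batch.size();
          batch.clear();
        }
      }
      if (!batch.isEmpty()) {
        store.deleteBatch(new ArrayList<>(batch));
        deleted += batch.size();
      }
      all.join(); // rethrows (wrapped) if any copy failed
      return deleted;
    } finally {
      pool.shutdown();
    }
  }
}
```

    In this sketch a failed copy never reaches the delete queue, so its source
    object is left in place; the failure surfaces from `all.join()` after the
    successful copies have been cleaned up. Whether that is the right failure
    behaviour for a real rename is a separate design question.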


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/steveloughran/hadoop s3/HADOOOP-13600-rename

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/167.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #167
    
----
commit 00a0b79481cced4def8734f1aadfb94ef315d737
Author: Steve Loughran <ste...@apache.org>
Date:   2016-11-10T10:26:34Z

    HADOOP-13600 starting on parallel rename, still designing code for max
    parallelism. Even listing and delete calls should be in parallel threads.
    Indeed: listing could consider doing a pre-emptive call to grab all of the
    list, though for a bucket with a few million files this would be too expensive.
    Really only need to be collecting at the same rate as copies, which is
    implicitly defined by the rate of keys added to a delete queue
    
    Change-Id: I906a1a15f3a7567cbff1999236549627859319a5

----


> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
>                 Key: HADOOP-13600
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13600
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> Currently a directory rename does a one-by-one copy, making the request 
> O(files * data). If the copy operations were launched in parallel, the 
> duration of the copy may be reducible to the duration of the longest copy. 
> For a directory with many files, this will be significant.
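
As a rough worked form of the claim above: assume n files whose individual copy times are t_1, ..., t_n, and at most P copies in flight at once. These symbols are introduced here for illustration and do not appear in the issue.

```latex
T_{\text{seq}} = \sum_{i=1}^{n} t_i,
\qquad
T_{\text{par}} \;\ge\; \max\!\left( \max_{1 \le i \le n} t_i,\;
                                   \frac{1}{P} \sum_{i=1}^{n} t_i \right)
```

With P >= n the lower bound reduces to the longest single copy, which is the best case the description refers to. The bound ignores listing and delete calls, request throttling, and any bandwidth shared between concurrent copies, so real gains may be smaller.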



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
