[jira] [Commented] (HADOOP-13600) S3a rename() to copy files in a directory in parallel

Steve Loughran (JIRA) Mon, 28 Nov 2016 07:13:03 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702187#comment-15702187 ]


Steve Loughran commented on HADOOP-13600:
-----------------------------------------

# S3A instrumentation should have a gauge of pending copy transfers; it'd be 
incremented during queue submit, decremented on success/failure of the copy. If 
there was a separate "active copy" gauge we could even distinguish 
copy-in-progress from copy-waiting-for-threads.
# We could also include the size of the file being copied, as it will 
come from the list/getFileStatus calls. I'd like to see more debug-level 
logging too; maybe something in innerRename() to log the entire 
duration and effective bandwidth of the call. I'd certainly like to know that.
# The package-scoped inner rename mentioned above could also benefit from 
knowing the file count and total size of the rename. It may want to log that 
at INFO, irrespective of what S3A does. Why? It answers the support call "why 
does the committer take so long at the end?"
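
To make the first two points concrete, here is a rough sketch of what I have in mind. It assumes plain java.util.concurrent counters rather than the real S3A instrumentation classes; CopyGauges, its fields, and bytesPerSecond() are hypothetical names for illustration only, not existing Hadoop APIs.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch only; this is not the S3A instrumentation API. */
class CopyGauges {
  /** Copies queued or running: incremented on submit, decremented on completion. */
  final AtomicInteger pendingCopies = new AtomicInteger();
  /** Copies actually executing on a pool thread. */
  final AtomicInteger activeCopies = new AtomicInteger();
  /** Total size of the files whose copy succeeded. */
  final AtomicLong bytesCopied = new AtomicLong();

  /** Queue one copy; size is assumed to come from an earlier list/getFileStatus call. */
  Future<?> submit(ExecutorService pool, long size, Runnable copy) {
    pendingCopies.incrementAndGet();
    try {
      return pool.submit(() -> {
        activeCopies.incrementAndGet();
        try {
          copy.run();
          bytesCopied.addAndGet(size);   // only counted when the copy succeeded
        } finally {
          // runs on success and on failure of the copy
          activeCopies.decrementAndGet();
          pendingCopies.decrementAndGet();
        }
      });
    } catch (RuntimeException e) {
      pendingCopies.decrementAndGet();   // the pool rejected the task
      throw e;
    }
  }

  /** Copies still waiting for a thread (a best-effort snapshot). */
  int waitingCopies() {
    return pendingCopies.get() - activeCopies.get();
  }

  /** Effective bandwidth in bytes/second for a call that took the given nanoseconds. */
  static double bytesPerSecond(long bytes, long nanos) {
    return nanos <= 0 ? 0.0 : bytes * 1e9 / nanos;
  }
}
```

A rename could then time the whole operation with System.nanoTime() and log bytesCopied alongside bytesPerSecond() at debug level when it finishes.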

> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
>                 Key: HADOOP-13600
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13600
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> Currently a directory rename does a one-by-one copy, making the request 
> O(files * data). If the copy operations were launched in parallel, the 
> duration of the copy may be reducible to the duration of the longest copy. 
> For a directory with many files, this will be significant.
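
For illustration, the parallel launch described in the issue summary might look roughly like the sketch below. It is not the S3A implementation: ParallelRenameSketch, copyAll(), and the copyOne callback are hypothetical stand-ins for whatever performs the per-object copy, and the fixed-size pool is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of launching per-file copies in parallel. */
class ParallelRenameSketch {
  /**
   * Runs copyOne for every source key on a bounded pool and returns the
   * destination keys in input order. Any failed copy is rethrown as an
   * ExecutionException from Future.get().
   */
  static List<String> copyAll(List<String> srcKeys,
                              UnaryOperator<String> copyOne,
                              int threads) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<String>> tasks = new ArrayList<>();
      for (String key : srcKeys) {
        tasks.add(() -> copyOne.apply(key));
      }
      List<String> copied = new ArrayList<>();
      // invokeAll blocks until every task has completed, so with enough
      // threads the wall-clock time approaches that of the slowest copy.
      for (Future<String> f : pool.invokeAll(tasks)) {
        copied.add(f.get());
      }
      return copied;
    } finally {
      pool.shutdown();
    }
  }
}
```

With fewer threads than files, the total time is bounded by the pool size rather than by the single longest copy.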



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
