[ https://issues.apache.org/jira/browse/HADOOP-13600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15702187#comment-15702187 ]
Steve Loughran commented on HADOOP-13600:
-----------------------------------------

# S3A instrumentation should have a gauge of pending copy transfers; it'd be incremented during queue submit and decremented on success/failure of the copy. If there was a separate "active copy" gauge, we could even distinguish copy-in-progress from copy-waiting-for-threads.
# We could also include the size of the file being copied, as it will come from the list/getFileStatus calls. I think I'd like to see more debug-level logging too; maybe something in innerRename() to log the entire duration and effective bandwidth of the call. I'd certainly like to know that.
# The package-scoped inner rename mentioned above could also benefit from knowing the file count and total size of the rename. It may want to log that at INFO, irrespective of what S3A does. Why? It answers the support call "why does the committer take so long at the end?"

> S3a rename() to copy files in a directory in parallel
> -----------------------------------------------------
>
>                 Key: HADOOP-13600
>                 URL: https://issues.apache.org/jira/browse/HADOOP-13600
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 2.7.3
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> Currently a directory rename does a one-by-one copy, making the request
> O(files * data). If the copy operations were launched in parallel, the
> duration of the copy may be reducible to the duration of the longest copy.
> For a directory with many files, this will be significant
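
To make the gauge and logging suggestions concrete, here is a minimal Java sketch of parallel copies with a "pending" and an "active" gauge, plus a summary line reporting file count, total size, duration and effective bandwidth. It is not the S3A code: the class ParallelRenameSketch, the Copier interface, the field names and the summary format are all hypothetical, and plain atomics stand in for whatever the real S3A instrumentation would use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch only; names and structure are assumptions, not S3A's. */
class ParallelRenameSketch {

  /** Stand-in for one server-side copy request (hypothetical interface). */
  interface Copier {
    void copy(String src, String dest, long size) throws Exception;
  }

  // pending = submitted but not yet finished; active = currently running on a thread.
  // pending - active is then the number of copies still waiting for a thread.
  public final AtomicInteger pendingCopies = new AtomicInteger();
  public final AtomicInteger activeCopies = new AtomicInteger();
  public final AtomicLong bytesCopied = new AtomicLong();

  /** Submits every copy to a fixed pool, waits for all, returns elapsed nanoseconds. */
  long copyAll(Map<String, Long> files, String destPrefix, Copier copier, int threads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    long start = System.nanoTime();
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (Map.Entry<String, Long> e : files.entrySet()) {
        final String src = e.getKey();
        final long size = e.getValue();   // size known from the earlier listing
        pendingCopies.incrementAndGet();  // incremented at queue submit
        futures.add(pool.submit(() -> {
          activeCopies.incrementAndGet();
          try {
            copier.copy(src, destPrefix + src, size);
            bytesCopied.addAndGet(size);
            return null;
          } finally {
            activeCopies.decrementAndGet();
            pendingCopies.decrementAndGet(); // decremented on success or failure
          }
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // rethrows the first copy failure as an ExecutionException
      }
    } finally {
      pool.shutdown();
    }
    return System.nanoTime() - start;
  }

  /** Builds the summary line: file count, total size, duration, effective bandwidth. */
  static String summary(int fileCount, long totalBytes, long elapsedNanos) {
    double seconds = elapsedNanos / 1e9;
    double mbPerSec = seconds > 0 ? (totalBytes / (1024.0 * 1024.0)) / seconds : 0.0;
    return String.format(Locale.ROOT, "renamed %d files, %d bytes in %.3f s (%.2f MB/s)",
        fileCount, totalBytes, seconds, mbPerSec);
  }

  public static void main(String[] args) throws Exception {
    Map<String, Long> files = new LinkedHashMap<>();
    files.put("part-0000", 1024L);
    files.put("part-0001", 2048L);
    ParallelRenameSketch sketch = new ParallelRenameSketch();
    // No-op copier; a real one would issue the object store's copy request.
    long elapsed = sketch.copyAll(files, "dest/", (src, dest, size) -> { }, 2);
    System.out.println(summary(files.size(), sketch.bytesCopied.get(), elapsed));
  }
}
```

In a real filesystem client the summary string would go to the logger at INFO or DEBUG rather than stdout, and the two counters would be published through the existing metrics system. With all copies running in parallel, the elapsed time approaches that of the slowest single copy, which is the improvement the issue description is asking for.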