[ https://issues.apache.org/jira/browse/HADOOP-16047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16742302#comment-16742302 ]
Andrew Olson commented on HADOOP-16047:
---------------------------------------

Thanks [~ste...@apache.org]

> Avoid expensive rename when DistCp is writing to S3
> ---------------------------------------------------
>
>                 Key: HADOOP-16047
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16047
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3, tools/distcp
>            Reporter: Andrew Olson
>            Priority: Major
>
> When writing to an S3-based target, the temp file and rename logic in
> RetriableFileCopyCommand adds some unnecessary cost to the job, as the rename
> operation does a server-side copy + delete in S3 [1]. The renames are
> parallelized across all of the DistCp map tasks, so the severity is mitigated
> to some extent. However a configuration property to conditionally allow
> distributed copies to avoid that expense and write directly to the target
> path would improve performance considerably.
>
> [1] https://github.com/apache/hadoop/blob/release-3.2.0-RC1/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/introduction.md#object-stores-vs-filesystems
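
To make the cost described in the issue concrete, here is a rough sketch of the two write paths against a toy in-memory object store. Everything in it is an assumption made for illustration: `FakeObjectStore`, `copy_to_target`, the `direct_write` flag, and the `.distcp.tmp` suffix are hypothetical names, not the actual RetriableFileCopyCommand code or the S3A API. It only models the point that a store without a native rename has to emulate it as copy + delete.

```python
class FakeObjectStore:
    """Toy in-memory object store (hypothetical; not the S3A client)."""

    def __init__(self):
        self.objects = {}  # key -> bytes
        self.ops = []      # log of operations issued against the store

    def put(self, key, data):
        self.objects[key] = data
        self.ops.append(("PUT", key))

    def copy(self, src, dst):
        self.objects[dst] = self.objects[src]
        self.ops.append(("COPY", src, dst))

    def delete(self, key):
        del self.objects[key]
        self.ops.append(("DELETE", key))

    def rename(self, src, dst):
        # No native rename here: emulate it as copy + delete,
        # which is the extra cost the issue describes.
        self.copy(src, dst)
        self.delete(src)


def copy_to_target(store, target, data, direct_write=False):
    """Write data to target, optionally skipping the temp file and rename.

    direct_write stands in for the configuration property proposed in
    the issue; the name is an assumption.
    """
    if direct_write:
        store.put(target, data)
    else:
        tmp = target + ".distcp.tmp"  # hypothetical temp-file naming
        store.put(tmp, data)
        store.rename(tmp, target)


if __name__ == "__main__":
    for direct in (False, True):
        store = FakeObjectStore()
        copy_to_target(store, "bucket/data/part-0000", b"abc", direct_write=direct)
        print(direct, [op[0] for op in store.ops])
```

In this model the temp-file path issues three store operations per file and the direct path issues one. The sketch ignores what the temp file is presumably there for, namely keeping a partially written file from appearing at the target path, so it is not an argument that the rename is always safe to drop.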