Daisuke Kobayashi created HADOOP-16756:
------------------------------------------

             Summary: Inconsistent Behavior on distcp -update over S3
                 Key: HADOOP-16756
                 URL: https://issues.apache.org/jira/browse/HADOOP-16756
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 3.3.0
            Reporter: Daisuke Kobayashi


With -update, DistCp over S3A always copies all source files, regardless of 
whether they have changed. This contradicts the statement in the doc below.

[http://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html]
{noformat}
And to use -update to only copy changed files.
{noformat}
CopyMapper compares the file length as well as the block size before copying. 
While the file lengths should match, the block sizes do not, apparently because 
the block size returned by S3A is always 32MB.

[https://github.com/apache/hadoop/blob/release-3.2.0-RC1/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/CopyMapper.java#L348]
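
To illustrate the mismatch, here is a minimal sketch of the comparison as described above. It is a simplified model, not the actual CopyMapper source: the class UpdateSkipSketch, the FileMeta type and the canSkip method are hypothetical names, and the 128MB source block size is an assumption (the HDFS default), while the 32MB value is the one S3A reports.

```java
class UpdateSkipSketch {

    /** Hypothetical stand-in for the file metadata the check looks at. */
    static final class FileMeta {
        final long length;
        final long blockSize;

        FileMeta(long length, long blockSize) {
            this.length = length;
            this.blockSize = blockSize;
        }
    }

    /** Simplified model: the copy is skipped only when length AND block size match. */
    static boolean canSkip(FileMeta source, FileMeta target) {
        return source.length == target.length
            && source.blockSize == target.blockSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        // Same 10MB file on both sides; only the reported block size differs.
        FileMeta onHdfs = new FileMeta(10 * mb, 128 * mb); // assumed HDFS default block size
        FileMeta onS3a = new FileMeta(10 * mb, 32 * mb);   // block size reported by S3A

        // Prints "skip copy? false": the unchanged file is copied again.
        System.out.println("skip copy? " + canSkip(onHdfs, onS3a));
    }
}
```

Under this model an unchanged file is never skipped whenever the source reports a block size other than 32MB, which matches the behavior observed with -update.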

I suppose we should either update the documentation or change the code.


