[ https://issues.apache.org/jira/browse/HADOOP-16776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17039572#comment-17039572 ]

Amir Shenavandeh commented on HADOOP-16776:
-------------------------------------------

This can happen often, depending on how large the objects are (the task moves
multiple objects quickly); with many small objects it happens frequently. This
also impacts Hive, since Hive uses DistCp to copy the final results. As far as
I understand, there is no option in Hive to disable DistCp, only thresholds.
On EMR we can use s3-dist-cp instead of DistCp, and as a workaround we might
introduce an interface into Hive to plug in third-party object transfer tools,
such as s3-dist-cp on EMR. For S3, using DistCp or s3-dist-cp to copy from S3
to S3 is very inefficient; it could be done with a server-side copy instead,
so there is room for third-party copy tools.
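
For illustration, a server-side S3-to-S3 copy roughly looks like the sketch
below, using the AWS SDK for Java (v1). Bucket and key names are placeholders,
and this is only a sketch of the idea, not a proposal for the actual Hive
plugin interface:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;

public class S3ServerSideCopy {
    public static void main(String[] args) {
        // Client picks up credentials/region from the default provider chain.
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

        // copyObject runs entirely inside S3: the data is copied server side
        // and is never streamed through the host running this code.
        // Bucket and key names are placeholders.
        s3.copyObject("source-bucket", "path/to/object",
                      "dest-bucket",   "path/to/object");

        // Note: this single-request copy is limited to objects up to 5 GB;
        // larger objects need a multipart copy (e.g. TransferManager.copy).
    }
}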

 

 

> backport HADOOP-16775: distcp copies to s3 are randomly corrupted
> -----------------------------------------------------------------
>
>                 Key: HADOOP-16776
>                 URL: https://issues.apache.org/jira/browse/HADOOP-16776
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: tools/distcp
>    Affects Versions: 2.8.0, 3.0.0, 2.10.0
>            Reporter: Amir Shenavandeh
>            Priority: Blocker
>              Labels: DistCp
>         Attachments: HADOOP-16776-branch-2.8-001.patch, 
> HADOOP-16776-branch-2.8-002.patch
>
>
> This is to back port HADOOP-16775 to hadoop 2.8 branch.
>  



