[ 
https://issues.apache.org/jira/browse/HDFS-14621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ludun updated HDFS-14621:
-------------------------
    Description: 
Use distcp with -prbugpcaxt and -delete to copy data between clusters:

hadoop distcp -Dmapreduce.job.queuename="QueueA" -prbugpcaxt -update -delete \
  hdfs://sourcecluster/user/hive/warehouse/sum.db \
  hdfs://destcluster/user/hive/warehouse/sum.db

After distcp finished, we found that timestamps on the destination differed 
from the source: some destination directories carried the time at which 
distcp ran.

Looking at the distcp code, CopyCommitter preserves timestamps first and only 
then processes the -delete option. The deletions change the timestamp of the 
destination directory, so the preserved value is lost. We should process the 
-delete option first. 
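
The ordering problem can be reproduced without a cluster. The sketch below is 
only a local-filesystem analogy, not distcp or CopyCommitter code: it assumes 
that removing an entry updates the parent directory's modification time, which 
is the behaviour described above for the destination. The directory names, the 
reference file and the check helper are all hypothetical.

```shell
#!/bin/sh
# Local-filesystem analogy for the ordering problem; no HDFS involved.
# Assumption: removing an entry updates the parent directory's mtime,
# which is the behaviour the report describes for the destination.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Reference file carrying the "source" timestamp we want to preserve.
touch -t 200001010000 "$work/ref"

# Order 1: preserve the timestamp, then delete a stale child.
mkdir "$work/preserve_first"
touch "$work/preserve_first/stale"
touch -t 200001010000 "$work/preserve_first"
rm "$work/preserve_first/stale"

# Order 2: delete the stale child, then preserve the timestamp.
mkdir "$work/delete_first"
touch "$work/delete_first/stale"
rm "$work/delete_first/stale"
touch -t 200001010000 "$work/delete_first"

# A directory newer than the reference has lost its preserved timestamp.
check() {
  if [ "$1" -nt "$work/ref" ]; then echo changed; else echo preserved; fi
}

preserve_first_result=$(check "$work/preserve_first")
delete_first_result=$(check "$work/delete_first")

echo "preserve then delete: $preserve_first_result"
echo "delete then preserve: $delete_first_result"
```

Comparing the two printed lines shows whether the order of the two steps 
matters on the filesystem under test. The second order corresponds to the 
change proposed here.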

 

  was:
Use distcp with -prbugpcaxt and -delete to copy data between clusters:

hadoop distcp -Dmapreduce.job.queuename="QueueA" -prbugpcaxt -update -delete \
  hdfs://sourcecluster/user/hive/warehouse/sum.db \
  hdfs://destcluster/user/hive/warehouse/sum.db

After distcp finished, we found that timestamps on the destination differed 
from the source: some destination directories carried the time at which 
distcp ran.

Looking at the distcp code, the committer preserves timestamps first and only 
then processes the -delete option. The deletions change the timestamp of the 
destination directory, so the preserved value is lost. We should process the 
-delete option first. 

 


> Distcp can not preserve timestamp with -delete  option
> ------------------------------------------------------
>
>                 Key: HDFS-14621
>                 URL: https://issues.apache.org/jira/browse/HDFS-14621
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: distcp
>    Affects Versions: 2.7.7, 3.1.2
>            Reporter: ludun
>            Priority: Major
>         Attachments: HDFS-14261.001.patch
>
>
> Use distcp with -prbugpcaxt and -delete to copy data between clusters:
> hadoop distcp -Dmapreduce.job.queuename="QueueA" -prbugpcaxt -update -delete \
>   hdfs://sourcecluster/user/hive/warehouse/sum.db \
>   hdfs://destcluster/user/hive/warehouse/sum.db
> After distcp finished, we found that timestamps on the destination differed 
> from the source: some destination directories carried the time at which 
> distcp ran.
> Looking at the distcp code, CopyCommitter preserves timestamps first and only 
> then processes the -delete option. The deletions change the timestamp of the 
> destination directory, so the preserved value is lost. We should process the 
> -delete option first. 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
