[ 
https://issues.apache.org/jira/browse/HADOOP-17611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319355#comment-17319355
 ] 

Ayush Saxena commented on HADOOP-17611:
---------------------------------------

{quote}Is that supposed to be preserved by distcp?
{quote}
If I remember correctly, distcp has a preserve option for this; I think it is 
{{TIMES}}. Check the {{FileAttribute}} enum in {{DistCpOptions.java}}. If that 
option is specified, distcp does a setTimes as part of DistCpUtils#preserve. 
Which directories and files it applies to depends on the scope of the copy, 
that is, what is in the sequence file generated for copying: if the parent is 
in scope, its times are preserved; otherwise they are not, AFAIK.
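
To illustrate the gating described above, here is a minimal sketch. It is not 
the Hadoop code: {{PreserveSketch}} and {{Attr}} are hypothetical stand-ins, and 
it uses java.nio on a local file instead of the Hadoop FileSystem API. It only 
shows the idea that the modification time is copied when a TIMES-style 
attribute was requested and left alone otherwise.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.EnumSet;

// Local stand-in, NOT Hadoop code: gates the "set times" step on a TIMES attribute.
class PreserveSketch {
    // Hypothetical subset of preserve attributes; only TIMES matters here.
    enum Attr { PERMISSION, TIMES }

    // Copies the source modification time onto the target only when TIMES was requested.
    static void preserve(Path target, FileTime srcModTime, EnumSet<Attr> attrs)
            throws IOException {
        if (attrs.contains(Attr.TIMES)) {
            Files.setLastModifiedTime(target, srcModTime);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("preserve-sketch", ".bin");
        // Whole-second timestamp in the past, so coarse filesystem granularity is not an issue.
        FileTime srcModTime = FileTime.fromMillis(1_000_000_000_000L);

        // No TIMES requested: the target keeps its own (creation-time) mtime.
        preserve(target, srcModTime, EnumSet.noneOf(Attr.class));
        System.out.println("without TIMES, matches source: "
                + (Files.getLastModifiedTime(target).toMillis() == srcModTime.toMillis()));

        // TIMES requested: the source mtime is applied to the target.
        preserve(target, srcModTime, EnumSet.of(Attr.TIMES));
        System.out.println("with TIMES, matches source: "
                + (Files.getLastModifiedTime(target).toMillis() == srcModTime.toMillis()));

        Files.delete(target);
    }
}
```

The EnumSet check mirrors the kind of test I would expect around the setTimes 
call, but please verify the exact condition in DistCpUtils#preserve itself.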

Have a look at that code; it should clarify your doubts. I only had a quick 
look at the PR, and it seems you are preserving the modification time 
irrespective of whether the {{TIMES}} option is provided. Please double-check 
whether that is the case; I didn't go through it in detail.

Let me know if you face any issues understanding the flow or need some help; I 
will also try to explore this and help. :)

> Distcp parallel file copy breaks the modification time
> ------------------------------------------------------
>
>                 Key: HADOOP-17611
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17611
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Adam Maroti
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The commit HADOOP-11794. Enable distcp to copy blocks in parallel. 
> (bf3fb585aaf2b179836e139c041fc87920a3c886) broke the modification time of 
> large files.
>  
> In CopyCommitter.java, inside concatFileChunks, FileSystem.concat is called, 
> which changes the modification time; therefore the modification times of 
> files copied by distcp will not match the source files. However, this only 
> occurs for files large enough to be split into chunks by distcp.
> In concatFileChunks, before calling concat, extract the modification time and 
> apply it to the concatenated result file after the concat (probably best 
> -after- before the rename()).
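
The fix proposed in the description can be sketched as follows. This is only 
a local-filesystem analogue under stated assumptions: {{ConcatMtimeSketch}} and 
{{concatPreservingMtime}} are hypothetical names, chunks are joined with 
java.nio rather than FileSystem.concat, and the source modification time is 
passed in by the caller. The order is the point: concatenate, re-apply the 
source modification time, then rename.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.attribute.FileTime;
import java.util.Arrays;
import java.util.List;

// Local analogue, NOT the CopyCommitter code: restore the source mtime after joining chunks.
class ConcatMtimeSketch {
    // Joins the chunks into workFile, re-applies the source mtime, then renames to target.
    static void concatPreservingMtime(List<Path> chunks, Path workFile, Path target,
                                      FileTime srcModTime) throws IOException {
        // Writing the joined file sets its mtime to "now", like the concat in the report.
        try (OutputStream out = Files.newOutputStream(workFile)) {
            for (Path chunk : chunks) {
                Files.copy(chunk, out);
            }
        }
        // Re-apply the source mtime after the concat and before the rename.
        Files.setLastModifiedTime(workFile, srcModTime);
        Files.move(workFile, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("concat-sketch");
        Path c0 = Files.write(dir.resolve("chunk0"), "hello ".getBytes(StandardCharsets.UTF_8));
        Path c1 = Files.write(dir.resolve("chunk1"), "world".getBytes(StandardCharsets.UTF_8));
        FileTime srcModTime = FileTime.fromMillis(1_000_000_000_000L);
        Path target = dir.resolve("result");

        concatPreservingMtime(Arrays.asList(c0, c1), dir.resolve("result.tmp"), target, srcModTime);

        System.out.println("content: "
                + new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        System.out.println("mtime matches source: "
                + (Files.getLastModifiedTime(target).toMillis() == srcModTime.toMillis()));
    }
}
```

In distcp itself the restore step would presumably also need to respect the 
{{TIMES}} preserve option rather than run unconditionally.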



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
