[ https://issues.apache.org/jira/browse/HADOOP-17611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319400#comment-17319400 ]

Viraj Jasani commented on HADOOP-17611:
---------------------------------------

{quote}Guess you are preserving irrespective of the {{TIMES}} option provided, 
double check once, if that is so, I didn't go through it.
{quote}
Yes, this change applies only to distcp with the parallel block-copy option. For a very big file, we copy multiple blocks in parallel, and in concat() those chunks are appended to the target file. As a result, the target file's mtime changes because of the concat, and this change retains the mtime the target had before the concat. So on its own, this change still won't make mtime and atime exactly the same as the source files'.
{quote}If I remember correct, I think there is an option in distcp as part of 
preserve, Guess it is {{TIMES}}, Check in {{DistCpOptions.java}} and there the 
FileAttribute.
{quote}
Yeah, I just explored it. This option does preserve the source files' mtime and atime attributes.

DistCpUtils#preserve() has this condition to retain times:
{code:java}
// Only runs when TIMES preservation was requested (e.g. distcp -pt)
if (attributes.contains(FileAttribute.TIMES)) {
  targetFS.setTimes(path,
      srcFileStatus.getModificationTime(),
      srcFileStatus.getAccessTime());
}
{code}

> Distcp parallel file copy breaks the modification time
> ------------------------------------------------------
>
>                 Key: HADOOP-17611
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17611
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Adam Maroti
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The commit HADOOP-11794. Enable distcp to copy blocks in parallel. 
> (bf3fb585aaf2b179836e139c041fc87920a3c886) broke the modification time of 
> large files.
>  
> In CopyCommitter.java, concatFileChunks calls FileSystem.concat, which 
> changes the modification time; therefore the modification times of files 
> copied by distcp will not match the source files. However, this only occurs 
> for files large enough that distcp copies them by splitting them into chunks.
> Suggested fix: in concatFileChunks, extract the modification time before 
> calling concat and apply it to the concatenated result file after the concat 
> (probably best -after- before the rename()).
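The fix suggested in the description (capture the mtime before concat, reapply it afterwards) can be illustrated with a minimal local-filesystem sketch. It uses java.nio.file instead of the Hadoop FileSystem API so that it runs standalone; the class and method names (`ConcatPreserveMtime`, `concatPreservingMtime`) are hypothetical, and this is not the actual HADOOP-17611 patch.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.nio.file.attribute.FileTime;
import java.util.List;

// Hypothetical local-filesystem analogue of concat + mtime restore (not HDFS code).
public class ConcatPreserveMtime {

    // Appends each chunk to target, deletes the chunk, then restores the
    // modification time target had before the concat started.
    static void concatPreservingMtime(Path target, List<Path> chunks) throws IOException {
        // 1. Capture the mtime before the concat touches the file.
        FileTime before = Files.getLastModifiedTime(target);

        // 2. Concatenate: appending the chunks bumps target's mtime to "now".
        try (OutputStream out = Files.newOutputStream(target, StandardOpenOption.APPEND)) {
            for (Path chunk : chunks) {
                Files.copy(chunk, out);
                Files.delete(chunk);
            }
        }

        // 3. Reapply the captured mtime once the stream is closed.
        Files.setLastModifiedTime(target, before);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("concat-demo");
        Path target = Files.writeString(dir.resolve("part0"), "AAA");
        Path chunk1 = Files.writeString(dir.resolve("part1"), "BBB");
        Path chunk2 = Files.writeString(dir.resolve("part2"), "CCC");

        FileTime original = FileTime.fromMillis(1_600_000_000_000L);
        Files.setLastModifiedTime(target, original);

        concatPreservingMtime(target, List.of(chunk1, chunk2));

        System.out.println(Files.readString(target));
        System.out.println(Files.getLastModifiedTime(target).equals(original));
    }
}
{code}

On HDFS, the three steps would roughly correspond to FileSystem#getFileStatus(path).getModificationTime(), FileSystem#concat(Path, Path[]) and FileSystem#setTimes(Path, long, long), where passing -1 as the atime leaves the access time unchanged.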



--
This message was sent by Atlassian Jira
(v8.3.4#803005)