[jira] [Comment Edited] (HADOOP-17611) Distcp parallel file copy breaks the modification time

Adam Maroti (Jira) Mon, 12 Apr 2021 01:04:09 -0700


    [ https://issues.apache.org/jira/browse/HADOOP-17611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17319129#comment-17319129 ]


Adam Maroti edited comment on HADOOP-17611 at 4/12/21, 8:03 AM:
----------------------------------------------------------------

[~vjasani] Concat creates a new file, right? Or does it delete some other files? 
Does that change the directory's modification time? Is that supposed to be 
preserved by distcp?


was (Author: amaroti):
[~vjasani] Concat creates a new file, right? Does that change the directory's 
modification time? Is that supposed to be preserved by distcp?

> Distcp parallel file copy breaks the modification time
> ------------------------------------------------------
>
>                 Key: HADOOP-17611
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17611
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Adam Maroti
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The commit for HADOOP-11794 ("Enable distcp to copy blocks in parallel", 
> bf3fb585aaf2b179836e139c041fc87920a3c886) broke the modification time of 
> large files.
>  
> In CopyCommitter.java, concatFileChunks calls FileSystem.concat, which 
> changes the modification time, so the modification times of files copied by 
> distcp no longer match those of the source files. However, this only occurs 
> for files large enough that distcp splits them into chunks to copy them.
> In concatFileChunks, extract the modification time before calling concat and 
> apply it to the concatenated result file after the concat (probably best 
> -after- before the rename()).
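The fix proposed in the description can be sketched as follows. This is a minimal, self-contained Java sketch under stated assumptions, not the actual CopyCommitter patch: `FakeFs` is a hypothetical in-memory stand-in that models only the behavior described in the issue (concat appends the other chunks to the first one, deletes them, and bumps the target's modification time), and it assumes the chunk files already carry the source file's modification time. In Hadoop the corresponding calls would be `FileSystem#getFileStatus(path).getModificationTime()`, `FileSystem#concat(Path, Path[])` and `FileSystem#setTimes(Path, long, long)` (an atime of -1 leaves the access time unchanged); the method name `concatChunksPreservingMtime` and the chunk paths are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: FakeFs is an in-memory stand-in, not Hadoop's FileSystem.
class PreserveMtimeSketch {

    static final class FakeFs {
        final Map<String, StringBuilder> data = new HashMap<>();
        final Map<String, Long> mtimes = new HashMap<>();
        private long clock = 10_000L; // simulated "now", advanced by each concat

        void create(String path, String content, long mtime) {
            data.put(path, new StringBuilder(content));
            mtimes.put(path, mtime);
        }

        long getModificationTime(String path) {
            return mtimes.get(path);
        }

        // Models the side effect from the issue: concat appends the sources to the
        // target, deletes the sources, and sets the target's mtime to "now".
        void concat(String target, List<String> sources) {
            StringBuilder t = data.get(target);
            for (String s : sources) {
                t.append(data.remove(s));
                mtimes.remove(s);
            }
            mtimes.put(target, ++clock);
        }

        // Models setTimes(path, mtime, -1): only the modification time changes.
        void setModificationTime(String path, long mtime) {
            mtimes.put(path, mtime);
        }

        // In this model rename moves the file and keeps its mtime.
        void rename(String src, String dst) {
            data.put(dst, data.remove(src));
            mtimes.put(dst, mtimes.remove(src));
        }
    }

    // Capture the mtime before concat, restore it after concat, then rename.
    static void concatChunksPreservingMtime(FakeFs fs, List<String> chunks, String target) {
        String first = chunks.get(0);
        long mtime = fs.getModificationTime(first);
        fs.concat(first, chunks.subList(1, chunks.size()));
        fs.setModificationTime(first, mtime);
        fs.rename(first, target);
    }

    public static void main(String[] args) {
        FakeFs fs = new FakeFs();
        long sourceMtime = 500L; // assumed: chunks already carry the source mtime
        fs.create("/dst/file.chunk0", "AAA", sourceMtime);
        fs.create("/dst/file.chunk1", "BBB", sourceMtime);
        concatChunksPreservingMtime(fs,
                Arrays.asList("/dst/file.chunk0", "/dst/file.chunk1"), "/dst/file");
        // prints "AAABBB mtime=500"
        System.out.println(fs.data.get("/dst/file") + " mtime=" + fs.getModificationTime("/dst/file"));
    }
}
```

The model deliberately ignores directory modification times, which the comment above asks about.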



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
