[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zheng Shao updated MAPREDUCE-6840:
----------------------------------
    Attachment: MAPREDUCE-6840.1.patch

> Distcp to support cutoff time
> -----------------------------
>
>                 Key: MAPREDUCE-6840
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6840
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: distcp
>    Affects Versions: 2.6.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>            Priority: Minor
>         Attachments: MAPREDUCE-6840.1.patch
>
>
> To keep datasets on HDFS consistent, some projects, such as file formats
> used with Hive, perform HDFS operations in a particular order.  For example,
> if a file format uses an index file, a new version of the index file is
> written to HDFS only after all files referenced by the index have been written.
> When we run distcp, it is important to preserve that consistency so that we
> don't break those file formats.
> A typical solution is to create an HDFS snapshot beforehand and distcp only
> the snapshot.  That works well if the user has the superuser privilege
> needed to make the directory snapshottable.
> If not, it would be beneficial to have a cutoff time for distcp, so that
> distcp copies only files modified on or before that cutoff time.
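
The cutoff rule described above can be sketched as a simple filter over a
copy listing. This is a minimal illustration only, not the code in the
attached patch: FileEntry, select_for_copy and to_epoch_ms are hypothetical
names, and the sketch assumes modification times are compared as
milliseconds since the epoch, with the cutoff itself included.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical stand-in for one entry in a copy listing."""
    path: str
    mtime_ms: int  # modification time, milliseconds since the epoch


def to_epoch_ms(when: datetime) -> int:
    """Convert a timezone-aware datetime to epoch milliseconds."""
    return int(when.timestamp() * 1000)


def select_for_copy(entries: Iterable[FileEntry], cutoff_ms: int) -> List[FileEntry]:
    """Keep only entries modified on or before the cutoff (inclusive)."""
    return [e for e in entries if e.mtime_ms <= cutoff_ms]


if __name__ == "__main__":
    cutoff = to_epoch_ms(datetime(2017, 2, 1, 12, 0, tzinfo=timezone.utc))
    listing = [
        FileEntry("/warehouse/t/data-0001", cutoff - 60_000),  # before cutoff
        FileEntry("/warehouse/t/data-0002", cutoff),           # exactly at cutoff
        FileEntry("/warehouse/t/index.v2", cutoff + 1),        # after cutoff
    ]
    for entry in select_for_copy(listing, cutoff):
        print(entry.path)
```

In this sketch the index file written after the cutoff is skipped, so the
copy never contains an index that refers to data files missing from it.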



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-h...@hadoop.apache.org
