[ 
https://issues.apache.org/jira/browse/HDFS-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796535#comment-15796535
 ] 

churro morales commented on HDFS-11218:
---------------------------------------

This seems quite useful.  Are you guys working on this patch currently?

> Add option to skip open files during HDFS Snapshots
> ---------------------------------------------------
>
>                 Key: HDFS-11218
>                 URL: https://issues.apache.org/jira/browse/HDFS-11218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>
> *Problem:* 
> When files are being written while HDFS Snapshots are taken in parallel, 
> the Snapshots do capture all of these files, but the files still being 
> written do not have their point-in-time file length captured.
> At the time of file close, or of any other metadata modification operation 
> on a file that was previously open, HDFS reconciles the file length and 
> records the modification in the last taken Snapshot. All the previously taken 
> Snapshots continue to have the same open file with no modification recorded. 
> So, all those previous Snapshots end up using the final modification record 
> in the next available Snapshot.
> *Proposal:*
> The HDFS Snapshot design goal was to have O(M) space usage for Snapshots, 
> where M is the number of file modifications. So, it would be very expensive 
> to record modifications for all the open files in all the snapshots. For 
> applications that do not want to capture incomplete / partially written 
> binary files in the snapshots, it would be preferable to have an extra option 
> to skip open files. This way, they don't have to worry about restoring 
> inconsistent files from the snapshots.
> {noformat}
> hdfs dfs -createSnapshot -skipOpenFiles <snapDir> <snapName>
> {noformat}
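
To make the proposed behaviour concrete, here is a small toy model in Python. It is only a sketch of the idea in the description, not HDFS code: `FileEntry`, `take_snapshot`, and the `skip_open_files` parameter are hypothetical names invented for illustration, and the `-skipOpenFiles` flag above is a proposal from this issue rather than an existing option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Toy stand-in for a file in a snapshottable directory (not an HDFS class)."""
    path: str
    length: int     # bytes visible at the moment the snapshot is taken
    is_open: bool   # True while a writer still holds the file open


def take_snapshot(files, skip_open_files=False):
    """Return {path: length} for the files a snapshot would include.

    skip_open_files models the proposed option: open files are left out
    entirely instead of being captured with an unreliable length.
    """
    return {
        f.path: f.length
        for f in files
        if not (skip_open_files and f.is_open)
    }


files = [
    FileEntry("/data/closed.bin", 1024, is_open=False),
    FileEntry("/data/in_progress.bin", 300, is_open=True),
]

# Default behaviour: the open file is captured along with the closed one.
default_snapshot = take_snapshot(files)

# Proposed behaviour: the open file is skipped.
filtered_snapshot = take_snapshot(files, skip_open_files=True)

print(sorted(default_snapshot))
print(sorted(filtered_snapshot))
```

In this model an application restoring from `filtered_snapshot` never sees the partially written file, which is the consistency property the proposal is after.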


