[ 
https://issues.apache.org/jira/browse/HDFS-11218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15796734#comment-15796734
 ] 

Manoj Govindassamy commented on HDFS-11218:
-------------------------------------------

We are working on design proposals. One proposal I am considering is to 
tweak the snapshot record so that, when a snapshot is taken with the 
skipOpenFiles option, files that are open and being written appear never to 
have been created. However, this creates further problems for the file append 
and file truncate cases, and for all previous snapshots. The other proposal is 
to add extra information about the open files to the snapshot records, which 
can be used later by the SnapshotDiff command (refer: HDFS-11220). Once the 
full proposals are ready, I will share them here for public review and then 
work on a patch based on the review comments.

> Add option to skip open files during HDFS Snapshots
> ---------------------------------------------------
>
>                 Key: HDFS-11218
>                 URL: https://issues.apache.org/jira/browse/HDFS-11218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>
> *Problem:* 
> When HDFS Snapshots are taken while files are being written, the Snapshots 
> do capture these files, but they do not capture the point-in-time length of 
> the files being written.
> When such a previously open file is closed, or any other metadata 
> modification is performed on it, HDFS reconciles the file length and records 
> the modification in the last taken Snapshot. All the previously taken 
> Snapshots continue to hold the same open file with no modification recorded. 
> So, all those previous Snapshots end up using the final modification record 
> in the next available Snapshot.
> *Proposal:*
> The HDFS Snapshot design goal was O(M) space usage for Snapshots, where M 
> is the number of file modifications. So, it would be very expensive to record 
> modifications for all the open files in all the Snapshots. For applications 
> that do not want to capture incomplete, partially written binary files 
> in the Snapshots, it would be preferable to have an extra option to skip open 
> files. This way, they don't have to worry about restoring inconsistent files 
> from the Snapshots. 
> {noformat}
> hdfs dfs -createSnapshot -skipOpenFiles <snapDir> <snapName>
> {noformat}
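
The behaviour described under *Problem* in the quoted description, and the 
effect of the proposed option, can be illustrated with a small toy model. 
This is only a sketch that follows the wording of the description; it is not 
HDFS code and does not reflect the NameNode's actual data structures. The 
names ToyFile, take_snapshot, skip_open_files and length_in are hypothetical.

```python
class ToyFile:
    """Toy model of one file whose snapshot records are written lazily."""

    def __init__(self):
        self.length = 0
        self.is_open = True
        self.snapshots = []   # snapshot ids that include this file, in order
        self.recorded = {}    # snapshot id -> length recorded at modification

    def write(self, nbytes):
        # Writing to an open file does not touch any snapshot record.
        self.length += nbytes

    def take_snapshot(self, snap_id, skip_open_files=False):
        # Hypothetical flag: leave an open file out of the snapshot entirely.
        if skip_open_files and self.is_open:
            return
        # No per-file length is stored here, which keeps space usage O(M).
        self.snapshots.append(snap_id)

    def close(self):
        # On close, the length is recorded only in the last taken snapshot.
        self.is_open = False
        if self.snapshots:
            self.recorded[self.snapshots[-1]] = self.length

    def length_in(self, snap_id):
        # A file that was skipped is not visible in that snapshot.
        if snap_id not in self.snapshots:
            return None
        # Use the record in this snapshot or the next one that has a record.
        start = self.snapshots.index(snap_id)
        for later in self.snapshots[start:]:
            if later in self.recorded:
                return self.recorded[later]
        return self.length


# File is 100 bytes at s1 and 300 bytes at s2, then closed at 350 bytes.
f = ToyFile()
f.write(100)
f.take_snapshot("s1")
f.write(200)
f.take_snapshot("s2")
f.write(50)
f.close()
print(f.length_in("s1"), f.length_in("s2"))

# With the hypothetical skip flag, the open file is absent from the snapshot.
g = ToyFile()
g.write(100)
g.take_snapshot("s1", skip_open_files=True)
g.close()
print(g.length_in("s1"))
```

In this model both s1 and s2 resolve to the length at close time rather than 
the lengths the file had when each snapshot was taken, which is the 
inconsistency the skip option is meant to avoid.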



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

