[ 
https://issues.apache.org/jira/browse/HDFS-11220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy resolved HDFS-11220.
---------------------------------------
       Resolution: Workaround
    Fix Version/s: 3.0.0-beta1

The core issue described in this JIRA is no longer a problem. With the fix 
for HDFS-11402, there is a workaround that captures immutable copies of open 
files in snapshots. 

> SnapshotDiffReport should detect open files in HDFS Snapshots
> -------------------------------------------------------------
>
>                 Key: HDFS-11220
>                 URL: https://issues.apache.org/jira/browse/HDFS-11220
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>             Fix For: 3.0.0-beta1
>
>
> *Problem:*
> 1. When HDFS Snapshots are taken while files are being written, the 
> Snapshots do capture those files, but not their point-in-time lengths. Most 
> of the time, these open files show a length of 0 or the size at the last 
> block boundary.
> 2. HDFS reconciles the file length only when the file is closed or when some 
> other metadata modification is made to it, and it records that modification 
> in the most recently taken Snapshot. Snapshots taken earlier still hold the 
> open file with no modification recorded, so they fall back to the 
> modification record in the next available Snapshot. As a result, after the 
> file is closed, its length is the same in all of those Snapshots.
> Assume File1 is opened for write and a total of 1MB is written to it. While 
> the writes are happening, Snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then at time,
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> <File1 write completed and closed>
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> So, a SnapshotDiffReport run against any of the above Snapshots will not 
> detect any changes in the open files. 
> *Proposal:*
> 1. HDFS Snapshots can stash open file details in the snapshot record. 
> 2. Because the NameNode might not have accurate byte-level visibility into 
> the length of open files, Snapshots might not capture an accurate 
> point-in-time length. So, SnapshotDiffReport can offer an option to detect 
> open files and always show the {{M}} flag for them when they are present in 
> both of the Snapshots being compared. 
> {noformat}
> hdfs snapshotDiff -includeOpenFiles <snapDir> <snapName> <snapName>
> {noformat}
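
The File1 timeline in the quoted description can be replayed with a small toy model. This is only a sketch of the behaviour as the issue describes it, not Hadoop code: `ToyFile`, `sync_length`, `length_in` and `replay` are hypothetical names invented for the illustration. The `capture_open_files` flag is meant to loosely mirror the HDFS-11402 workaround (to my understanding, enabled through the `dfs.namenode.snapshot.capture.openfiles` setting), and `sync_length` stands in for a client pushing its current length to the NameNode.

```python
MB = 1024 * 1024


class ToyFile:
    """Hypothetical stand-in for one HDFS file; models only length bookkeeping."""

    def __init__(self):
        self.written = 0      # bytes the client has actually written
        self.nn_length = 0    # length the (toy) NameNode currently knows about
        self.is_open = True
        self.records = []     # [snapshot_name, recorded_length or None], oldest first

    def write(self, nbytes):
        # Writes to an open file do not update the NameNode's view by themselves.
        self.written += nbytes

    def sync_length(self):
        # Assumption: models the client reporting its length to the NameNode.
        self.nn_length = self.written

    def snapshot(self, name, capture_open_files=False):
        # Without the workaround, an open file gets no modification record.
        captured = self.nn_length if (capture_open_files and self.is_open) else None
        self.records.append([name, captured])

    def close(self):
        # Close reconciles the length and records it in the latest snapshot
        # only if that snapshot has no record yet.
        self.nn_length = self.written
        self.is_open = False
        if self.records and self.records[-1][1] is None:
            self.records[-1][1] = self.nn_length

    def length_in(self, name):
        # A snapshot without its own record uses the next available record,
        # and falls back to the current NameNode length if there is none.
        names = [n for n, _ in self.records]
        for _, length in self.records[names.index(name):]:
            if length is not None:
                return length
        return self.nn_length


def replay(capture_open_files, sync_before_snapshot):
    """Replay the T1..T4 timeline; return the snapshot lengths seen at T2, T3, T4."""
    f = ToyFile()
    seen = {}

    f.write(MB // 2)
    if sync_before_snapshot:
        f.sync_length()
    f.snapshot("Snap1", capture_open_files)
    seen["T2"] = [f.length_in("Snap1")]

    f.write(MB // 2)
    if sync_before_snapshot:
        f.sync_length()
    f.snapshot("Snap2", capture_open_files)
    seen["T3"] = [f.length_in(s) for s in ("Snap1", "Snap2")]

    f.close()
    f.snapshot("Snap3", capture_open_files)
    seen["T4"] = [f.length_in(s) for s in ("Snap1", "Snap2", "Snap3")]
    return seen
```

Calling `replay(False, False)` reproduces the lengths listed in the description: 0 at T2 and T3, then 1MB in every snapshot at T4. Calling `replay(True, True)` keeps a distinct length per snapshot (512KB in Snap1, 1MB in Snap2 and Snap3), which is the kind of delta a snapshot diff could then detect.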



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org
