[ https://issues.apache.org/jira/browse/HDFS-11220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Manoj Govindassamy resolved HDFS-11220.
---------------------------------------
       Resolution: Workaround
    Fix Version/s: 3.0.0-beta1

The core issue described in this JIRA is no longer a problem. With the fix for
HDFS-11402, there is a workaround that captures immutable copies of open files
in snapshots.

> SnapshotDiffReport should detect open files in HDFS Snapshots
> -------------------------------------------------------------
>
>                 Key: HDFS-11220
>                 URL: https://issues.apache.org/jira/browse/HDFS-11220
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: snapshots
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>             Fix For: 3.0.0-beta1
>
>
> *Problem:*
> 1. When files are being written while HDFS Snapshots are taken in parallel,
> the Snapshots do capture all of these files, but the files being written do
> not have their point-in-time length captured in the Snapshots. Most of the
> time, these open files will have a length of 0 or the size at the last block
> boundary.
> 2. HDFS reconciles the file length and records the modification in the last
> taken Snapshot only at file close, or on some other metadata modification
> operation on these files. All previously taken Snapshots continue to hold
> those open files with no modification recorded, so they end up using the
> final modification record in the next available snapshot. After the file is
> closed, the file length in all of those snapshots therefore ends up the same.
> Assume File1 is opened for write and a total of 1MB is written to it. While
> the writes are happening, snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then, at time
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> <File1 write completed and closed>
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> So, a Snapshot Diff Report run against any of the above snapshots will not
> detect any delta changes in the open files.
> *Proposal:*
> 1. HDFS Snapshots can stash open file details in the snapshot record.
> 2. Because the NameNode might not have accurate byte-level visibility of the
> length of open files, Snapshots might not capture the accurate point-in-time
> length. So, SnapshotDiffReport can have an option to detect open files and
> always show the {{M}} flag for them, provided the files are present in both
> of the snapshots being compared.
> {noformat}
> hdfs snapshotDiff -includeOpenFiles <snapDir> <snapName> <snapName>
> {noformat}
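
The File1 timeline in the quoted description can be replayed with a small toy model. This is only a sketch of the behaviour as the description states it, not HDFS code: the class `OpenFileSnapshotModel` and its methods `write`, `takeSnapshot`, `close`, and `lengthIn` are hypothetical names invented for illustration, and the model assumes that the NameNode sees a length of 0 until the file is closed (the description also allows the last block boundary size).

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the behaviour described in HDFS-11220; hypothetical, NOT HDFS code. */
class OpenFileSnapshotModel {
    // Bytes the client has written so far; assumed invisible to the NameNode until close.
    private long bytesWritten = 0;
    // Length recorded for the live file (stays 0 while the file is open in this model).
    private long liveLength = 0;
    // Id of the most recently taken snapshot; 0 means no snapshot yet.
    private int lastSnapshotId = 0;
    // Modification records, keyed by the snapshot id they were recorded in.
    private final TreeMap<Integer, Long> records = new TreeMap<>();

    void write(long bytes) {
        bytesWritten += bytes;
    }

    /** Taking a snapshot records nothing for the open file. */
    int takeSnapshot() {
        return ++lastSnapshotId;
    }

    /** Close reconciles the length and records it in the last taken snapshot only. */
    void close() {
        liveLength = bytesWritten;
        if (lastSnapshotId > 0) {
            records.put(lastSnapshotId, liveLength);
        }
    }

    /** A snapshot with no record of its own uses the next available record, else the live file. */
    long lengthIn(int snapshotId) {
        Map.Entry<Integer, Long> next = records.ceilingEntry(snapshotId);
        return next != null ? next.getValue() : liveLength;
    }

    public static void main(String[] args) {
        OpenFileSnapshotModel file1 = new OpenFileSnapshotModel();
        file1.write(512L * 1024);
        int snap1 = file1.takeSnapshot();                        // T2
        System.out.println("T2 Snap1=" + file1.lengthIn(snap1));
        file1.write(512L * 1024);
        int snap2 = file1.takeSnapshot();                        // T3
        System.out.println("T3 Snap1=" + file1.lengthIn(snap1)
                + " Snap2=" + file1.lengthIn(snap2));
        file1.close();                                           // 1MB in total
        int snap3 = file1.takeSnapshot();                        // T4
        System.out.println("T4 Snap1=" + file1.lengthIn(snap1)
                + " Snap2=" + file1.lengthIn(snap2)
                + " Snap3=" + file1.lengthIn(snap3));
        // Equal lengths in every snapshot pair means a length-based diff sees no delta.
        System.out.println("Snap1 vs Snap2 differ: "
                + (file1.lengthIn(snap1) != file1.lengthIn(snap2)));
    }
}
```

Under these assumptions, every snapshot reports 0 bytes while File1 is open and 1048576 bytes once it is closed, so comparing any two of them shows no change in length. That is the gap the proposed option for flagging open files was meant to cover.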