[
https://issues.apache.org/jira/browse/HDFS-11220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Manoj Govindassamy resolved HDFS-11220.
---------------------------------------
Resolution: Workaround
Fix Version/s: 3.0.0-beta1
The core issue described in this JIRA is no longer a problem. With the fix
for HDFS-11402, we have a workaround to capture immutable copies of open files
in the snapshots.
> SnapshotDiffReport should detect open files in HDFS Snapshots
> -------------------------------------------------------------
>
> Key: HDFS-11220
> URL: https://issues.apache.org/jira/browse/HDFS-11220
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: snapshots
> Affects Versions: 3.0.0-alpha1
> Reporter: Manoj Govindassamy
> Assignee: Manoj Govindassamy
> Fix For: 3.0.0-beta1
>
>
> *Problem:*
> 1. When files are being written while HDFS Snapshots are taken in parallel,
> the Snapshots do capture all these files, but the files being written do not
> have their point-in-time length captured in the Snapshots. Most of the time,
> these open files will have a length of 0, or the length up to the last block
> boundary.
> 2. Only when the file is closed, or when some other metadata modification
> operation is performed on these files, does HDFS reconcile the file length
> and record the modification in the last taken Snapshot. All the previously
> taken Snapshots continue to have those open files with no modification
> recorded. So, all those previous snapshots end up using the final
> modification record in the next available snapshot. As a result, after the
> file is closed, the file lengths in all those snapshots end up the same.
> Assume File1 is opened for write and a total of 1MB written to it. While the
> writes are happening, snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then at time,
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> <File1 write completed and closed>
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> So, a Snapshot Diff Report run against any of the above snapshots will not
> detect any delta changes in the open files.
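The timeline above can be reproduced with a small toy model. This is only a sketch of the behaviour as described in this issue, not NameNode code: the class `ToyHdfsFile`, its methods, and `run_timeline` are hypothetical names invented for illustration, and the model assumes the NameNode-visible length stays at 0 until close (the issue notes it may instead be the last block boundary).

```python
MB = 1024 * 1024


class ToyHdfsFile:
    """Toy model of one file plus its per-snapshot modification records."""

    def __init__(self):
        self.written = 0          # bytes the client has actually written
        self.visible_length = 0   # length the NameNode currently knows about
        self.snapshots = []       # snapshot names, oldest first
        self.records = {}         # snapshot name -> length recorded at close

    def write(self, nbytes):
        # Assumption: while the file is open, the visible length is not updated.
        self.written += nbytes

    def snapshot(self, name):
        # An open file gets no length record of its own at snapshot time.
        self.snapshots.append(name)

    def close(self):
        # Closing reconciles the length and records it in the last taken snapshot.
        self.visible_length = self.written
        if self.snapshots:
            self.records[self.snapshots[-1]] = self.visible_length

    def length_in(self, name):
        # Use the first record at or after `name`; otherwise the current length.
        start = self.snapshots.index(name)
        for later in self.snapshots[start:]:
            if later in self.records:
                return self.records[later]
        return self.visible_length


def run_timeline():
    """Replay T1..T4 and return the lengths observed at T2, T3 and T4."""
    f = ToyHdfsFile()
    seen = {}
    f.write(MB // 2)
    f.snapshot("Snap1")
    seen["T2"] = {s: f.length_in(s) for s in f.snapshots}
    f.write(MB // 2)
    f.snapshot("Snap2")
    seen["T3"] = {s: f.length_in(s) for s in f.snapshots}
    f.close()
    f.snapshot("Snap3")
    seen["T4"] = {s: f.length_in(s) for s in f.snapshots}
    return seen


if __name__ == "__main__":
    for time, lengths in run_timeline().items():
        print(time, lengths)
```

In this model every pair of snapshots reports the same length for File1 after the close, which is why a length-based diff between any two of them finds nothing to report.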
> *Proposal:*
> 1. HDFS Snapshots can stash open file details in the snapshot record.
> 2. The NameNode might not have accurate byte-level visibility into the
> length of open files, so Snapshots might not have the accurate point-in-time
> length captured. So, SnapshotDiffReport can have an option to detect open
> files and always show the {{M}} flag for them, if the files are present in
> both of the snapshots being compared.
> {noformat}
> hdfs snapshotDiff -includeOpenFiles <snapDir> <snapName> <snapName>
> {noformat}
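The proposal can be illustrated with a minimal sketch of the diff logic. Everything here is an assumption made for illustration: `ToySnapshot`, `toy_snapshot_diff`, and the `include_open_files` parameter are hypothetical names, not Hadoop APIs, and the sketch treats a file as "open" if it was recorded as open in either of the two snapshots, which the proposal does not spell out.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class ToySnapshot:
    """Hypothetical snapshot record: file lengths plus stashed open-file paths."""
    name: str
    lengths: dict          # path -> file length as seen through this snapshot
    open_files: frozenset  # paths that were open for write at snapshot time


def toy_snapshot_diff(older, newer, include_open_files=False):
    """Return sorted (flag, path) entries for files present in both snapshots."""
    report = []
    for path in sorted(older.lengths.keys() & newer.lengths.keys()):
        changed = older.lengths[path] != newer.lengths[path]
        # Assumption: "open" means open in either snapshot being compared.
        was_open = path in older.open_files or path in newer.open_files
        if changed or (include_open_files and was_open):
            report.append(("M", path))
    return report


# Snap1 and Snap3 from the timeline, after File1 has been closed:
# both report 1MB, but Snap1 stashed File1 as open.
snap1 = ToySnapshot("Snap1", {"/dir/File1": MB}, frozenset({"/dir/File1"}))
snap3 = ToySnapshot("Snap3", {"/dir/File1": MB}, frozenset())

if __name__ == "__main__":
    print(toy_snapshot_diff(snap1, snap3))
    print(toy_snapshot_diff(snap1, snap3, include_open_files=True))
```

Without the option the length comparison alone reports nothing for File1; with it, the stashed open-file detail is enough to flag the file as modified even though the recorded lengths are identical.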