[jira] [Updated] (HDFS-11220) SnapshotDiffReport should detect open files in HDFS Snapshots

2017-08-25 Thread Manoj Govindassamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HDFS-11220:
--
Fix Version/s: 2.9.0

> SnapshotDiffReport should detect open files in HDFS Snapshots
> -
>
> Key: HDFS-11220
> URL: https://issues.apache.org/jira/browse/HDFS-11220
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
> Fix For: 2.9.0, 3.0.0-beta1
>
>
> *Problem:*
> 1. When HDFS Snapshots are taken while files are being written, the 
> Snapshots do capture these files, but they do not capture the point-in-time 
> file length. Most of the time, these open files show a length of 0 or the 
> size at the last block boundary.
> 2. Only when a file is closed, or when some other metadata modification is 
> made to it, does HDFS reconcile the file length and record the modification 
> in the most recently taken Snapshot. All earlier Snapshots still hold the 
> open file with no modification recorded, so they fall back to the 
> modification record in the next available Snapshot. As a result, after the 
> file is closed, its length is the same in all of those Snapshots.
> Assume File1 is opened for write and a total of 1MB is written to it. While 
> the writes are happening, snapshots are taken in parallel.
> {noformat}
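> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->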
> {noformat}
> Then at time,
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> 
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> So, a Snapshot Diff Report run against any of the above snapshots will not 
> detect any changes in the open files. 
> *Proposal:*
> 1. HDFS Snapshots can stash open file details in the snapshot record. 
> 2. The NameNode might not have accurate byte-level visibility of the length 
> of open files, so Snapshots might not capture the accurate point-in-time 
> length. SnapshotDiffReport can therefore have an option to detect open files 
> and always show the {{M}} flag for them, provided the files are present in 
> both of the snapshots being compared. 
> {noformat}
> hdfs snapshotDiff -includeOpenFiles   
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-11220) SnapshotDiffReport should detect open files in HDFS Snapshots

2016-12-07 Thread Manoj Govindassamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-11220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Manoj Govindassamy updated HDFS-11220:
--
Description: 
*Problem:*

1. When HDFS Snapshots are taken while files are being written, the Snapshots 
do capture these files, but they do not capture the point-in-time file length. 
Most of the time, these open files show a length of 0 or the size at the last 
block boundary.

2. Only when a file is closed, or when some other metadata modification is made 
to it, does HDFS reconcile the file length and record the modification in the 
most recently taken Snapshot. All earlier Snapshots still hold the open file 
with no modification recorded, so they fall back to the modification record in 
the next available Snapshot. As a result, after the file is closed, its length 
is the same in all of those Snapshots.

Assume File1 is opened for write and a total of 1MB is written to it. While the 
writes are happening, snapshots are taken in parallel.

{noformat}
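|---Time---T1-----------T2-------------T3----------------T4------>
|-----------------------Snap1----------Snap2-------------Snap3--->
|---File1.open---write---------write-----------close------------->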
{noformat}

Then at time,
T2:
Snap1.File1.length = 0

T3:
Snap1.File1.length = 0
Snap2.File1.length = 0



T4:
Snap1.File1.length = 1MB
Snap2.File1.length = 1MB
Snap3.File1.length = 1MB

So, a Snapshot Diff Report run against any of the above snapshots will not 
detect any changes in the open files. 
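
The length behaviour in the example above can be reproduced with a small toy model. This is only a sketch under stated assumptions, not HDFS code: {{ToyFile}}, {{close}}, {{length_in}} and {{simulate}} are hypothetical names, the open file is assumed to report a length of 0 until it is closed, and the close is assumed to happen after Snap2 and before Snap3.

```python
from bisect import bisect_left

MB = 1024 * 1024


class ToyFile:
    """Toy stand-in for a file whose length is only reconciled on close."""

    def __init__(self):
        self.visible_length = 0  # assumption: an open file reports length 0
        self.records = []        # (snapshot_id, length), ascending snapshot ids

    def close(self, final_length, last_snapshot_id):
        # On close, the reconciled length is recorded against the most
        # recently taken snapshot only.
        self.visible_length = final_length
        if last_snapshot_id is not None:
            self.records.append((last_snapshot_id, final_length))

    def length_in(self, snapshot_id):
        # A snapshot without its own record falls back to the record in the
        # next available snapshot, or to the current length if there is none.
        ids = [sid for sid, _ in self.records]
        i = bisect_left(ids, snapshot_id)
        if i < len(ids):
            return self.records[i][1]
        return self.visible_length


def simulate():
    """Replay the T2/T3/T4 observations from the timeline."""
    f = ToyFile()
    seen = {}
    seen["T2"] = {1: f.length_in(1)}                       # Snap1 exists
    seen["T3"] = {1: f.length_in(1), 2: f.length_in(2)}    # Snap1, Snap2 exist
    f.close(MB, last_snapshot_id=2)                        # close after Snap2
    seen["T4"] = {s: f.length_in(s) for s in (1, 2, 3)}    # Snap3 taken
    return seen


if __name__ == "__main__":
    for time, lengths in simulate().items():
        print(time, lengths)
```

In this model every snapshot reports the same length once the file is closed, so comparing any two of them by length alone finds nothing to report.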

*Proposal:*

1. HDFS Snapshots can stash open file details in the snapshot record. 
2. The NameNode might not have accurate byte-level visibility of the length of 
open files, so Snapshots might not capture the accurate point-in-time length. 
SnapshotDiffReport can therefore have an option to detect open files and always 
show the {{M}} flag for them, provided the files are present in both of the 
snapshots being compared. 

{noformat}
hdfs snapshotDiff -includeOpenFiles   
{noformat}
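
The proposed option can be sketched as a toy diff over two snapshot records. This is an illustration under stated assumptions, not the SnapshotDiffReport implementation: {{ToySnapshot}}, {{toy_snapshot_diff}} and {{include_open_files}} are hypothetical names, each snapshot is assumed to carry the set of paths that were open when it was taken (item 1 of the proposal), and a file is treated as open if either snapshot lists it as open.

```python
from dataclasses import dataclass, field


@dataclass
class ToySnapshot:
    """Hypothetical snapshot record: path -> length, plus stashed open files."""
    lengths: dict
    open_files: frozenset = field(default_factory=frozenset)


def toy_snapshot_diff(from_snap, to_snap, include_open_files=False):
    """Return (flag, path) entries using the +, - and M report flags."""
    report = []
    for path in sorted(from_snap.lengths.keys() | to_snap.lengths.keys()):
        in_from = path in from_snap.lengths
        in_to = path in to_snap.lengths
        if in_from and not in_to:
            report.append(("-", path))   # deleted
        elif in_to and not in_from:
            report.append(("+", path))   # created
        elif from_snap.lengths[path] != to_snap.lengths[path]:
            report.append(("M", path))   # recorded lengths differ
        elif include_open_files and (
            path in from_snap.open_files or path in to_snap.open_files
        ):
            # Assumption: lengths of open files are unreliable, so a file
            # present in both snapshots and open in either is always "M".
            report.append(("M", path))
    return report


if __name__ == "__main__":
    snap1 = ToySnapshot({"/dir/File1": 0}, frozenset({"/dir/File1"}))
    snap2 = ToySnapshot({"/dir/File1": 0}, frozenset({"/dir/File1"}))
    print(toy_snapshot_diff(snap1, snap2))
    print(toy_snapshot_diff(snap1, snap2, include_open_files=True))
```

Keeping the behaviour behind an option leaves the default report unchanged for callers that compare recorded metadata only.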

  was:
*Problem:*

1. When HDFS Snapshots are taken while files are being written, the Snapshots 
do capture these files, but they do not capture the point-in-time file length. 
Most of the time, these open files show a length of 0 or the size at the last 
block boundary.

2. Only when a file is closed, or when some other metadata modification is made 
to it, does HDFS reconcile the file length and record the modification in the 
most recently taken Snapshot. All earlier Snapshots still hold the open file 
with no modification recorded, so they fall back to the modification record in 
the next available Snapshot. As a result, after the file is closed, its length 
is the same in all of those Snapshots.

Assume File1 is opened for write and a total of 1MB is written to it. While the 
writes are happening, snapshots are taken in parallel.

{noformat}
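|---Time---T1-----------T2-------------T3----------------T4------>
|-----------------------Snap1----------Snap2-------------Snap3--->
|---File1.open---write---------write-----------close------------->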
{noformat}

Then at time,
T2:
Snap1.File1.length = 0

T3:
Snap1.File1.length = 0
Snap2.File1.length = 0



T4:
Snap1.File1.length = 1MB
Snap2.File1.length = 1MB
Snap3.File1.length = 1MB

So, a Snapshot Diff Report run against any of the above snapshots will not 
detect any changes in the open files. 

*Proposal:*

1. HDFS Snapshots can stash open file details in the snapshot record. 
2. The NameNode might not have accurate byte-level visibility of the length of 
open files, so Snapshots might not capture the accurate point-in-time length. 
SnapshotDiffReport can therefore always show the {{M}} flag for open files, 
provided the files are present in both of the snapshots being compared. 


> SnapshotDiffReport should detect open files in HDFS Snapshots
> -
>
> Key: HDFS-11220
> URL: https://issues.apache.org/jira/browse/HDFS-11220
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 3.0.0-alpha1
>Reporter: Manoj Govindassamy
>Assignee: Manoj Govindassamy
>
> *Problem:*
> 1. When HDFS Snapshots are taken while files are being written, the 
> Snapshots do capture these files, but they do not capture the point-in-time 
> file length. Most of the time, these open files show a length of 0 or the 
> size at the last block boundary.
> 2. Only at the time of File close or any other meta data modification 
> operation on these files, HDFS reconciles the file length and records the 
> mo