[ https://issues.apache.org/jira/browse/HDFS-11402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15975800#comment-15975800 ]

Manoj Govindassamy edited comment on HDFS-11402 at 4/20/17 12:23 AM:
---------------------------------------------------------------------

Thanks for the review [~jojochuang] and [~andrew.wang]. Attached v05 patch 
after addressing the following comments; please take a look.

* Renamed the config term 'freeze' to 'capture'.
* It's not just the file length that gets captured, but the whole metadata, 
like permissions, ACLs, etc. Also, this is only for open files, so I would 
prefer to have the word "openfiles" in the config name as well. It is now 
named dfs.namenode.snapshot.capture.openfiles (see the usage sketch after 
this list).
* Updated the comment in hdfs-default.xml about the special flag needed for 
hsync/hflush.
* The byte[][] is already available from getINodesAndPaths() along with the 
INode[]. Wouldn't it be a lot slower to have one more helper method construct 
the same byte[][]?
* Updated the comment for the new method fromINode()
* Updated isDescendant() javadoc comments.
* Clubbed isDescendant() methods together.
* It's the caller's duty to hold the FSNamesystem read or write lock so that 
the lease file list doesn't change. Added the assert.
* The v01 patch didn't have the thread pool. Based on the comments from 
Yongjun and Yiqun that a lot of open files could lead to performance 
problems, I altered the model to use a thread pool. I verified with 10K+ open 
files that the thread pool helps; for smaller open file counts it is about 
the same or slightly slower.
* Moved the executorService shutdown before the future gets. But I don't see 
a difference between calling shutdown() before or after the future gets; it 
is non-blocking either way and should not interrupt the currently running 
tasks (see the sketch after this list).
* future.get() can throw InterruptedException or ExecutionException. Since 
the task routine doesn't throw any exception, the only other possibility is 
InterruptedException; and since we never call shutdownNow(), there shouldn't 
be an InterruptedException either. We log it anyway. I am not sure we want to 
fail the snapshot create operation if any exception does happen here.
* Even if ancestorDir is null, we still want to convert the INode to 
INodesInPath. In the null-ancestor case we could probably do it without 
spinning up a thread pool, but there is no production code that calls this 
with a null ancestorDir.
* The 1 ms sleep is to wait for the expiration of the lease soft and hard 
limits. Added more comments in the code.
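For reference, a hypothetical usage sketch of the renamed key (the key name 
is from the bullet above; the default of false is an assumption here):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical read of the new key; the default value (false) is an assumption.
Configuration conf = new Configuration();
boolean captureOpenFiles =
    conf.getBoolean("dfs.namenode.snapshot.capture.openfiles", false);
{code}

And a minimal, self-contained sketch of the shutdown-before-get pattern 
discussed above. The task body is a stand-in, not the actual patch code; only 
the ExecutorService/Future semantics are the point:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShutdownBeforeGetSketch {
  public static void main(String[] args) {
    List<String> openFiles = Arrays.asList("/dir/a", "/dir/b");
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<?>> futures = new ArrayList<Future<?>>();
    for (final String file : openFiles) {
      futures.add(pool.submit(new Runnable() {
        @Override
        public void run() {
          // Stand-in for capturing one open file's point-in-time metadata.
          System.out.println("captured " + file);
        }
      }));
    }
    // shutdown() only stops new submissions; already-submitted tasks keep
    // running, so calling it before the gets is non-blocking and safe.
    pool.shutdown();
    for (Future<?> f : futures) {
      try {
        f.get(); // may throw InterruptedException or ExecutionException
      } catch (InterruptedException | ExecutionException e) {
        // The task body doesn't throw and shutdownNow() is never called,
        // so this path is not expected; log and continue.
        System.err.println("capture task failed: " + e);
      }
    }
  }
}
{code}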


> HDFS Snapshots should capture point-in-time copies of OPEN files
> ----------------------------------------------------------------
>
>                 Key: HDFS-11402
>                 URL: https://issues.apache.org/jira/browse/HDFS-11402
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 2.6.0
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>         Attachments: HDFS-11402.01.patch, HDFS-11402.02.patch, 
> HDFS-11402.03.patch, HDFS-11402.04.patch, HDFS-11402.05.patch
>
>
> *Problem:*
> 1. When files are being written and HDFS Snapshots are taken in parallel, 
> the Snapshots do capture all these files, but the files still being written 
> do not have their point-in-time file length captured. That is, these open 
> files are not frozen in HDFS Snapshots. These open files grow/shrink in 
> length, just like the original file, even after the snapshot time.
> 2. At the time of file close or any other metadata modification operation 
> on these files, HDFS reconciles the file length and records the modification 
> in the last taken Snapshot. All the previously taken Snapshots continue to 
> have those open files with no modification recorded, so all those previous 
> snapshots end up using the final modification record in the last snapshot. 
> Thus, after the file close, the file lengths in all those snapshots end up 
> the same.
> Assume File1 is opened for write and a total of 1MB is written to it. While 
> the writes are happening, snapshots are taken in parallel.
> {noformat}
> |---Time---T1-----------T2-------------T3----------------T4------>
> |-----------------------Snap1----------Snap2-------------Snap3--->
> |---File1.open---write---------write-----------close------------->
> {noformat}
> Then, at each time:
> T2:
> Snap1.File1.length = 0
> T3:
> Snap1.File1.length = 0
> Snap2.File1.length = 0
> <File1 write completed and closed>
> T4:
> Snap1.File1.length = 1MB
> Snap2.File1.length = 1MB
> Snap3.File1.length = 1MB
> *Proposal*
> 1. At the time of taking Snapshot, {{SnapshotManager#createSnapshot}} can 
> optionally request {{DirectorySnapshottableFeature#addSnapshot}} to freeze 
> open files. 
> 2. {{DirectorySnapshottableFeature#addSnapshot}} can consult 
> {{LeaseManager}} and get a list of INodesInPath for all open files under the 
> snapshot dir. 
> 3. After the Snapshot creation, Diff creation, and modification time 
> update, {{DirectorySnapshottableFeature#addSnapshot}} can invoke 
> {{INodeFile#recordModification}} for each of the open files. This way, the 
> Snapshot just taken will have a {{FileDiff}} with {{fileSize}} captured for 
> each of the open files. 
> 4. The above model follows the current Snapshot and Diff protocols and 
> doesn't introduce any new disk formats. So, I don't think we will need any 
> new FSImage Loader/Saver changes for Snapshots.
> 5. One of the design goals of HDFS Snapshots was the ability to take any 
> number of snapshots in O(1) time. Though LeaseManager has all the open files 
> with leases in an in-memory map, an iteration is still needed to prune down 
> to the needed open files and then run recordModification on each of them. 
> So, it will not be strictly O(1) with the above proposal. But it is only a 
> marginal increase, as the new order will be O(open_files_under_snap_dir). In 
> order to avoid changing HDFS Snapshots' behavior for open files and to avoid 
> the change in time complexity, this improvement can be made under a new 
> config {{"dfs.namenode.snapshot.freeze.openfiles"}} which by default can be 
> {{false}}.
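> A self-contained toy model of steps 2, 3, and 5 above; the classes here are 
> simplified stand-ins for illustration, not the real HDFS types:
> {code:java}
> import java.util.ArrayList;
> import java.util.HashMap;
> import java.util.List;
> import java.util.Map;
>
> // Toy model: on snapshot creation, record a per-snapshot diff (here just
> // the length) for every open file under the snapshot dir. All classes are
> // simplified stand-ins, not the real HDFS types.
> public class OpenFileSnapshotToyModel {
>   static class OpenFile {
>     final String path;
>     long length;
>     final Map<String, Long> diffs = new HashMap<String, Long>();
>     OpenFile(String path) { this.path = path; }
>     void recordModification(String snapshot) { diffs.put(snapshot, length); }
>   }
>
>   static class LeaseManager {
>     final List<OpenFile> openFiles = new ArrayList<OpenFile>();
>     // O(open files) scan to prune down to files under the snapshot dir,
>     // i.e. O(open_files_under_snap_dir) recordModification calls follow.
>     List<OpenFile> getOpenFilesUnder(String dir) {
>       List<OpenFile> out = new ArrayList<OpenFile>();
>       for (OpenFile f : openFiles) {
>         if (f.path.startsWith(dir)) out.add(f);
>       }
>       return out;
>     }
>   }
>
>   public static void main(String[] args) {
>     LeaseManager lm = new LeaseManager();
>     OpenFile f = new OpenFile("/snapdir/file1");
>     lm.openFiles.add(f);
>
>     f.length = 0; // writes still in flight at snapshot time
>     for (OpenFile of : lm.getOpenFilesUnder("/snapdir")) {
>       of.recordModification("snap1"); // freeze the point-in-time length
>     }
>     f.length = 1024 * 1024; // the file keeps growing after snap1
>
>     System.out.println("snap1 length   = " + f.diffs.get("snap1")); // 0
>     System.out.println("current length = " + f.length);             // 1048576
>   }
> }
> {code}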


