[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14945939#comment-14945939 ]
Arpit Agarwal commented on HDFS-4015:
-------------------------------------

Hi [~anu], thanks for addressing the earlier feedback. Feedback on the v2 patch.
# We will likely see blocks with future generation stamps during HDFS rollback. We should disable this check if NN has been restarted with a rollback option (either regular or rolling upgrade rollback).
# I apologize for not noticing this earlier. {{FsStatus}} is tagged as public and stable, so changing the constructor signature is incompatible. Instead we could add a new constructor that initializes . This will also avoid changes to FileSystem, ViewFS, RawLocalFileSystem.
# fsck should also print this new counter. We can do it in a separate Jira.
# Don't consider this binding, but I would really like it if {{bytesInFuture}} can be renamed, especially where it is exposed via public interfaces/metrics. It sounds confusing/ominous. {{bytesWithFutureGenerationStamps}} would be more precise.

Still reviewing the test cases.

> Safemode should count and report orphaned blocks
> ------------------------------------------------
>
>                 Key: HDFS-4015
>                 URL: https://issues.apache.org/jira/browse/HDFS-4015
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Todd Lipcon
>            Assignee: Anu Engineer
>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch
>
>
> The safemode status currently reports the number of unique reported blocks
> compared to the total number of blocks referenced by the namespace. However,
> it does not report the inverse: blocks which are reported by datanodes but
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can
> be confusing: safemode and fsck will show "corrupt files", which are the
> files which actually have been deleted but got resurrected by restarting from
> the old image. This will convince them that they can safely force leave
> safemode and remove these files -- after all, they know that those files
> should really have been deleted. However, they're not aware that leaving
> safemode will also unrecoverably delete a bunch of other block files which
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "900000 of expected 1000000
> blocks have been reported. Additionally, 10000 blocks have been reported
> which do not correspond to any file in the namespace. Forcing exit of
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is
> the logical next step, but just reporting it as a warning seems easy enough
> to accomplish and worth doing.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
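
The warning text proposed in the issue description could be assembled roughly as follows. This is only a minimal sketch, not code from any HDFS-4015 patch: the class {{SafemodeReportSketch}}, the method {{format}}, and its three parameters are hypothetical names chosen for illustration, and the real NameNode would take these counts from its own block accounting.

```java
/**
 * Hypothetical helper, not part of HDFS: builds a safemode status line
 * that also mentions blocks reported by datanodes but unknown to the namespace.
 */
final class SafemodeReportSketch {
    private SafemodeReportSketch() {}

    /**
     * @param reported number of unique blocks reported so far (assumed input)
     * @param expected number of blocks referenced by the namespace (assumed input)
     * @param orphaned number of reported blocks with no file in the namespace (assumed input)
     */
    static String format(long reported, long expected, long orphaned) {
        StringBuilder sb = new StringBuilder();
        sb.append(reported).append(" of expected ").append(expected)
          .append(" blocks have been reported.");
        // Only add the warning when there is something to warn about.
        if (orphaned > 0) {
            sb.append(" Additionally, ").append(orphaned)
              .append(" blocks have been reported which do not correspond")
              .append(" to any file in the namespace.")
              .append(" Forcing exit of safemode will unrecoverably remove")
              .append(" those data blocks.");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Numbers taken from the example in the issue description.
        System.out.println(format(900000L, 1000000L, 10000L));
    }
}
```

Leaving the warning out when the orphaned count is zero keeps the normal startup message unchanged, so the extra text appears only in the old-image scenario the description is concerned with.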