[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
     [ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arpit Agarwal updated HDFS-4015:
      Resolution: Fixed
    Hadoop Flags: Reviewed
   Fix Version/s: 2.8.0
          Status: Resolved  (was: Patch Available)

Committed to branch-2 for 2.8.0.

Thanks for contributing this improvement [~anu], and thanks for the reviews [~liuml07] and [~jnp].

> Safemode should count and report orphaned blocks
>
>                 Key: HDFS-4015
>                 URL: https://issues.apache.org/jira/browse/HDFS-4015
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.0.0
>            Reporter: Todd Lipcon
>            Assignee: Anu Engineer
>             Fix For: 2.8.0
>
>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch,
> HDFS-4015.006.patch, HDFS-4015.007.patch
>
>
> The safemode status currently reports the number of unique reported blocks
> compared to the total number of blocks referenced by the namespace. However,
> it does not report the inverse: blocks which are reported by datanodes but
> not referenced by the namespace.
> In the case that an admin accidentally starts up from an old image, this can
> be confusing: safemode and fsck will show "corrupt files", which are the
> files which actually have been deleted but got resurrected by restarting from
> the old image. This will convince them that they can safely force leave
> safemode and remove these files -- after all, they know that those files
> should really have been deleted. However, they're not aware that leaving
> safemode will also unrecoverably delete a bunch of other block files which
> have been orphaned due to the namespace rollback.
> I'd like to consider reporting something like: "90 of expected 100
> blocks have been reported. Additionally, 1 blocks have been reported
> which do not correspond to any file in the namespace. Forcing exit of
> safemode will unrecoverably remove those data blocks"
> Whether this statistic is also used for some kind of "inverse safe mode" is
> the logical next step, but just reporting it as a warning seems easy enough
> to accomplish and worth doing.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
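To make the proposal in the issue description concrete, here is a minimal sketch of the counting it asks for. It is not taken from any of the attached patches: the function name `safemode_report` and the use of plain sets of block IDs are assumptions made purely for illustration, and the real namenode tracks blocks very differently. The message text follows the wording suggested in the description.

```python
def safemode_report(namespace_blocks, reported_blocks):
    """Return (orphan_count, status_message) for two collections of block IDs.

    Hypothetical helper for illustration only; not the HDFS implementation.
    """
    namespace_blocks = set(namespace_blocks)
    reported_blocks = set(reported_blocks)

    # Unique blocks that the namespace references and a datanode has reported.
    reported_expected = namespace_blocks & reported_blocks

    # The "inverse" statistic: reported by datanodes, referenced by no file.
    orphaned = reported_blocks - namespace_blocks

    message = "%d of expected %d blocks have been reported." % (
        len(reported_expected), len(namespace_blocks))
    if orphaned:
        message += (
            " Additionally, %d blocks have been reported which do not"
            " correspond to any file in the namespace. Forcing exit of"
            " safemode will unrecoverably remove those data blocks."
            % len(orphaned))
    return len(orphaned), message


if __name__ == "__main__":
    # Namespace knows blocks 0..99; datanodes report 0..89 plus one stray block.
    count, text = safemode_report(range(100), list(range(90)) + [1000])
    print(count)
    print(text)
```

The warning is only appended when the orphan count is non-zero, so a healthy startup would keep the existing one-sentence status.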
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Arpit Agarwal updated HDFS-4015:
    Attachment: HDFS-4015.007.patch

Attached v7 patch with the trivial edit to fix {{TestHDFSCLI}}. The change looks good otherwise.

+1 with the fix for the test case, pending Jenkins.

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch,
> HDFS-4015.006.patch, HDFS-4015.007.patch
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Attachment: HDFS-4015.006.patch

[~arpitagarwal] [~liuml07] This patch fixes the issue that occurs when an administrator enters or leaves safe mode. In the exit path, we no longer check whether we are in startup mode.

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch,
> HDFS-4015.006.patch
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Attachment: HDFS-4015.005.patch

Hi [~arpitagarwal],

Thanks for the review. Good catch on the (newer client + older namenode) case. The new patch fixes that and also updates how rollback is detected, based on offline comments.

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> HDFS-4015.003.patch, HDFS-4015.004.patch, HDFS-4015.005.patch
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Attachment: HDFS-4015.004.patch

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> HDFS-4015.003.patch, HDFS-4015.004.patch
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Attachment:     (was: HDFS-4015.004.patch)

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> HDFS-4015.003.patch
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Attachment: HDFS-4015.004.patch

Updated the documentation.

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> HDFS-4015.003.patch, HDFS-4015.004.patch
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Attachment: HDFS-4015.003.patch

[~arpitagarwal] Thanks for the review. I have fixed all the issues you flagged.

bq. We will likely see blocks with future generation stamps during intentional HDFS rollback. We should disable this check if NN has been restarted with a rollback option (either regular or rolling upgrade rollback).

Fixed this by setting shouldPostponeBlocksFromFuture in the rollback path.

bq. I apologize for not noticing this earlier. FsStatus is tagged as public and stable, so changing the constructor signature is incompatible. Instead we could add a new constructor that initializes bytesInFuture. This will also avoid changes to FileSystem, ViewFS, RawLocalFileSystem.

Thanks for catching this, I really appreciate it. I added a function in Distributed file system that returns this value instead of modifying FsStatus.

bq. fsck should also print this new counter. We can do it in a separate Jira.

Sure, as soon as this JIRA is committed I will follow up with a JIRA and a patch for that.

bq. Don't consider this a binding but I would really like it if bytesInFuture can be renamed especially where it is exposed via public interfaces/metrics. It sounds confusing/ominous. bytesWithFutureGenerationStamps would be more precise.

Fixed. The counter now looks like this via JMX:

"BytesWithFutureGenerationStamps" : 1174853312,

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> HDFS-4015.003.patch
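As a rough illustration of the counter discussed in the review comments above, the sketch below sums the sizes of reported blocks whose generation stamp is newer than the one the namenode currently knows, and skips the check after an intentional rollback. Everything here is an assumption for illustration: the `ReportedBlock` tuple, the function name, and the `started_with_rollback` flag are hypothetical, and the actual patch handles the rollback case through shouldPostponeBlocksFromFuture, which this sketch does not model.

```python
from collections import namedtuple

# Hypothetical record of one block as reported by a datanode.
ReportedBlock = namedtuple(
    "ReportedBlock", ["block_id", "num_bytes", "generation_stamp"])


def bytes_with_future_generation_stamps(reported, current_generation_stamp,
                                        started_with_rollback=False):
    """Sum the bytes of blocks stamped later than the namenode's current stamp.

    Illustrative only. After an intentional rollback such blocks are
    expected, so the check is disabled and zero is returned.
    """
    if started_with_rollback:
        return 0
    return sum(block.num_bytes
               for block in reported
               if block.generation_stamp > current_generation_stamp)


if __name__ == "__main__":
    blocks = [
        ReportedBlock(1, 100, 5),   # older than the current stamp
        ReportedBlock(2, 200, 12),  # from the "future"
        ReportedBlock(3, 50, 11),   # from the "future"
    ]
    print(bytes_with_future_generation_stamps(blocks, 10))
    print(bytes_with_future_generation_stamps(blocks, 10,
                                              started_with_rollback=True))
```

A non-zero result suggests the namenode may have been started from an older image than the one the datanodes last saw, which is the situation the issue description warns about.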
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Attachment: HDFS-4015.002.patch

Rebased the patch onto the top of the tree; used the same patch number.

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Attachment:     (was: HDFS-4015.002.patch)

>         Attachments: HDFS-4015.001.patch
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Attachment:     (was: dfsAdmin-report_with_forceExit.png)

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Attachment:     (was: dfsHealth.html.message.png)

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Status: Open  (was: Patch Available)

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> dfsAdmin-report_with_forceExit.png, dfsHealth.html.message.png
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Status: Patch Available  (was: Open)

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> dfsAdmin-report_with_forceExit.png, dfsHealth.html.message.png
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
Anu Engineer updated HDFS-4015:
---
    Attachment: HDFS-4015.002.patch

Hi [~liuml07] [~arpitagarwal],

Thanks for your reviews. I have fixed all the issues that both of you mentioned in this new patch. Please take a look when you get a chance.

>         Attachments: HDFS-4015.001.patch, HDFS-4015.002.patch,
> dfsAdmin-report_with_forceExit.png, dfsHealth.html.message.png
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anu Engineer updated HDFS-4015:
---
    Attachment: HDFS-4015.001.patch

> Safemode should count and report orphaned blocks
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Todd Lipcon
> Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, dfsAdmin-report_with_forceExit.png, dfsHealth.html.message.png
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anu Engineer updated HDFS-4015:
---
    Status: Open (was: Patch Available)

> Safemode should count and report orphaned blocks
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Todd Lipcon
> Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, dfsAdmin-report_with_forceExit.png, dfsHealth.html.message.png
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anu Engineer updated HDFS-4015:
---
    Attachment: (was: HDFS-4015.001.patch)

> Safemode should count and report orphaned blocks
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Todd Lipcon
> Assignee: Anu Engineer
> Attachments: dfsAdmin-report_with_forceExit.png, dfsHealth.html.message.png
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anu Engineer updated HDFS-4015:
---
    Status: Patch Available (was: Open)

> Safemode should count and report orphaned blocks
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Todd Lipcon
> Assignee: Anu Engineer
> Attachments: HDFS-4015.001.patch, dfsAdmin-report_with_forceExit.png, dfsHealth.html.message.png
[jira] [Updated] (HDFS-4015) Safemode should count and report orphaned blocks
[ https://issues.apache.org/jira/browse/HDFS-4015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anu Engineer updated HDFS-4015:
---
    Attachment: dfsAdmin-report_with_forceExit.png
                dfsHealth.html.message.png
                HDFS-4015.001.patch

Changes in this patch are:

*NameNode Changes:*
# Today the NN ignores blocks that do not belong to any file. Instead of just ignoring those blocks, the NN now checks whether any of them have generation stamps in the future and keeps track of those.
# When asked to leave safe mode, the NN will refuse if HDFS has blocks with generation stamps in the future.
# Exposed BytesInFuture as a JMX value in case Hadoop management tools want to look for it.
# Added a new mode for exiting safe mode, called forceExit.

*Changes in DfsAdmin:*
# Changed -report so that it not only detects that we are in safe mode but also prints an appropriate warning if we have bytes in the future.
# Added a new option to -safemode called forceExit, which indicates that the user is OK with losing data and allows the namenode to exit safe mode.

*Changes in DfsHealth.html:*
# Shows a modified message that relates to blocks having future generation stamps.

*Test Changes:*
# Created a test that simulates the namenode metadata being replaced and datanodes reporting blocks with generation stamps in the future.

Also attached are screenshots of how this change will appear to users.

> Safemode should count and report orphaned blocks
>
> Key: HDFS-4015
> URL: https://issues.apache.org/jira/browse/HDFS-4015
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.0.0
> Reporter: Todd Lipcon
> Attachments: HDFS-4015.001.patch, dfsAdmin-report_with_forceExit.png, dfsHealth.html.message.png
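
The NameNode behaviour described in the HDFS-4015.001.patch comment (track unreferenced blocks whose generation stamps are in the future, refuse to leave safe mode while any exist, and allow a forced exit) can be pictured with a small stand-alone Java sketch. This is not code from the patch. The class SafeModeSketch, the ReportedBlock type, and every method name below are hypothetical and exist only for illustration, and treating "in the future" as "generation stamp greater than the namespace's current generation stamp" is an assumption about the intended check.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the HDFS-4015 patch.
class SafeModeSketch {

    /** Stand-in for one block replica reported by a datanode. */
    static final class ReportedBlock {
        final long blockId;
        final long generationStamp;
        final long numBytes;

        ReportedBlock(long blockId, long generationStamp, long numBytes) {
            this.blockId = blockId;
            this.generationStamp = generationStamp;
            this.numBytes = numBytes;
        }
    }

    private final Set<Long> namespaceBlockIds = new HashSet<>();
    private final long namespaceGenerationStamp;
    private long bytesInFuture = 0L;
    private boolean inSafeMode = true;

    SafeModeSketch(long namespaceGenerationStamp, long... knownBlockIds) {
        this.namespaceGenerationStamp = namespaceGenerationStamp;
        for (long id : knownBlockIds) {
            namespaceBlockIds.add(id);
        }
    }

    /** Counts bytes of unreferenced blocks whose generation stamp is newer than the namespace's. */
    void processReportedBlock(ReportedBlock block) {
        if (namespaceBlockIds.contains(block.blockId)) {
            return; // Block belongs to a file; nothing to track in this sketch.
        }
        // Assumption: "in the future" means newer than the loaded namespace's stamp.
        if (block.generationStamp > namespaceGenerationStamp) {
            bytesInFuture += block.numBytes;
        }
    }

    long getBytesInFuture() {
        return bytesInFuture;
    }

    boolean isInSafeMode() {
        return inSafeMode;
    }

    /** Returns true if safe mode was left; refuses when future bytes exist and forceExit is false. */
    boolean leaveSafeMode(boolean forceExit) {
        if (bytesInFuture > 0 && !forceExit) {
            return false;
        }
        inSafeMode = false;
        return true;
    }

    public static void main(String[] args) {
        // Namespace loaded from an old image: generation stamp 100, knows only block 1.
        SafeModeSketch nn = new SafeModeSketch(100L, 1L);
        nn.processReportedBlock(new ReportedBlock(1L, 90L, 1024L));
        nn.processReportedBlock(new ReportedBlock(2L, 150L, 2048L));
        System.out.println("bytesInFuture = " + nn.getBytesInFuture());
        System.out.println("leave without force: " + nn.leaveSafeMode(false));
        System.out.println("leave with force: " + nn.leaveSafeMode(true));
    }
}
```

In this sketch the forced path simply flips the flag. In the scenario from the issue description, that is the point at which the orphaned data would be lost, which is why the comment describes forceExit as an explicit acknowledgement by the user.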