[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Status: Open (was: Patch Available)

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.7, 3.0.0, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
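A note on the reproduction step above: lowering dfs.namenode.heartbeat.recheck-interval helps because, as far as I recall from DatanodeManager, the NN declares a DN dead once no heartbeat has arrived for 2 * recheck-interval + 10 * heartbeat-interval. The sketch below only works through that arithmetic. The class and method names are hypothetical, and the defaults used (300000 ms recheck interval, 3 s heartbeat interval) are assumptions to be checked against the hdfs-default.xml of your release.

```java
// Hypothetical helper, not Hadoop source: it only evaluates the
// dead-node timeout formula described in the paragraph above.
final class HeartbeatExpirySketch {

    /**
     * Milliseconds without a heartbeat after which a datanode is treated as dead.
     * recheckIntervalMs stands for dfs.namenode.heartbeat.recheck-interval (ms),
     * heartbeatIntervalSec for dfs.heartbeat.interval (seconds).
     */
    static long expireIntervalMs(long recheckIntervalMs, long heartbeatIntervalSec) {
        return 2 * recheckIntervalMs + 10 * 1000 * heartbeatIntervalSec;
    }

    public static void main(String[] args) {
        // Assumed defaults: 5 minute recheck, 3 second heartbeat.
        System.out.println(expireIntervalMs(300_000, 3)); // 630000 ms, i.e. 10.5 minutes
        // Setting from the report: recheck interval of 1 ms.
        System.out.println(expireIntervalMs(1, 3));       // 30002 ms, i.e. about 30 seconds
    }
}
```

With the recheck interval at 1, a killed DN should therefore be marked dead after roughly 30 seconds instead of about 10 minutes, which makes the scenario quick to replay.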
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Attachment: HDFS-4832.branch-0.23.patch

The patch ported to trunk

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.branch-0.23.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Status: Patch Available (was: Open)

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.7, 3.0.0, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.branch-0.23.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Attachment: HDFS-4832.patch

The patch for trunk and branch-2

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.branch-0.23.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Status: Open (was: Patch Available)

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.7, 3.0.0, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.branch-0.23.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Status: Patch Available (was: Open)

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.7, 3.0.0, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.branch-0.23.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Attachment: HDFS-4832.patch

Y u no test my patch Hadoop QA? Uploading the same patch. Maybe this time it will get picked up

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.branch-0.23.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kihwal Lee updated HDFS-4832:
---
Resolution: Fixed
Fix Version/s: 0.23.9, 2.1.0-beta, 3.0.0
Release Note: This change makes name node keep its internal replication queues and data node state updated in manual safe mode. This allows metrics and UI to present up-to-date information while in safe mode. The behavior during start-up safe mode is unchanged.
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

I've committed this to trunk, branch-2, branch-2.1.0-beta, and branch-0.23. Thanks for working on this patch, Ravi.

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Fix For: 3.0.0, 2.1.0-beta, 0.23.9
Attachments: HDFS-4832.branch-0.23.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
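The behaviour described in the release note of the resolution message can be pictured with a toy model. This is a sketch under stated assumptions, not the committed patch: MissingBlocksSketch, its SafeMode enum, and every method name are hypothetical, and it assumes that "unchanged" start-up behaviour means the missing-block bookkeeping is simply not refreshed while in start-up safe mode.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only; none of these names exist in Hadoop.
final class MissingBlocksSketch {
    enum SafeMode { OFF, STARTUP, MANUAL }

    // block id -> datanodes currently holding a live replica
    private final Map<String, Set<String>> liveReplicas = new HashMap<>();
    // blocks with no live replica; stands in for what the metrics and UI report
    private final Set<String> missing = new HashSet<>();
    private SafeMode mode = SafeMode.OFF;

    void setMode(SafeMode newMode) { mode = newMode; }

    // A datanode (re)joins and reports a replica of the block.
    void replicaReported(String block, String datanode) {
        liveReplicas.computeIfAbsent(block, k -> new HashSet<>()).add(datanode);
        refresh(block);
    }

    // A datanode is marked dead: all of its replicas stop counting as live.
    void datanodeDied(String datanode) {
        for (Map.Entry<String, Set<String>> e : liveReplicas.entrySet()) {
            if (e.getValue().remove(datanode)) {
                refresh(e.getKey());
            }
        }
    }

    // Assumption: bookkeeping is skipped only in start-up safe mode;
    // in manual safe mode it keeps running, as the release note describes.
    private void refresh(String block) {
        if (mode == SafeMode.STARTUP) {
            return;
        }
        Set<String> holders = liveReplicas.get(block);
        if (holders == null || holders.isEmpty()) {
            missing.add(block);
        } else {
            missing.remove(block);
        }
    }

    int missingCount() { return missing.size(); }

    public static void main(String[] args) {
        MissingBlocksSketch nn = new MissingBlocksSketch();
        nn.replicaReported("blk_1", "dn1");
        nn.setMode(SafeMode.MANUAL);
        nn.datanodeDied("dn1");
        System.out.println(nn.missingCount()); // 1: the loss is visible in manual safe mode
        nn.replicaReported("blk_1", "dn1");
        System.out.println(nn.missingCount()); // 0: the rejoin is visible as well
    }
}
```

In this model the count follows DNs leaving and rejoining while in manual safe mode, which is the information the grid admins in the report needed before leaving safe mode.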
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Status: Open (was: Patch Available)

The patch passed test-patch.sh on my machine several times. Rolling the dice again.

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.7, 3.0.0, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Status: Patch Available (was: Open)

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.7, 3.0.0, 2.1.0-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Attachment: HDFS-4832.patch

Hmm, funny. Eclipse ran the test fine and it passed, but the same test failed when run from the command line. :( Anyway, I've fixed the test so it passes both in Eclipse and on the command line.

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Attachment: HDFS-4832.patch

Thanks for your review, Kihwal. I've updated the patch.

bq. isInStartupSafeMode() returns true for any auto safe mode. E.g. if the resource checker puts NN in safe mode, it will return true.

I have filed HDFS-4862 to fix this. The method name is unfortunately contrary to its behavior.

{quote}
The existing code drained scheduled work in safe mode, but the patch makes it immediately stops sending scheduled work to DNs. This seems correct behavior for safe mode, but those work can be sent out after leaving safe mode. That may not be ideal. E.g. if NN is suffering from a flaky DNS, DNs will appear dead, come back and dead again, generating a lot of invalidation and replication work. Admins may put NN in safe mode to safely pass the storm. When they do, the unnecessary work needs to stop rather than being delayed. Please make sure unintended damage does not occur after leaving safe mode.
{quote}

UnderReplicatedBlocks is the priority queue maintained for neededReplications, and it is updated when nodes join or are marked dead. However, once BlockManager.computeReplicationWorkForBlocks is called, the ReplicationWork is transferred to the DatanodeDescriptor's replicateBlocks queue, from which it will not be rescinded. computeReplicationWorkForBlocks() is called every replicationRecheckInterval, which defaults to 3 seconds. Can we please handle this in a separate JIRA?

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch, HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
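The hand-off described in the reply above (work leaving the neededReplications queue and landing in a per-datanode queue from which it is not withdrawn) can be illustrated roughly as follows. This is a deliberately simplified stand-in and not BlockManager code: ReplicationQueuesSketch and all of its methods are hypothetical, the real UnderReplicatedBlocks is a set of priority levels rather than a single set, and the real scheduling also chooses source and target nodes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Simplified illustration; the names are made up for this sketch.
final class ReplicationQueuesSketch {
    // Stand-in for neededReplications (UnderReplicatedBlocks).
    private final Set<String> neededReplications = new LinkedHashSet<>();
    // Stand-in for each DatanodeDescriptor's replicateBlocks queue.
    private final Map<String, Deque<String>> replicateBlocks = new HashMap<>();

    // A replica holder was marked dead: the block now needs replication.
    void holderDied(String block) { neededReplications.add(block); }

    // The holder came back: the pending need is dropped, but only if the
    // work has not already been handed to a datanode queue.
    void holderRejoined(String block) { neededReplications.remove(block); }

    // Stand-in for the periodic computeReplicationWorkForBlocks() pass:
    // moves everything still needed onto the chosen datanode's queue.
    int computeReplicationWork(String sourceDatanode) {
        Deque<String> queue =
            replicateBlocks.computeIfAbsent(sourceDatanode, k -> new ArrayDeque<>());
        int moved = neededReplications.size();
        queue.addAll(neededReplications);
        neededReplications.clear();
        return moved;
    }

    int needed() { return neededReplications.size(); }

    int scheduled(String datanode) {
        Deque<String> queue = replicateBlocks.get(datanode);
        return queue == null ? 0 : queue.size();
    }

    public static void main(String[] args) {
        ReplicationQueuesSketch bm = new ReplicationQueuesSketch();

        // Rejoin before the periodic pass: the need is withdrawn.
        bm.holderDied("blk_1");
        bm.holderRejoined("blk_1");
        System.out.println(bm.needed());         // 0

        // Rejoin after the periodic pass: the scheduled work stays queued.
        bm.holderDied("blk_1");
        bm.computeReplicationWork("dn2");
        bm.holderRejoined("blk_1");
        System.out.println(bm.scheduled("dn2")); // 1
    }
}
```

The second case is the one raised in the review: with the periodic pass running every few seconds, a flapping DN can leave work queued that is no longer needed.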
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Attachment: HDFS-4832.patch

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch, HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Summary: Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave (was: Namenode doesn't change the number of missing blocks in safemode when DNs rejoin)

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode.
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.
[jira] [Updated] (HDFS-4832) Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
[ https://issues.apache.org/jira/browse/HDFS-4832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ravi Prakash updated HDFS-4832:
---
Description:
Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.

was:
Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode.
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.

Namenode doesn't change the number of missing blocks in safemode when DNs rejoin or leave
---
Key: HDFS-4832
URL: https://issues.apache.org/jira/browse/HDFS-4832
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.0, 0.23.7, 2.0.5-beta
Reporter: Ravi Prakash
Assignee: Ravi Prakash
Priority: Critical
Attachments: HDFS-4832.patch

Courtesy Karri VRK Reddy!
{quote}
1. Namenode lost datanodes causing missing blocks
2. Namenode was put in safe mode
3. Datanode restarted on dead nodes
4. Waited for lots of time for the NN UI to reflect the recovered blocks.
5. Forced NN out of safe mode and suddenly, no more missing blocks anymore.
{quote}
I was able to replicate this on 0.23 and trunk. I set dfs.namenode.heartbeat.recheck-interval to 1 and killed the DN to simulate lost datanode. The opposite case also has problems (i.e. Datanode failing when NN is in safemode, doesn't lead to a missing blocks message)
Without the NN updating this list of missing blocks, the grid admins will not know when to take the cluster out of safemode.