Standby namenode stuck in safenode during a failover ----------------------------------------------------
Key: HDFS-2753 URL: https://issues.apache.org/jira/browse/HDFS-2753 Project: Hadoop HDFS Issue Type: Bug Components: name-node Affects Versions: HA branch (HDFS-1623) Reporter: Hari Mankude Assignee: Hari Mankude teragen is run to generate write traffic. A manual failover is initiated by killing namenode process. namenode which was killed is then restarted and it comes up as a standby. However, the standby never exits out of the safemode. Looking at the standby logs, it looks like the standby namenode gets addStoredBlock immediately after data node registration which results in numBlocks getting incremented. There is an optimization in processReport() which causes the follow on complete block report to be ignored when NN is in safemode. So, NN never exits out of the safemode. 2012-01-05 18:57:46,030 INFO hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 98.137.233.235:50010 storage DS-526656430-98.137.233.235-50010-1325723536492 2012-01-05 18:57:46,033 INFO net.NetworkTopology (NetworkTopology.java:add(344)) - Adding a new node: /default-rack/98.137.233.235:50010 2012-01-05 18:57:46,033 INFO namenode.FSNamesystem (FSNamesystem.java:checkMode(3411)) - DID NOT call initialize at 1 2012-01-05 18:57:46,034 INFO hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 98.137.233.237:50010 storage DS-1961520590-98.137.233.237-50010-1325725253057 2012-01-05 18:57:46,034 INFO net.NetworkTopology (NetworkTopology.java:add(344)) - Adding a new node: /default-rack/98.137.233.237:50010 2012-01-05 18:57:46,042 INFO namenode.FSNamesystem (FSNamesystem.java:checkMode(3411)) - DID NOT call initialize at 1 2012-01-05 18:57:46,045 INFO hdfs.StateChange (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap updated: 98.137.233.235:50010 is added to blk_-3183325095022454724_1172{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED]]} size 0 2012-01-05 18:57:46,046 INFO hdfs.StateChange (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap updated: 98.137.233.235:50010 is added to blk_5617057825952660916_1173{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED]]} size 0 2012-01-05 18:57:46,046 INFO hdfs.StateChange (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap updated: 98.137.233.237:50010 is added to blk_-3183325095022454724_1172{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED], ReplicaUnderConstruction[98.137.233.237:50010|FINALIZED]]} size 0 2012-01-05 18:57:46,046 INFO hdfs.StateChange (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap updated: 98.137.233.237:50010 is added to blk_5617057825952660916_1173{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED], ReplicaUnderConstruction[98.137.233.237:50010|FINALIZED]]} size 0 2012-01-05 18:57:46,049 INFO hdfs.StateChange (BlockManager.java:processReport(1365)) - BLOCK* processReport: discarded non-initial block report from 98.137.233.235:50010 because namenode still in startup phase 2012-01-05 18:57:46,049 INFO hdfs.StateChange (BlockManager.java:processReport(1365)) - BLOCK* processReport: discarded non-initial block report from 98.137.233.237:50010 because namenode still in startup phase 2012-01-05 18:58:05,167 INFO namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1175, Queuing blockReceivedAndDeleted message 2012-01-05 18:58:05,168 INFO namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1175, Queuing blockReceivedAndDeleted message 2012-01-05 18:58:06,634 INFO namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1176, Queuing blockReceivedAndDeleted message 2012-01-05 18:58:06,636 INFO namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1176, Queuing blockReceivedAndDeleted message 2012-01-05 18:58:08,097 INFO namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1177, Queuing blockReceivedAndDeleted message 2012-01-05 18:58:08,097 INFO namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1177, Queuing blockReceivedAndDeleted message -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira