[ https://issues.apache.org/jira/browse/HDFS-8258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14516168#comment-14516168 ]
Masatake Iwasaki commented on HDFS-8258:
----------------------------------------

Thanks for reporting this, [~hwbj]. I think HDFS-7225 fixes this.

> namenode shutdown strangely
> ---------------------------
>
>                 Key: HDFS-8258
>                 URL: https://issues.apache.org/jira/browse/HDFS-8258
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: wei.he
>
> Hi,
> I use Hadoop 2.5. The standby NameNode shut down after I restarted one of the DataNodes. The tail of its log shows:
>
> .......................................
> 2015-04-23 15:37:39,133 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2343)) - BLOCK* addStoredBlock: blockMap updated: 192.168.146.223:50010 is added to blk_1277475690_203782236{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5dc915b5-a44d-441c-b960-87e875edb5a8:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-50f65630-347e-4e32-b3b9-9eb904db9577:NORMAL|RBW]]} size 0
> 2015-04-23 15:37:39,137 INFO hdfs.StateChange (DatanodeManager.java:registerDatanode(873)) - BLOCK* registerDatanode: from DatanodeRegistration(192.168.146.130, datanodeUuid=ee834f1d-ba78-48de-bd7a-c364b67b535f, infoPort=50075, ipcPort=8010, storageInfo=lv=-55;cid=CID-64dc0d1c-f525-432b-9b28-2b92262d6111;nsid=740344496;c=0) storage ee834f1d-ba78-48de-bd7a-c364b67b535f
> 2015-04-23 15:37:39,138 INFO namenode.NameNode (DatanodeManager.java:registerDatanode(881)) - BLOCK* registerDatanode: 192.168.146.130:50010
> 2015-04-23 15:37:39,261 INFO net.NetworkTopology (NetworkTopology.java:remove(482)) - Removing a node: /hadoop2.0/rack_/YSC801/D7_2_5/192.168.146.130:50010
> 2015-04-23 15:37:39,262 INFO net.NetworkTopology (NetworkTopology.java:add(413)) - Adding a new node: /hadoop2.0/rack_/YSC801/D7_2_5/192.168.146.130:50010
> 2015-04-23 15:37:39,264 WARN namenode.FSNamesystem (FSNamesystem.java:getCorruptFiles(6775)) - Get corrupt file blocks returned error: Operation category READ is not supported in state standby
> 2015-04-23 15:37:39,264 FATAL blockmanagement.BlockManager (BlockManager.java:run(3390)) - ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
>         at java.util.TreeMap.getEntry(TreeMap.java:342)
>         at java.util.TreeMap.get(TreeMap.java:273)
>         at org.apache.hadoop.hdfs.server.blockmanagement.InvalidateBlocks.invalidateWork(InvalidateBlocks.java:137)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.invalidateWorkForOneNode(BlockManager.java:3231)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeInvalidateWork(BlockManager.java:1191)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3431)
>         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3375)
>         at java.lang.Thread.run(Thread.java:744)
> 2015-04-23 15:37:39,267 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667)) - Adding new storage ID DS-f0b04209-2f6a-491b-9f28-173c4c53d364 for DN 192.168.146.130:50010
> 2015-04-23 15:37:39,268 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667)) - Adding new storage ID DS-d255d5a4-4543-4621-b258-4c575843f29c for DN 192.168.146.130:50010
> 2015-04-23 15:37:39,268 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667)) - Adding new storage ID DS-4a88b36b-7ae6-4f30-b95c-0c4e47d70878 for DN 192.168.146.130:50010
> 2015-04-23 15:37:39,268 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667)) - Adding new storage ID DS-d4166bd7-a8c0-4067-8c68-78c6c31dcd9e for DN 192.168.146.130:50010
> 2015-04-23 15:37:39,268 INFO blockmanagement.DatanodeDescriptor (DatanodeDescriptor.java:updateStorage(667)) - Adding new storage ID DS-468eeca6-e45c-428f-811a-71c5d1f04a9f for DN 192.168.146.130:50010
> 2015-04-23 15:37:39,269 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2343)) - BLOCK* addStoredBlock: blockMap updated: 192.168.146.34:50010 is added to blk_1253664895_179969194 size 285087186
> 2015-04-23 15:37:39,271 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2343)) - BLOCK* addStoredBlock: blockMap updated: 192.168.146.210:50010 is added to blk_1277475689_203782235{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-2feb6f2f-9d30-4edd-b3c3-101194c6bde8:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-ca6ef4dc-be77-4a27-95e1-39b4fb53933a:NORMAL|RBW]]} size 0
> 2015-04-23 15:37:39,274 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2343)) - BLOCK* addStoredBlock: blockMap updated: 192.168.146.191:50010 is added to blk_1277475690_203782236{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-5dc915b5-a44d-441c-b960-87e875edb5a8:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-50f65630-347e-4e32-b3b9-9eb904db9577:NORMAL|RBW]]} size 0
> 2015-04-23 15:37:39,277 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2343)) - BLOCK* addStoredBlock: blockMap updated: 192.168.146.34:50010 is added to blk_1083102318_9365917 size 134217728
> 2015-04-23 15:37:39,277 INFO BlockStateChange (BlockManager.java:logAddStoredBlock(2343)) - BLOCK* addStoredBlock: blockMap updated: 192.168.146.191:50010 is added to blk_1277475689_203782235{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-2feb6f2f-9d30-4edd-b3c3-101194c6bde8:NORMAL|RBW], ReplicaUnderConstruction[[DISK]DS-ca6ef4dc-be77-4a27-95e1-39b4fb53933a:NORMAL|RBW]]} size 0
> 2015-04-23 15:37:39,299 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
> 2015-04-23 15:37:39,301 INFO namenode.NameNode (StringUtils.java:run(640)) - SHUTDOWN_MSG:
> /************************************************************
> SHUTDOWN_MSG: Shutting down NameNode at hadoop14667/192.168.146.67
> ************************************************************/
> ..........................................................................
> --LOG END--
>
> I tracked the log down to the InvalidateBlocks.invalidateWork method:
>
>   synchronized List<Block> invalidateWork(final DatanodeDescriptor dn) {
>     final long delay = getInvalidationDelay();
>     if (delay > 0) {
>       if (BlockManager.LOG.isDebugEnabled()) {
>         BlockManager.LOG
>             .debug("Block deletion is delayed during NameNode startup. "
>                 + "The deletion will start after " + delay + " ms.");
>       }
>       return null;
>     }
>     final LightWeightHashSet<Block> set = node2blocks.get(dn);
>     if (set == null) {
>       return null;
>     }
>     // # blocks that can be sent in one message is limited
>     final int limit = blockInvalidateLimit;
>     final List<Block> toInvalidate = set.pollN(limit);
>     // If we send everything in this message, remove this node entry
>     if (set.isEmpty()) {
>       remove(dn);
>     }
>     dn.addBlocksToBeInvalidated(toInvalidate);
>     numBlocks -= toInvalidate.size();
>     return toInvalidate;
>   }
>
> I noticed this line:
>
>   final LightWeightHashSet<Block> set = node2blocks.get(dn);
>
> It does not check dn for null first, so when dn is null a NullPointerException occurs. The ReplicationMonitor thread checks the replication state periodically; unfortunately, I stopped a DataNode concurrently with a ReplicationMonitor run, so treemap.get(null) threw the NullPointerException.
> I am not sure whether this is a bug. If it is, may I fix this issue by checking dn for null before calling treemap.get(dn)?
>
> I hope this is helpful.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
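[Editor's note] The crash mechanism the reporter describes can be reproduced in isolation: a TreeMap under natural key ordering throws NullPointerException from get(null) regardless of the map's contents, which matches the TreeMap.getEntry frame in the stack trace above. The sketch below is hypothetical (InvalidateWorkSketch, the String-keyed map, and both method names are stand-ins, not the Hadoop code) and shows the unguarded lookup failing while the reporter's proposed null check avoids it:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Minimal stand-in for InvalidateBlocks.invalidateWork; names mirror the
// quoted snippet but this is NOT the Hadoop implementation.
public class InvalidateWorkSketch {
    static class Block {}

    // node2blocks in HDFS maps a datanode to its blocks pending invalidation;
    // a plain String key is enough to demonstrate the TreeMap behavior.
    static final TreeMap<String, List<Block>> node2blocks = new TreeMap<>();

    // Unguarded lookup, as in the reported code path: with natural ordering,
    // TreeMap.get(null) throws NullPointerException before any comparison.
    static List<Block> invalidateWorkUnguarded(String dn) {
        return node2blocks.get(dn);
    }

    // Guarded variant, following the reporter's proposed fix: bail out on a
    // null datanode before touching the TreeMap.
    static List<Block> invalidateWorkGuarded(String dn) {
        if (dn == null) {
            return null;
        }
        return node2blocks.get(dn);
    }

    public static void main(String[] args) {
        node2blocks.put("dn1", new ArrayList<>());

        boolean threw = false;
        try {
            invalidateWorkUnguarded(null); // reproduces the crash from the log
        } catch (NullPointerException e) {
            threw = true;
        }
        if (!threw) {
            throw new AssertionError("expected NPE from TreeMap.get(null)");
        }
        if (invalidateWorkGuarded(null) != null) {
            throw new AssertionError("guarded variant should return null");
        }
        System.out.println("ok");
    }
}
```

A guard like this only masks the symptom, of course; HDFS-7225 (mentioned in the comment above) addresses why the ReplicationMonitor can see a vanished datanode in the first place.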