[ https://issues.apache.org/jira/browse/HDFS-12638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16245134#comment-16245134 ]
Konstantin Shvachko commented on HDFS-12638:
--------------------------------------------

[~wweic] {{addDeleteBlock()}} is supposed to be called only when the block is really intended to be deleted, that is, when it is not contained in any snapshot. I checked the code; there is a lot of logic around detecting which blocks do or do not belong to a snapshot, see e.g. {{INodeFile.collectBlocksBeyondSnapshot()}}. This makes it safe to delete the {{truncateBlock}}, unless you have a test case as a counterexample.

I did some digging and now understand why we don't see this in 2.7.4. The following line was introduced into {{addDeleteBlock()}} by HDFS-9754:
{code}
     assert toDelete != null : "toDelete is null";
+    toDelete.delete();
     toDeleteList.add(toDelete);
{code}
which sets {{Block.bcId = INVALID_INODE_ID}}. I think this was the wrong place to invalidate bcId, as [I mentioned earlier|https://issues.apache.org/jira/browse/HDFS-12638?focusedCommentId=16214120&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16214120].

[~jingzhao] could you please take a look.

> NameNode exits due to ReplicationMonitor thread received Runtime exception in ReplicationWork#chooseTargets
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-12638
>                 URL: https://issues.apache.org/jira/browse/HDFS-12638
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>   Affects Versions: 2.8.2
>            Reporter: Jiandan Yang
>         Attachments: HDFS-12638-branch-2.8.2.001.patch, HDFS-12638.002.patch, OphanBlocksAfterTruncateDelete.jpg
>
>
> The active NameNode exits due to an NPE. I can confirm that the BlockCollection passed in when creating ReplicationWork is null, but I do not know why BlockCollection is null. Looking at the history, I found that [HDFS-9754|https://issues.apache.org/jira/browse/HDFS-9754] removed the check for whether BlockCollection is null.
> NN logs are as follows:
> {code:java}
> 2017-10-11 16:29:06,161 ERROR [ReplicationMonitor] org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: ReplicationMonitor thread received Runtime exception.
> java.lang.NullPointerException
>     at org.apache.hadoop.hdfs.server.blockmanagement.ReplicationWork.chooseTargets(ReplicationWork.java:55)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWorkForBlocks(BlockManager.java:1532)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeReplicationWork(BlockManager.java:1491)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.computeDatanodeWork(BlockManager.java:3792)
>     at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager$ReplicationMonitor.run(BlockManager.java:3744)
>     at java.lang.Thread.run(Thread.java:834)
> {code}
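
To make the failure mode discussed in the comment easier to follow, here is a minimal, self-contained Java sketch. It is not the HDFS code: {{BcIdSketch}}, its nested {{Block}} and {{BlockCollection}} classes, {{inodeMap}}, {{getBlockCollection()}} and {{chooseTargetsGuarded()}} are hypothetical stand-ins, and the value {{-1}} for {{INVALID_INODE_ID}} is an assumption. Only the three-line body of {{addDeleteBlock()}} mirrors the snippet quoted in the comment. The sketch shows that once {{delete()}} invalidates {{bcId}}, a later lookup of the owning collection returns null, which is the kind of null a caller would have to guard against.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model; not the real HDFS classes.
class BcIdSketch {
    // Assumed sentinel value, used here only for illustration.
    static final long INVALID_INODE_ID = -1L;

    /** Stand-in for a block that records the id of its owning file. */
    static class Block {
        long bcId;
        Block(long bcId) { this.bcId = bcId; }
        // Mirrors the effect described in the comment: invalidate the owner id.
        void delete() { bcId = INVALID_INODE_ID; }
    }

    /** Stand-in for the file (block collection) that owns blocks. */
    static class BlockCollection {
        final String name;
        BlockCollection(String name) { this.name = name; }
    }

    // Hypothetical inode id -> owner map and deletion list.
    static final Map<Long, BlockCollection> inodeMap = new HashMap<>();
    static final List<Block> toDeleteList = new ArrayList<>();

    // Same three statements as the snippet quoted in the comment.
    static void addDeleteBlock(Block toDelete) {
        assert toDelete != null : "toDelete is null";
        toDelete.delete();
        toDeleteList.add(toDelete);
    }

    // Returns null once bcId has been invalidated, since no owner maps to it.
    static BlockCollection getBlockCollection(Block b) {
        return inodeMap.get(b.bcId);
    }

    // Hypothetical guarded caller: skips blocks whose owner is gone
    // instead of dereferencing a null collection.
    static String chooseTargetsGuarded(Block b) {
        BlockCollection bc = getBlockCollection(b);
        if (bc == null) {
            return null; // nothing to replicate for an ownerless block
        }
        return bc.name;
    }

    public static void main(String[] args) {
        inodeMap.put(1001L, new BlockCollection("/user/test/file"));
        Block block = new Block(1001L);
        System.out.println("owner found before delete: " + (getBlockCollection(block) != null));
        addDeleteBlock(block);
        System.out.println("owner found after delete: " + (getBlockCollection(block) != null));
        System.out.println("guarded result: " + chooseTargetsGuarded(block));
    }
}
```

Whether the right fix is a null check in the caller or moving the {{bcId}} invalidation out of {{addDeleteBlock()}} is exactly the open question in this issue; the sketch only illustrates the interaction and does not take a side.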