[ https://issues.apache.org/jira/browse/HDFS-2770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13182377#comment-13182377 ]
Todd Lipcon commented on HDFS-2770:
-----------------------------------

I believe the issue may be with any place we check:
{code}
// Ignore replicas already scheduled to be removed from the DN
if(invalidateBlocks.contains(dn.getStorageID(), block)) {
{code}
since it is ignoring the fact that, after the replication monitor thread has run, the block is no longer in {{BlockManager.invalidateBlocks}}, but instead in that DatanodeDescriptor's {{invalidateBlocks}} list.

Maybe someone can remind me why we even have two separate invalidateBlocks structures in the first place? (one global map keyed by StorageID and another per-datanode list)

> Block reports may mark corrupt blocks pending deletion as non-corrupt
> ---------------------------------------------------------------------
>
>                 Key: HDFS-2770
>                 URL: https://issues.apache.org/jira/browse/HDFS-2770
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.0
>            Reporter: Todd Lipcon
>            Priority: Critical
>
> It seems like HDFS-900 may have regressed in trunk since it was committed
> without a regression test. In HDFS-2742 I saw the following sequence of
> events:
> - A block at replication 2 had one of its replicas marked as corrupt on the NN
> - NN scheduled deletion of that replica in {{invalidateWork}}, and removed it
> from the block map
> - The DN hosting that block sent a block report, which caused the replica to
> get re-added to the block map as if it were good
> - The deletion request was passed to the DN and it deleted the block
> - Now we're in a bad state, where the NN temporarily thinks that it has two
> good replicas, but in fact one of them has been deleted. If we lower
> replication of this block at this time, the one good remaining replica may be
> deleted.
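To make the mismatch in the comment concrete, here is a minimal Java sketch. It is a simplified, hypothetical model and not the actual {{BlockManager}} or {{DatanodeDescriptor}} code: the class {{InvalidationSketch}} and all of its method names are assumptions made for illustration. It only shows why a check against the global structure alone can miss a replica once the pending deletion has been handed off to the per-datanode list.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model for illustration only; not the real HDFS classes.
class InvalidationSketch {
    // Stands in for the global structure keyed by storage ID.
    private final Map<String, Set<Long>> globalInvalidates = new HashMap<>();
    // Stands in for each datanode's own list of blocks to delete.
    private final Map<String, Set<Long>> perDatanodeInvalidates = new HashMap<>();

    // Record that a replica on the given storage should be deleted.
    void scheduleInvalidation(String storageId, long blockId) {
        globalInvalidates.computeIfAbsent(storageId, k -> new HashSet<>()).add(blockId);
    }

    // Models the replication monitor pass: pending deletions leave the
    // global structure and move to the per-datanode list.
    void handOffToDatanode(String storageId) {
        Set<Long> pending = globalInvalidates.remove(storageId);
        if (pending != null) {
            perDatanodeInvalidates
                .computeIfAbsent(storageId, k -> new HashSet<>())
                .addAll(pending);
        }
    }

    // The kind of check quoted in the comment: looks at the global structure only.
    boolean inGlobalOnly(String storageId, long blockId) {
        return contains(globalInvalidates, storageId, blockId);
    }

    // A check that consults both structures, so it still sees the replica
    // after the hand-off.
    boolean isScheduledForDeletion(String storageId, long blockId) {
        return contains(globalInvalidates, storageId, blockId)
            || contains(perDatanodeInvalidates, storageId, blockId);
    }

    private static boolean contains(Map<String, Set<Long>> map, String storageId, long blockId) {
        Set<Long> blocks = map.get(storageId);
        return blocks != null && blocks.contains(blockId);
    }
}
```

In this model, after {{scheduleInvalidation}} followed by {{handOffToDatanode}}, {{inGlobalOnly}} returns false while {{isScheduledForDeletion}} returns true. That window is where a block report could re-add the replica as if it were good. Whether the real fix should consult both structures or merge them is left open, as in the comment.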