[ https://issues.apache.org/jira/browse/HDFS-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17773348#comment-17773348 ]
Haiyang Hu commented on HDFS-17218:
-----------------------------------

Hi Sir [~hexiaoqiao], thanks for your comment.

The problem we are currently hitting follows this sequence:
1. block1 on dn1 is chosen as excess, added to excessRedundancyMap, and added to the invalidate queue (addToInvalidates).
2. dn1's heartbeat receives the invalidate command.
3. dn1 deletes asynchronously when it receives the command, but the service stops before the block is actually deleted, so block1 still exists.
4. At this point the NN's excessRedundancyMap still contains the block of dn1.
5. The DN is restarted; at this point the NN has not yet determined that the DN is dead.
6. After the restart the DN sends a full block report (FBR). processFirstBlockReport is not executed here; processReport is. Since block1 is not a new block, the processExtraRedundancy logic is not executed.
7. So the block of dn1 stays in excessRedundancyMap indefinitely (until an HA switch is performed).

> NameNode should remove its excess blocks from the ExcessRedundancyMap When a
> DN registers
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-17218
>                 URL: https://issues.apache.org/jira/browse/HDFS-17218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namanode
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>
> We found that a DN loses all pending DNA_INVALIDATE blocks if it
> restarts.
> *Root cause*
> The DN currently deletes blocks asynchronously, so it holds many
> pending-deletion blocks in memory.
> When the DN restarts, these cached blocks may be lost. This causes some
> blocks in the excess map in the NameNode to be leaked, which results in many
> blocks having more replicas than expected.
> *Solution*
> The NameNode should remove the DN's excess blocks from the
> ExcessRedundancyMap when a DN registers.
> This approach ensures that when the DN's full block report is processed,
> 'processExtraRedundancy' can be performed according to the actual state of
> the blocks.
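The sequence in steps 1 to 7 can be sketched with a toy model. This is not Hadoop code: the class ExcessMapSketch, its methods, and the clearExcessOnRegister flag are hypothetical names invented for illustration, and the maps are simplified stand-ins for the NameNode's bookkeeping. It only shows why an entry can outlive a DN restart when a full block report re-evaluates new blocks only, and how clearing the entry at registration avoids that.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the NameNode-side state; all names here are hypothetical. */
class ExcessMapSketch {
    // datanode id -> block ids the NameNode already knows on that datanode
    private final Map<String, Set<String>> knownBlocks = new HashMap<>();
    // datanode id -> block ids currently marked as excess on that datanode
    private final Map<String, Set<String>> excessRedundancyMap = new HashMap<>();
    // Assumption: models the proposed behaviour of clearing excess entries on registration
    private final boolean clearExcessOnRegister;
    private int reevaluated = 0;

    ExcessMapSketch(boolean clearExcessOnRegister) {
        this.clearExcessOnRegister = clearExcessOnRegister;
    }

    /** Steps 1-2: the replica is chosen as excess and an invalidate command is queued. */
    void chooseExcess(String dn, String block) {
        excessRedundancyMap.computeIfAbsent(dn, k -> new HashSet<>()).add(block);
    }

    /** Step 5: the DN registers again before the NN has declared it dead. */
    void registerDatanode(String dn) {
        if (clearExcessOnRegister) {
            excessRedundancyMap.remove(dn);
        }
    }

    /** Step 6: full block report; only blocks new to the NN are re-evaluated. */
    void processReport(String dn, Set<String> reported) {
        Set<String> known = knownBlocks.computeIfAbsent(dn, k -> new HashSet<>());
        for (String block : reported) {
            if (known.add(block)) {
                reevaluated++; // stand-in for the redundancy check on a new block
            }
        }
    }

    int excessCount(String dn) {
        return excessRedundancyMap.getOrDefault(dn, Collections.emptySet()).size();
    }

    int reevaluatedCount() {
        return reevaluated;
    }

    /** Replays the scenario from the comment for a single datanode and block. */
    static ExcessMapSketch runScenario(boolean clearExcessOnRegister) {
        ExcessMapSketch nn = new ExcessMapSketch(clearExcessOnRegister);
        Set<String> replicas = new HashSet<>(Collections.singleton("block1"));
        nn.processReport("dn1", replicas);  // earlier report: NN learns block1
        nn.chooseExcess("dn1", "block1");   // steps 1-2
        // Steps 3-4: dn1 stops before deleting, so block1 stays on disk
        // and nothing changes on the NN side.
        nn.registerDatanode("dn1");         // step 5
        nn.processReport("dn1", replicas);  // step 6: block1 is not new
        return nn;
    }

    public static void main(String[] args) {
        System.out.println("entries left without clearing: "
                + runScenario(false).excessCount("dn1"));
        System.out.println("entries left with clearing:    "
                + runScenario(true).excessCount("dn1"));
    }
}
```

In this model the run without clearing leaves one stale entry for dn1 (step 7), while the run with clearing leaves none, so a later redundancy check would see the replica's actual state.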
--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org