[ https://issues.apache.org/jira/browse/HDFS-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17776063#comment-17776063 ]
ASF GitHub Bot commented on HDFS-17218:
---------------------------------------

zhangshuyan0 commented on code in PR #6176:
URL: https://github.com/apache/hadoop/pull/6176#discussion_r1361730711


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java:
##########
@@ -1007,6 +1013,7 @@ public void updateRegInfo(DatanodeID nodeReg) {
     for(DatanodeStorageInfo storage : getStorageInfos()) {
       if (storage.getStorageType() != StorageType.PROVIDED) {
         storage.setBlockReportCount(0);
+        storage.setBlockContentsStale(true);

Review Comment:
   > namenode thinks that this DN contains this block, but actually the DN doesn't store this block

   Actually, this situation can happen at any time, not just between "registerDataNode" and "blockReport". Why do you think the probability of this situation increases after the DN re-registers, so that it needs to be handled specifically?

   IMO, the "stale" concept was introduced to mark replicas that may have been deleted by NN commands but are not present in the ExcessRedundancyMap. It is closely tied to the state of the ExcessRedundancyMap.
   - After a failover, the ExcessRedundancyMap of the new ANN is empty, so all datanodes need to be marked as stale.
   - In this patch, the ExcessRedundancyMap entries of the corresponding DN are cleared when it re-registers, so that DN needs to be set to stale.

   If the data in the ExcessRedundancyMap is correct, the NN knows exactly which replicas are about to be deleted, which ensures that the NN will not actively delete all replicas of a block.

   Looking forward to your reply @ZanderXu.

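   The reasoning above can be illustrated with a small toy model. This is only a sketch, not the real BlockManager or DatanodeDescriptor code: the class `StaleReplicaSketch` and all of its methods are hypothetical names, and the rule "do not schedule a deletion while any replica holder is stale" is an assumption made to show why the stale flag has to follow the state of the excess map.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: every name here is hypothetical, not an HDFS API.
class StaleReplicaSketch {
    // DN id -> blocks the NN has already scheduled for deletion on that DN
    // (a stand-in for the per-DN part of the ExcessRedundancyMap).
    private final Map<String, Set<Long>> excess = new HashMap<>();
    // DN id -> true while the NN cannot trust its view of that DN's replicas.
    private final Map<String, Boolean> stale = new HashMap<>();
    // Block id -> DNs the NN believes hold a replica.
    private final Map<Long, Set<String>> locations = new HashMap<>();

    void addReplica(long blockId, String dn) {
        locations.computeIfAbsent(blockId, k -> new HashSet<>()).add(dn);
        stale.putIfAbsent(dn, false);
    }

    void markExcess(long blockId, String dn) {
        excess.computeIfAbsent(dn, k -> new HashSet<>()).add(blockId);
    }

    int excessCount(String dn) {
        return excess.getOrDefault(dn, Collections.emptySet()).size();
    }

    // Failover: the new active NN starts with an empty excess map,
    // so every known DN is marked stale.
    void failover() {
        excess.clear();
        stale.replaceAll((dn, wasStale) -> true);
    }

    // Re-registration as discussed for this patch: the DN's excess
    // entries are dropped, so the DN is marked stale as well.
    void reRegister(String dn) {
        excess.remove(dn);
        stale.put(dn, true);
    }

    // A full block report refreshes the NN's view of the DN.
    void fullBlockReport(String dn) {
        stale.put(dn, false);
    }

    // Assumed rule: postpone deletions while any holder of the block is stale.
    boolean mayScheduleDeletion(long blockId) {
        for (String dn : locations.getOrDefault(blockId, Collections.emptySet())) {
            if (stale.getOrDefault(dn, true)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        StaleReplicaSketch nn = new StaleReplicaSketch();
        nn.addReplica(1L, "dn1");
        nn.addReplica(1L, "dn2");
        nn.markExcess(1L, "dn2");
        nn.reRegister("dn2");
        System.out.println("after re-register: " + nn.mayScheduleDeletion(1L));
        nn.fullBlockReport("dn2");
        System.out.println("after block report: " + nn.mayScheduleDeletion(1L));
    }
}
```

   In this model, clearing a DN's excess entries without also setting the stale flag would let `mayScheduleDeletion` return true before the next full block report, which is the case the review comment argues against.
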
> NameNode should remove its excess blocks from the ExcessRedundancyMap When a DN registers
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-17218
>                 URL: https://issues.apache.org/jira/browse/HDFS-17218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namanode
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2023-10-12-15-52-52-336.png
>
>
> We found that a DN loses all pending DNA_INVALIDATE blocks if it restarts.
> *Root cause*
> The DN currently deletes blocks asynchronously, so it holds many blocks
> pending deletion in memory.
> When the DN restarts, these cached blocks may be lost. As a result, some
> blocks in the excess map in the namenode are leaked, which leaves many
> blocks with more replicas than expected.
> *Solution*
> The NameNode should remove the DN's excess blocks from the
> ExcessRedundancyMap when a DN registers.
> This ensures that when the DN's full block report is processed,
> 'processExtraRedundancy' can be performed according to the actual state of
> the blocks.