[ https://issues.apache.org/jira/browse/HDFS-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774370#comment-17774370 ]
Shuyan Zhang edited comment on HDFS-17218 at 10/12/23 8:21 AM:
---------------------------------------------------------------

Hi, [~haiyang Hu], your report is very valuable.

I think the root cause here is that the NameNode has no timeout mechanism for handling excess replicas, like the one PendingReconstructionMonitor provides for block reconstruction.

was (Author: zhangshuyan):
Hi, [~haiyang Hu], your report is very valuable. I would like to discuss it with you. As you say,
{quote}since block1 is not a new block, the processExtraRedundancy logic will not be executed.
{quote}
Therefore, even if we remove the corresponding excess blocks from the ExcessRedundancyMap when a DN registers, it seems that we cannot avoid this problem, because that logic will still not be executed.

I think the root cause here is that the NameNode has no timeout mechanism for handling excess replicas, like the one PendingReconstructionMonitor provides for block reconstruction.

> NameNode should remove its excess blocks from the ExcessRedundancyMap When a DN registers
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-17218
>                 URL: https://issues.apache.org/jira/browse/HDFS-17218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namanode
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2023-10-12-15-52-52-336.png
>
>
> Currently, a DN loses all of its pending DNA_INVALIDATE blocks if it restarts.
> *Root cause*
> The DN currently deletes blocks asynchronously, so it holds many pending-deletion blocks in memory.
> When the DN restarts, these cached blocks may be lost. This causes some blocks in the NameNode's excess map to be leaked, which results in many blocks having more replicas than expected.
> *Solution*
> The NameNode should remove a DN's excess blocks from the ExcessRedundancyMap when that DN registers.
> This ensures that when the DN's full block report is processed, 'processExtraRedundancy' can be performed according to the actual state of the blocks.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
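
The two ideas discussed in this thread (dropping a DN's excess entries when it registers, and timing out excess entries that were never confirmed as deleted) can be compared with a small toy model. This is only a sketch under stated assumptions: ExcessReplicaTracker, clearOnRegister, pollTimedOut, and the timestamp bookkeeping are hypothetical names invented for illustration. They are not the HDFS ExcessRedundancyMap or BlockManager API, and nothing here is taken from the actual patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Toy model only; hypothetical class, NOT the HDFS ExcessRedundancyMap. */
class ExcessReplicaTracker {
    // datanode id -> (block id -> time in ms when the replica was marked excess)
    private final Map<String, Map<Long, Long>> excess = new HashMap<>();
    private final long timeoutMs;

    ExcessReplicaTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Record that the replica of blockId on the datanode is excess and awaiting deletion. */
    synchronized void add(String datanode, long blockId, long nowMs) {
        excess.computeIfAbsent(datanode, k -> new HashMap<>()).put(blockId, nowMs);
    }

    /** The datanode confirmed the deletion; returns false if the entry was unknown. */
    synchronized boolean remove(String datanode, long blockId) {
        Map<Long, Long> blocks = excess.get(datanode);
        if (blocks == null || blocks.remove(blockId) == null) {
            return false;
        }
        if (blocks.isEmpty()) {
            excess.remove(datanode);
        }
        return true;
    }

    /**
     * Idea from the issue description: when a datanode (re)registers, forget
     * every excess entry recorded for it. Returns the number of entries dropped.
     */
    synchronized int clearOnRegister(String datanode) {
        Map<Long, Long> blocks = excess.remove(datanode);
        return blocks == null ? 0 : blocks.size();
    }

    /**
     * Idea from the comment: drop and return entries that have been pending for
     * at least timeoutMs, so the caller can re-evaluate those blocks.
     */
    synchronized List<Long> pollTimedOut(String datanode, long nowMs) {
        List<Long> expired = new ArrayList<>();
        Map<Long, Long> blocks = excess.get(datanode);
        if (blocks == null) {
            return expired;
        }
        Iterator<Map.Entry<Long, Long>> it = blocks.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, Long> e = it.next();
            if (nowMs - e.getValue() >= timeoutMs) {
                expired.add(e.getKey());
                it.remove();
            }
        }
        if (blocks.isEmpty()) {
            excess.remove(datanode);
        }
        return expired;
    }

    /** Number of excess entries currently tracked for the datanode. */
    synchronized int size(String datanode) {
        Map<Long, Long> blocks = excess.get(datanode);
        return blocks == null ? 0 : blocks.size();
    }
}
```

In this model, clearOnRegister only helps if the blocks are re-evaluated afterwards, which is the concern raised in the comment. pollTimedOut hands the stale block ids back to the caller, so re-evaluation does not depend on a block report treating the block as new.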