[ 
https://issues.apache.org/jira/browse/HDFS-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774370#comment-17774370
 ] 

Shuyan Zhang edited comment on HDFS-17218 at 10/12/23 8:21 AM:
---------------------------------------------------------------

Hi [~haiyang Hu], your report is very valuable.

I think the root cause here is that the NameNode has no timeout mechanism for
handling excess replicas, like the one PendingReconstructionMonitor provides
for block reconstruction.
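
To make the idea concrete, here is a minimal sketch of what such a timeout
could look like. It is only an illustration under stated assumptions, not
actual HDFS code: the class ExcessReplicaTimeoutTracker, its methods, and the
"datanodeUuid/blockId" key format are all hypothetical, and it does not touch
the real ExcessRedundancyMap or PendingReconstructionMonitor. An entry is
recorded when a replica is marked excess, removed when the deletion is
confirmed, and reported as timed out if no confirmation arrives in time.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of a timeout for excess replicas; not an HDFS class. */
public class ExcessReplicaTimeoutTracker {
  // Assumed key format "datanodeUuid/blockId" -> time (ms) the replica was
  // marked as excess.
  private final Map<String, Long> markedAt = new ConcurrentHashMap<>();
  private final long timeoutMs;

  public ExcessReplicaTimeoutTracker(long timeoutMs) {
    this.timeoutMs = timeoutMs;
  }

  private static String key(String datanodeUuid, long blockId) {
    return datanodeUuid + "/" + blockId;
  }

  /** Record that a replica was marked excess; keeps the first timestamp. */
  public void markExcess(String datanodeUuid, long blockId, long nowMs) {
    markedAt.putIfAbsent(key(datanodeUuid, blockId), nowMs);
  }

  /** The DN confirmed the deletion, so the entry no longer needs a timeout. */
  public void confirmDeleted(String datanodeUuid, long blockId) {
    markedAt.remove(key(datanodeUuid, blockId));
  }

  /**
   * Remove and return every entry that has been waiting for at least
   * timeoutMs. A caller could re-run its redundancy check for these blocks.
   */
  public List<String> expire(long nowMs) {
    List<String> timedOut = new ArrayList<>();
    Iterator<Map.Entry<String, Long>> it = markedAt.entrySet().iterator();
    while (it.hasNext()) {
      Map.Entry<String, Long> entry = it.next();
      if (nowMs - entry.getValue() >= timeoutMs) {
        timedOut.add(entry.getKey());
        it.remove();
      }
    }
    return timedOut;
  }

  /** Number of excess replicas still waiting for confirmation. */
  public int size() {
    return markedAt.size();
  }
}
```

In this sketch a periodic thread would call expire(now) and hand the returned
keys back to whatever logic re-evaluates redundancy, so that an invalidation
lost by a restarted DN does not stay in the excess map forever.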

 

 


was (Author: zhangshuyan):
Hi [~haiyang Hu], your report is very valuable. I would like to discuss it
with you.

As you say, 
{quote}since block1 is not a new block, the processExtraRedundancy logic will 
not be executed.
{quote}
Therefore, even if we remove the corresponding excess blocks from the
ExcessRedundancyMap when a DN registers, it seems that we cannot avoid this
problem, because processExtraRedundancy will still not be executed.

I think the root cause here is that the NameNode has no timeout mechanism for
handling excess replicas, like the one PendingReconstructionMonitor provides
for block reconstruction.

 

 

> NameNode should remove its excess blocks from the ExcessRedundancyMap When a 
> DN registers
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-17218
>                 URL: https://issues.apache.org/jira/browse/HDFS-17218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2023-10-12-15-52-52-336.png
>
>
> Currently, a DN loses all of its pending DNA_INVALIDATE blocks if it
> restarts.
> *Root cause*
> The DN currently deletes blocks asynchronously, so it can hold many blocks
> pending deletion in memory.
> When the DN restarts, these cached blocks may be lost. This causes some
> blocks in the excess map in the NameNode to be leaked, which results in many
> blocks having more replicas than expected.
> *Solution*
> The NameNode should remove a DN's excess blocks from the
> ExcessRedundancyMap when that DN registers.
> This ensures that, when the DN's full block report is processed,
> 'processExtraRedundancy' can be performed according to the actual state of
> the blocks.
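
The solution proposed in the issue description above can be illustrated with
a small sketch. This is not the actual HDFS implementation: ExcessMapSketch
and its methods are hypothetical names, and the map from datanode UUID to
block IDs is only an assumed stand-in for the ExcessRedundancyMap. It shows
the single step under discussion, dropping a DN's excess entries when that DN
registers.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for an excess-replica map; not an HDFS class. */
public class ExcessMapSketch {
  // Assumed layout: datanode UUID -> IDs of blocks marked excess on that DN.
  private final Map<String, Set<Long>> excessByDatanode = new HashMap<>();

  /** Mark a replica on the given DN as excess. */
  public void add(String datanodeUuid, long blockId) {
    excessByDatanode
        .computeIfAbsent(datanodeUuid, k -> new HashSet<>())
        .add(blockId);
  }

  /** Whether the replica on the given DN is currently tracked as excess. */
  public boolean contains(String datanodeUuid, long blockId) {
    Set<Long> blocks = excessByDatanode.get(datanodeUuid);
    return blocks != null && blocks.contains(blockId);
  }

  /**
   * Called when a DN registers: forget every excess entry for that DN,
   * since pending deletions held in its memory may have been lost on restart.
   * Returns the number of entries removed.
   */
  public int onDatanodeRegister(String datanodeUuid) {
    Set<Long> removed = excessByDatanode.remove(datanodeUuid);
    return removed == null ? 0 : removed.size();
  }
}
```

After onDatanodeRegister runs, a later full block report from that DN would
no longer find stale excess entries for it; whether the redundancy check is
then actually re-run for existing blocks is a separate question that this
sketch does not address.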



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

