[ 
https://issues.apache.org/jira/browse/HDFS-17218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17774805#comment-17774805
 ] 

Haiyang Hu edited comment on HDFS-17218 at 10/13/23 6:14 AM:
-------------------------------------------------------------

{quote}
IMO, adding a timeout mechanism may not add much pressure on NameNode. However, 
it seems that the implementation of that solution is more complex than the 
current patch and requires more comprehensive design and consideration. The 
good aspect is that the timeout mechanism can completely solve the problem of 
excess replica leakage, after all, the situation where datanodes fail to 
successfully delete replicas according to commands may not be limited to the 
scenario described in this JIRA.
{quote}
Thanks [~zhangshuyan] for the detailed reply. Indeed, this is a more 
comprehensive solution to the problem of excess replica leakage. If 
necessary, can we create an issue to follow up on this work?



> NameNode should remove its excess blocks from the ExcessRedundancyMap When a 
> DN registers
> -----------------------------------------------------------------------------------------
>
>                 Key: HDFS-17218
>                 URL: https://issues.apache.org/jira/browse/HDFS-17218
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namanode
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: image-2023-10-12-15-52-52-336.png
>
>
> A DN currently loses all of its pending DNA_INVALIDATE blocks if it 
> restarts.
> *Root cause*
> The DN currently deletes blocks asynchronously, so it can hold many 
> pending-deletion blocks in memory.
> When the DN restarts, these cached blocks may be lost. As a result, some 
> blocks in the NameNode's excess map are leaked, and many blocks end up with 
> more replicas than expected.
> *Solution*
> The NameNode should remove that DN's excess blocks from the 
> ExcessRedundancyMap when the DN registers.
> This ensures that, when the DN's full block report is processed, 
> 'processExtraRedundancy' runs according to the actual state of the 
> blocks.
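
The proposed fix can be illustrated with a minimal sketch. This is a simplified model, not the HDFS code: the class ExcessMapSketch, its methods, and the use of a datanode UUID string as the key are all hypothetical names chosen for illustration. It only shows the idea of dropping every excess entry for a datanode at registration time, so that a later full block report is evaluated against a clean state.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-in for the NameNode's excess-redundancy bookkeeping.
// All names here are hypothetical; this is not the HDFS implementation.
class ExcessMapSketch {
    // datanode UUID -> IDs of blocks the NameNode considers excess on that node
    private final Map<String, Set<Long>> excessByDatanode = new HashMap<>();

    // Record that a replica on the given datanode was chosen for deletion.
    // Returns false if the entry was already present.
    boolean addExcess(String datanodeUuid, long blockId) {
        return excessByDatanode
            .computeIfAbsent(datanodeUuid, k -> new HashSet<>())
            .add(blockId);
    }

    // Called when a datanode (re)registers: forget every excess entry for it,
    // because a restarted datanode may have lost its pending deletions.
    // Returns the number of entries that were dropped.
    int clearOnRegister(String datanodeUuid) {
        Set<Long> removed = excessByDatanode.remove(datanodeUuid);
        return removed == null ? 0 : removed.size();
    }

    // True if the block is currently tracked as excess on the given datanode.
    boolean isExcess(String datanodeUuid, long blockId) {
        Set<Long> blocks = excessByDatanode.get(datanodeUuid);
        return blocks != null && blocks.contains(blockId);
    }

    // Total number of excess entries across all datanodes.
    int size() {
        int total = 0;
        for (Set<Long> blocks : excessByDatanode.values()) {
            total += blocks.size();
        }
        return total;
    }
}
```

In this model, once clearOnRegister has run for a datanode, any replica that the datanode still reports is no longer marked as excess, so the redundancy check can select it for deletion again instead of assuming a deletion is already in flight.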



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
