[ 
https://issues.apache.org/jira/browse/HDFS-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17606292#comment-17606292
 ] 

ASF GitHub Bot commented on HDFS-16774:
---------------------------------------

haiyang1987 opened a new pull request, #4903:
URL: https://github.com/apache/hadoop/pull/4903

   
   ### Description of PR
   HDFS-16774. Improve async delete replica on datanode
   
   In our online cluster, a large number of ReplicaNotFoundExceptions occur 
when clients read data.
   We traced the root cause to the asynchronous replica deletion: when many 
pending deletions stack up, reads fail with ReplicaNotFoundException.
   Currently, the asynchronous replica deletion proceeds as follows:
   1. Remove the replica from the ReplicaMap.
   2. Delete the replica file from disk [blocked in threadpool].
   3. Notify the namenode through IBR [blocked in threadpool].
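
A minimal sketch of the three steps above, to make the timing gap concrete. It is a toy model, not Hadoop code: the class `CurrentDeleteFlowSketch`, its fields, and its methods are all hypothetical names, a plain map stands in for the ReplicaMap, and a list stands in for the IBRs sent to the namenode. One plausible reading of the report is that step 1 takes effect immediately while steps 2 and 3 wait in the pool's queue, so the replica is already gone from the map long before the namenode hears about the deletion.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of the current flow; none of these names are Hadoop APIs. */
class CurrentDeleteFlowSketch {
  // Stand-in for the datanode's ReplicaMap: block id -> replica file path.
  final Map<Long, String> replicaMap = new ConcurrentHashMap<>();
  // Stand-in for the deletions reported to the namenode through IBR.
  final List<Long> reportedDeletions = new CopyOnWriteArrayList<>();
  // A single worker, so queued deletions stack up behind each other.
  final ExecutorService deleteExecutor = Executors.newSingleThreadExecutor();

  void invalidate(long blockId) {
    // Step 1 runs right away on the caller's thread.
    String path = replicaMap.remove(blockId);
    if (path == null) {
      return;
    }
    // Steps 2 and 3 wait in the pool's queue until a worker is free.
    deleteExecutor.execute(() -> {
      deleteFromDisk(path);
      reportedDeletions.add(blockId);
    });
  }

  void deleteFromDisk(String path) {
    // Placeholder: a real datanode would remove the block and meta files here.
  }

  boolean readable(long blockId) {
    return replicaMap.containsKey(blockId);
  }

  void shutdownAndWait() {
    deleteExecutor.shutdown();
    try {
      deleteExecutor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    CurrentDeleteFlowSketch dn = new CurrentDeleteFlowSketch();
    CountDownLatch backlog = new CountDownLatch(1);
    // Simulate stacked pending deletions by occupying the only worker.
    dn.deleteExecutor.execute(() -> awaitQuietly(backlog));
    dn.replicaMap.put(1L, "/data/blk_1");
    dn.invalidate(1L);
    // The replica is no longer in the map, yet nothing has been reported.
    System.out.println("readable=" + dn.readable(1L)
        + " reported=" + dn.reportedDeletions);
    backlog.countDown();
    dn.shutdownAndWait();
    System.out.println("reported=" + dn.reportedDeletions);
  }
}
```

In this model, a reader arriving while the worker is busy finds no replica in the map even though no deletion has been reported, which matches the window in which the exceptions are described as occurring.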
   
   To avoid such problems as far as possible, consider optimizing the 
execution flow:
   Removing the replica from the ReplicaMap, deleting the replica file from 
disk, and notifying the namenode through IBR are all processed in the same 
asynchronous thread.
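
The proposed flow can be sketched in the same toy style. This is an illustration under stated assumptions, not the code in the pull request: `SingleTaskDeleteSketch` and all of its members are hypothetical names, a plain map stands in for the ReplicaMap, and a list stands in for the IBRs. The only point it shows is that all three steps run inside one queued task, so the replica stays in the map until that task actually executes.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of the proposed flow; none of these names are Hadoop APIs. */
class SingleTaskDeleteSketch {
  // Stand-in for the datanode's ReplicaMap: block id -> replica file path.
  final Map<Long, String> replicaMap = new ConcurrentHashMap<>();
  // Stand-in for the deletions reported to the namenode through IBR.
  final List<Long> reportedDeletions = new CopyOnWriteArrayList<>();
  final ExecutorService deleteExecutor = Executors.newSingleThreadExecutor();

  void invalidateAsync(long blockId) {
    // All three steps are queued together and run on the same worker thread.
    deleteExecutor.execute(() -> {
      String path = replicaMap.remove(blockId);   // step 1
      if (path == null) {
        return;
      }
      deleteFromDisk(path);                       // step 2
      reportedDeletions.add(blockId);             // step 3
    });
  }

  void deleteFromDisk(String path) {
    // Placeholder: a real datanode would remove the block and meta files here.
  }

  boolean readable(long blockId) {
    return replicaMap.containsKey(blockId);
  }

  void shutdownAndWait() {
    deleteExecutor.shutdown();
    try {
      deleteExecutor.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  static void awaitQuietly(CountDownLatch latch) {
    try {
      latch.await();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    SingleTaskDeleteSketch dn = new SingleTaskDeleteSketch();
    CountDownLatch backlog = new CountDownLatch(1);
    // Simulate stacked pending deletions by occupying the only worker.
    dn.deleteExecutor.execute(() -> awaitQuietly(backlog));
    dn.replicaMap.put(1L, "/data/blk_1");
    dn.invalidateAsync(1L);
    // The deletion is still queued, so the replica remains in the map.
    System.out.println("readable=" + dn.readable(1L)
        + " reported=" + dn.reportedDeletions);
    backlog.countDown();
    dn.shutdownAndWait();
    System.out.println("readable=" + dn.readable(1L)
        + " reported=" + dn.reportedDeletions);
  }
}
```

In this sketch the gap between removing the map entry and reporting the deletion shrinks to the duration of a single task instead of the length of the queue. A real datanode would also need to handle locking and failures between the steps, which the sketch leaves out.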
   
   
   
   
   




> Improve async delete replica on datanode
> ----------------------------------------
>
>                 Key: HDFS-16774
>                 URL: https://issues.apache.org/jira/browse/HDFS-16774
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>
> In our online cluster, a large number of ReplicaNotFoundExceptions occur when 
> clients read data.
> We traced the root cause to the asynchronous replica deletion: when many 
> pending deletions stack up, reads fail with ReplicaNotFoundException.
> Currently, the asynchronous replica deletion proceeds as follows:
> 1. Remove the replica from the ReplicaMap.
> 2. Delete the replica file from disk [blocked in threadpool].
> 3. Notify the namenode through IBR [blocked in threadpool].
> To avoid such problems as far as possible, consider optimizing the 
> execution flow:
> Removing the replica from the ReplicaMap, deleting the replica file from 
> disk, and notifying the namenode through IBR are all processed in the same 
> asynchronous thread.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

