[ 
https://issues.apache.org/jira/browse/HDFS-16774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haiyang Hu updated HDFS-16774:
------------------------------
    Description: 
In our production cluster, a large number of ReplicaNotFoundExceptions occur 
when clients read data.
Tracing the root cause showed that the asynchronous block deletion had built up 
a large backlog of pending deletions, which caused the 
ReplicaNotFoundExceptions.
Currently, the asynchronous block deletion process is as follows:
1. Remove the block from the ReplicaMap.
2. Delete the block file on disk [blocked in the thread pool].
3. Notify the NameNode through an incremental block report (IBR) [blocked in 
the thread pool].

To avoid such problems as far as possible, consider optimizing the execution 
flow:
Removing the block from the ReplicaMap, deleting the block file from disk, and 
notifying the NameNode through IBR should all be processed in the same 
asynchronous thread.
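
The proposed flow could be sketched roughly as follows. This is a minimal
illustration, not the actual DataNode code: the class AsyncDeleteSketch, its
fields (replicaMap, disk, pendingIbr) and its methods are hypothetical
stand-ins, and a single-threaded executor is assumed in place of the real
deletion thread pool. The point it shows is that the replica stays visible in
the map until the asynchronous task actually runs, and that all three steps
then happen together in that one task.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Hadoop code base.
class AsyncDeleteSketch {
    // Stand-in for the ReplicaMap: block id -> block file name.
    final Map<Long, String> replicaMap = new ConcurrentHashMap<>();
    // Stand-in for the block files present on disk.
    final Set<String> disk = ConcurrentHashMap.newKeySet();
    // Stand-in for the deletions queued for the next IBR to the NameNode.
    final List<Long> pendingIbr = new CopyOnWriteArrayList<>();
    // Assumed single worker thread that performs deletions.
    private final ExecutorService deleter = Executors.newSingleThreadExecutor();

    void addReplica(long blockId, String file) {
        disk.add(file);
        replicaMap.put(blockId, file);
    }

    // Only schedules work; the replica remains readable until the task runs.
    void deleteAsync(long blockId) {
        deleter.execute(() -> {
            String file = replicaMap.remove(blockId); // 1. remove from the map
            if (file == null) {
                return;                               // unknown or already deleted
            }
            disk.remove(file);                        // 2. delete the block file
            pendingIbr.add(blockId);                  // 3. queue the IBR entry
        });
    }

    // Waits for queued deletions to finish; returns false on timeout or interrupt.
    boolean shutdownAndWait() {
        deleter.shutdown();
        try {
            return deleter.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncDeleteSketch dn = new AsyncDeleteSketch();
        dn.addReplica(1L, "blk_1");
        dn.deleteAsync(1L);
        dn.shutdownAndWait();
        System.out.println("replicaMap=" + dn.replicaMap
                + " pendingIbr=" + dn.pendingIbr);
    }
}
```

In this sketch, a reader that looks up a block while its deletion is still
queued behind other work would still find it in replicaMap, which is the
behavior the proposal is aiming for.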

> Improve async delete replica on datanode
> ----------------------------------------
>
>                 Key: HDFS-16774
>                 URL: https://issues.apache.org/jira/browse/HDFS-16774
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>
> In our production cluster, a large number of ReplicaNotFoundExceptions occur 
> when clients read data.
> Tracing the root cause showed that the asynchronous block deletion had built 
> up a large backlog of pending deletions, which caused the 
> ReplicaNotFoundExceptions.
> Currently, the asynchronous block deletion process is as follows:
> 1. Remove the block from the ReplicaMap.
> 2. Delete the block file on disk [blocked in the thread pool].
> 3. Notify the NameNode through an incremental block report (IBR) [blocked in 
> the thread pool].
> To avoid such problems as far as possible, consider optimizing the execution 
> flow:
> Removing the block from the ReplicaMap, deleting the block file from disk, 
> and notifying the NameNode through IBR should all be processed in the same 
> asynchronous thread.



