[ https://issues.apache.org/jira/browse/HDFS-17191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Íñigo Goiri updated HDFS-17191:
-------------------------------
    Summary: Delete operation adds a thread to collect blocks asynchronously  
(was: HDFS: Delete operation adds a thread to collect blocks asynchronously)

> Delete operation adds a thread to collect blocks asynchronously
> ---------------------------------------------------------------
>
>                 Key: HDFS-17191
>                 URL: https://issues.apache.org/jira/browse/HDFS-17191
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs
>    Affects Versions: 3.4.0
>            Reporter: Xiangyi Zhu
>            Assignee: Xiangyi Zhu
>            Priority: Major
>
> When we delete a large directory, collecting the blocks in the deleted
> subtree is time-consuming. Currently, block collection runs while the write
> lock is held, so deleting a large directory can block other RPCs for a
> period of time. Asynchronous deletion of the collected blocks has already
> been implemented; see
> https://issues.apache.org/jira/browse/HDFS-16043.
> In fact, collecting blocks does not require the lock: once the subtree has
> been deleted, no other RPC can access it. We can therefore collect blocks
> from the deleted subtree asynchronously and without locking.
> However, this approach has some problems:
> 1. When a parent directory of the subtree has a quota configured, the quota
> update is no longer synchronous and will lag slightly.
> 2. Because the root directory always has the DirectoryWithQuotaFeature
> attribute, its quotaUsage must be updated in any case. However, the root
> directory has no upper quota limit, so I think we can ignore the delayed
> quota update for the root directory.
> To solve these problems, we can check whether any ancestor directory of the
> subtree (other than the root) has a quota configured. If none does, use
> asynchronous collection. We can also add a configuration option that lets
> users decide whether to enable this quota check.
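The ancestor check proposed above can be sketched as follows. This is a minimal illustration under stated assumptions. `DirNode`, `canCollectAsync`, and the `quotaCheckEnabled` switch are hypothetical names. They are not Hadoop's `INodeDirectory` API or a real configuration key. Treating a disabled check as "always collect asynchronously" is one possible reading of the last sentence of the description.

```java
/**
 * Hypothetical, simplified model of the ancestor-quota check proposed in
 * HDFS-17191. DirNode is NOT Hadoop's INodeDirectory; it only records a
 * parent link and whether a quota is configured on the directory.
 */
public class AsyncCollectDecision {

    static final class DirNode {
        final String name;
        final DirNode parent;   // null for the root directory
        final boolean hasQuota; // true if a quota is configured here

        DirNode(String name, DirNode parent, boolean hasQuota) {
            this.name = name;
            this.parent = parent;
            this.hasQuota = hasQuota;
        }

        boolean isRoot() {
            return parent == null;
        }
    }

    /**
     * Returns true if blocks under the deleted subtree may be collected
     * asynchronously. The root is skipped: it always tracks quota usage but
     * has no upper limit, so a delayed update there is tolerated.
     *
     * @param deletedParent     parent directory of the deleted subtree
     * @param quotaCheckEnabled hypothetical switch; when false, quotas are
     *                          ignored and collection is always asynchronous
     */
    static boolean canCollectAsync(DirNode deletedParent, boolean quotaCheckEnabled) {
        if (!quotaCheckEnabled) {
            return true;
        }
        // Walk up from the parent, stopping before the root.
        for (DirNode d = deletedParent; d != null && !d.isRoot(); d = d.parent) {
            if (d.hasQuota) {
                return false; // this ancestor needs a synchronous usage update
            }
        }
        return true;
    }

    public static void main(String[] args) {
        DirNode root = new DirNode("/", null, true); // root always carries quota usage
        DirNode user = new DirNode("user", root, false);
        DirNode alice = new DirNode("alice", user, true);
        DirNode tmp = new DirNode("tmp", root, false);

        System.out.println(canCollectAsync(tmp, true));    // only the root is above /tmp
        System.out.println(canCollectAsync(alice, true));  // /user/alice has a quota
        System.out.println(canCollectAsync(alice, false)); // check disabled by config
    }
}
```

The walk stops at the first ancestor with a quota, so its cost is bounded by the depth of the deleted path rather than by the size of the subtree.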
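The lock-free collection idea can be illustrated with the following sketch. It unlinks the subtree while holding a write lock, then traverses the detached subtree on a background thread without the lock. Everything here is a hypothetical stand-in. `Node`, `AsyncBlockCollector`, and `collectBlocks` are illustrative names, and the `ReentrantReadWriteLock` only models the namesystem lock. In the proposal, the collected blocks would then go to the asynchronous deletion from HDFS-16043; that step is not modeled.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AsyncBlockCollector {

    /** Hypothetical tree node: directories have children, files have block IDs. */
    static final class Node {
        final List<Node> children = new ArrayList<>();
        final List<Long> blockIds = new ArrayList<>();
        Node parent;

        Node addChild(Node child) {
            child.parent = this;
            children.add(child);
            return child;
        }
    }

    // Stand-in for the namesystem lock; not Hadoop's FSNamesystemLock.
    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    private final ExecutorService collector = Executors.newSingleThreadExecutor();

    /**
     * Unlinks the subtree under the write lock, then collects its blocks on a
     * background thread without holding the lock. After unlinking, no other
     * caller can reach the subtree, so the traversal needs no locking.
     * Assumes the subtree has a non-null parent.
     */
    Future<List<Long>> delete(Node subtree) {
        fsLock.writeLock().lock();
        try {
            subtree.parent.children.remove(subtree); // detach from the namespace
            subtree.parent = null;
        } finally {
            fsLock.writeLock().unlock();
        }
        return collector.submit(() -> collectBlocks(subtree));
    }

    /** Iterative DFS so that very deep trees do not overflow the call stack. */
    static List<Long> collectBlocks(Node top) {
        List<Long> blocks = new ArrayList<>();
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(top);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            blocks.addAll(n.blockIds);
            for (Node c : n.children) {
                stack.push(c);
            }
        }
        return blocks;
    }

    void shutdown() {
        collector.shutdown(); // already-submitted collections still run
    }

    public static void main(String[] args) throws Exception {
        Node root = new Node();
        Node dir = root.addChild(new Node());
        dir.addChild(new Node()).blockIds.add(1L);
        dir.addChild(new Node()).blockIds.add(2L);

        AsyncBlockCollector fs = new AsyncBlockCollector();
        List<Long> blocks = fs.delete(dir).get(); // wait for the background traversal
        fs.shutdown();
        System.out.println(blocks.size() + " blocks collected, root children: "
                + root.children.size());
    }
}
```

The write lock covers only the constant-time unlink. The traversal cost, which grows with the size of the deleted subtree, moves to the collector thread. This split is the one the description argues is safe.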



--
This message was sent by Atlassian Jira
(v8.20.10#820010)