[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

ASF GitHub Bot (Jira) Mon, 18 Sep 2023 02:32:06 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766298#comment-17766298
 ]


ASF GitHub Bot commented on HDFS-16000:
---------------------------------------

zhuxiangyi commented on code in PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#discussion_r1328465474


##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java:
##########
@@ -1468,6 +1475,30 @@ static Collection<String> 
normalizePaths(Collection<String> paths,
     return normalized;
   }
 
+  /**
+   * Get the first Node that sets Quota.
+   */
+  static INode getFirstSetQuotaParentNode(INodesInPath iip) {
+    for (int i = iip.length() - 1; i > 0; i--) {
+      INode currNode = iip.getINode(i);
+      if (currNode == null) {

Review Comment:
   There should be no expected





> HDFS : Rename performance optimization
> --------------------------------------
>
>                 Key: HDFS-16000
>                 URL: https://issues.apache.org/jira/browse/HDFS-16000
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>    Affects Versions: 3.1.4, 3.3.1
>            Reporter: Xiangyi Zhu
>            Assignee: Xiangyi Zhu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, 
> HDFS-16000.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> It takes a long time to move a large directory with rename. For example, it 
> takes about 40 seconds to move a 1000W directory. When a large amount of data 
> is deleted to the trash, the move large directory will occur when the recycle 
> bin makes checkpoint. In addition, the user may also actively trigger the 
> move large directory operation, which will cause the NameNode to lock too 
> long and be killed by Zkfc. Through the flame graph, it is found that the 
> main time consuming is to create the EnumCounters object.
>  
> h3. Rename logic optimization:
>  * Regardless of whether the rename operation is the source directory and the 
> target directory, the quota count must be calculated three times. The first 
> time, check whether the moved directory exceeds the target directory quota, 
> the second time, calculate the mobile directory quota to update the source 
> directory quota, and the third time, calculate the mobile directory 
> configuration update to the target directory.
>  * I think some of the above three quota quota calculations are unnecessary. 
> For example, if all parent directories of the source directory and target 
> directory are not configured with quota, there is no need to calculate 
> quotaCount. Even if both the source directory and the target directory use 
> quota, there is no need to calculate the quota three times. The calculation 
> logic for the first and third times is the same, and it only needs to be 
> calculated once.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Commented] (HDFS-16000) HDFS : Rename performance optimization

Reply via email to