[ https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17766304#comment-17766304 ]
ASF GitHub Bot commented on HDFS-16000:
---------------------------------------

zhuxiangyi commented on PR #2964:
URL: https://github.com/apache/hadoop/pull/2964#issuecomment-1723082950

   > Thanks @zhuxiangyi for your work. It is a great idea and improvement. Almost LGTM. Left some comments inline. Will give my +1 once they are addressed. Thanks.

   @Hexiaoqiao Thank you very much for your review. I have fixed the problem and resubmitted the code.

> HDFS : Rename performance optimization
> --------------------------------------
>
>                 Key: HDFS-16000
>                 URL: https://issues.apache.org/jira/browse/HDFS-16000
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs, namenode
>    Affects Versions: 3.1.4, 3.3.1
>            Reporter: Xiangyi Zhu
>            Assignee: Xiangyi Zhu
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg,
> HDFS-16000.patch
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Moving a large directory with rename takes a long time. For example, it
> takes about 40 seconds to move a directory with 10 million (1000W) entries.
> When a large amount of data is deleted to the trash, such a large-directory
> move occurs when the trash creates a checkpoint. Users may also trigger a
> large-directory move themselves. In both cases the NameNode holds its lock
> for too long and may be killed by ZKFC. The flame graph shows that most of
> the time is spent creating EnumCounters objects.
>
> h3. Rename logic optimization:
> * Regardless of the source and target directories of the rename, the quota
> count is computed three times. The first computation checks whether the
> moved directory would exceed the target directory's quota, the second
> computes the moved directory's quota usage to update the source directory's
> quota, and the third computes the moved directory's quota usage again to
> update the target directory's quota.
> * I think some of these three quota calculations are unnecessary. For
> example, if none of the parent directories of the source directory or the
> target directory has a quota configured, there is no need to compute
> quotaCount at all. Even if both the source directory and the target
> directory use quotas, there is no need to compute the quota three times.
> The first and third computations use the same logic, so the result only
> needs to be computed once.
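
The proposal in the description (skip the quota count when no ancestor of the source or target has a quota, and otherwise compute it once and reuse it) can be illustrated with a minimal Java sketch. This is not the code from the patch or the PR. Dir, quotaAncestors, countSubtree, rename and the walks counter are hypothetical stand-ins and not the HDFS INodeDirectory / QuotaCounts classes. Only a namespace-style quota is modelled, and quota-bearing ancestors shared by both paths are left untouched on the assumption that a move beneath them does not change their usage.

```java
import java.util.ArrayList;
import java.util.List;

class RenameQuotaSketch {
    /** Hypothetical stand-in for a directory inode (not an HDFS class). */
    static final class Dir {
        final String name;
        Dir parent;
        final long nsQuota;   // -1 means no quota configured
        long used;            // usage charged to this dir; only meaningful if nsQuota >= 0
        final List<Dir> children = new ArrayList<>();

        Dir(String name, Dir parent, long nsQuota) {
            this.name = name;
            this.parent = parent;
            this.nsQuota = nsQuota;
            if (parent != null) parent.children.add(this);
        }
    }

    /** Counts how many times the expensive subtree walk was performed. */
    static int walks = 0;

    /** Expensive part: walks the whole subtree and counts its entries. */
    static long countSubtree(Dir d) {
        walks++;
        return count(d);
    }

    private static long count(Dir d) {
        long n = 1;
        for (Dir c : d.children) n += count(c);
        return n;
    }

    /** Collects d and all of its ancestors that have a quota configured. */
    static List<Dir> quotaAncestors(Dir d) {
        List<Dir> out = new ArrayList<>();
        for (Dir p = d; p != null; p = p.parent) {
            if (p.nsQuota >= 0) out.add(p);
        }
        return out;
    }

    /** Moves src (assumed not to be the root) under dstParent. */
    static void rename(Dir src, Dir dstParent) {
        Dir srcParent = src.parent;
        List<Dir> from = quotaAncestors(srcParent);
        List<Dir> to = quotaAncestors(dstParent);

        // Assumption: quota dirs on both paths see no net change, so skip them.
        List<Dir> shared = new ArrayList<>(from);
        shared.retainAll(to);
        from.removeAll(shared);
        to.removeAll(shared);

        // No affected quota on either path: skip the subtree walk entirely.
        if (!from.isEmpty() || !to.isEmpty()) {
            long delta = countSubtree(src); // computed once, reused below
            for (Dir d : to) {
                if (d.used + delta > d.nsQuota) {
                    throw new IllegalStateException("quota exceeded on " + d.name);
                }
            }
            for (Dir d : from) d.used -= delta;
            for (Dir d : to) d.used += delta;
        }

        srcParent.children.remove(src);
        src.parent = dstParent;
        dstParent.children.add(src);
    }
}
```

In this sketch a rename between two trees without quotas never walks the moved subtree, and a rename into a quota-bearing tree walks it exactly once, using the same count for the verification and for both updates.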