[ https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17800745#comment-17800745 ]
ASF GitHub Bot commented on HDFS-16000: --------------------------------------- lfxy commented on code in PR #2964: URL: https://github.com/apache/hadoop/pull/2964#discussion_r1436951488 ########## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirRenameOp.java: ########## @@ -470,17 +475,53 @@ static RenameResult unprotectedRenameTo(FSDirectory fsd, } } finally { if (undoRemoveSrc) { - tx.restoreSource(); + tx.restoreSource(srcStoragePolicyCounts); } if (undoRemoveDst) { // Rename failed - restore dst - tx.restoreDst(bsps); + tx.restoreDst(bsps, dstStoragePolicyCounts); } } NameNode.stateChangeLog.warn("DIR* FSDirectory.unprotectedRenameTo: " + "failed to rename " + src + " to " + dst); throw new IOException("rename from " + src + " to " + dst + " failed."); } + /* + * Calculate QuotaCounts based on parent directory and storage policy + * 1. If the storage policy of src and dst are different, + * calculate the QuotaCounts of src and dst respectively. + * 2. If all parent nodes of src and dst are not set with Quota, + * there is no need to calculate QuotaCount. + * 3. if parent nodes of src and dst have Quota configured, + * the QuotaCount is calculated once using the storage policy of src. + * */ + private static void computeQuotaCounts( + QuotaCounts srcStoragePolicyCounts, + QuotaCounts dstStoragePolicyCounts, + INodesInPath srcIIP, + INodesInPath dstIIP, + BlockStoragePolicySuite bsps, + RenameOperation tx) { + INode dstParent = dstIIP.getINode(-2); + INode srcParentNode = FSDirectory. + getFirstSetQuotaParentNode(srcIIP); + INode srcInode = srcIIP.getLastINode(); + INode dstParentNode = FSDirectory. + getFirstSetQuotaParentNode(dstIIP); + byte srcStoragePolicyID = FSDirectory.getStoragePolicyId(srcInode); + byte dstStoragePolicyID = FSDirectory.getStoragePolicyId(dstParent); + if (srcStoragePolicyID != dstStoragePolicyID) { + srcStoragePolicyCounts.add(srcIIP.getLastINode(). + computeQuotaUsage(bsps)); + dstStoragePolicyCounts.add(srcIIP.getLastINode() + .computeQuotaUsage(bsps, dstParent.getStoragePolicyID(), false, + Snapshot.CURRENT_STATE_ID)); + } else if (srcParentNode != dstParentNode || tx.withCount != null) { + srcStoragePolicyCounts.add(srcIIP.getLastINode().computeQuotaUsage(bsps)); + dstStoragePolicyCounts.add(srcStoragePolicyCounts); + } Review Comment: @Hexiaoqiao @zhuxiangyi Our production cluster encountered performance issues related to rename and needed to be optimized. In (srcStoragePolicyID == dstStoragePolicyID) condition, do we need to compute only when both conditions srcParentNode != dstParentNode and tx.isSrcInSnapshot == false are met? If it is right, can we use the below logics: if (srcStoragePolicyID != dstStoragePolicyID) { compute; } else if (srcParentNode != dstParentNode && !tx.isSrcInSnapshot) { compute; } Looking forward to your reply, thanks. > HDFS : Rename performance optimization > -------------------------------------- > > Key: HDFS-16000 > URL: https://issues.apache.org/jira/browse/HDFS-16000 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namenode > Affects Versions: 3.1.4, 3.3.1 > Reporter: Xiangyi Zhu > Assignee: Xiangyi Zhu > Priority: Major > Labels: pull-request-available > Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, > HDFS-16000.patch > > Time Spent: 50m > Remaining Estimate: 0h > > It takes a long time to move a large directory with rename. For example, it > takes about 40 seconds to move a 1000W directory. When a large amount of data > is deleted to the trash, the move large directory will occur when the recycle > bin makes checkpoint. In addition, the user may also actively trigger the > move large directory operation, which will cause the NameNode to lock too > long and be killed by Zkfc. Through the flame graph, it is found that the > main time consuming is to create the EnumCounters object. > > h3. Rename logic optimization: > * Regardless of whether the rename operation is the source directory and the > target directory, the quota count must be calculated three times. The first > time, check whether the moved directory exceeds the target directory quota, > the second time, calculate the mobile directory quota to update the source > directory quota, and the third time, calculate the mobile directory > configuration update to the target directory. > * I think some of the above three quota quota calculations are unnecessary. > For example, if all parent directories of the source directory and target > directory are not configured with quota, there is no need to calculate > quotaCount. Even if both the source directory and the target directory use > quota, there is no need to calculate the quota three times. The calculation > logic for the first and third times is the same, and it only needs to be > calculated once. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org