[jira] [Updated] (HDFS-17191) HDFS: Delete operation adds a thread to collect blocks asynchronously
[ https://issues.apache.org/jira/browse/HDFS-17191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangyi Zhu updated HDFS-17191:
-------------------------------
Description:
When we delete a large directory, collecting the blocks in the deleted subtree is time-consuming. Block collection is currently executed while holding the write lock, so deleting a large directory can block other RPCs for a long time. Asynchronous deletion of the collected blocks has already been implemented; see https://issues.apache.org/jira/browse/HDFS-16043.

In fact, collecting the blocks does not require the lock: once the subtree has been detached, it can no longer be reached by other RPCs, so we can collect the deleted subtree asynchronously and without locking. There may, however, be some problems:
1. When an ancestor of the subtree has a quota configured, the quota update is no longer synchronous and lags slightly behind the deletion.
2. Because the root directory always has the DirectoryWithQuotaFeature attribute, we must update the root directory's quotaUsage in any case. Since no quota limit can be configured on the root directory, I think the delayed quota update can be ignored for the root.

To address the first problem, we can check whether any ancestor directory of the subtree has a quota configured; if none does, use asynchronous collection. A configuration option can also let users decide whether to enable this quota check.

was: (the same description, without the reference to HDFS-16043)

> HDFS: Delete operation adds a thread to collect blocks asynchronously
> ---------------------------------------------------------------------
>
> Key: HDFS-17191
> URL: https://issues.apache.org/jira/browse/HDFS-17191
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs
> Affects Versions: 3.4.0
> Reporter: Xiangyi Zhu
> Assignee: Xiangyi Zhu
> Priority: Major

--
This message was sent by Atlassian Jira (v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
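To make the decision described above concrete, here is a minimal, hypothetical sketch of the ancestor quota check that chooses between synchronous and asynchronous block collection. The class and field names (`Dir`, `quotaSet`, `chooseCollection`) are illustrative stand-ins, not the actual HDFS internals; in real code the quota flag would correspond to a DirectoryWithQuotaFeature with configured limits.

```java
// Hypothetical sketch: collect blocks without the write lock only when no
// ancestor directory (other than the root, whose quota feature is always
// present but carries no limit) enforces a quota.
final class DeleteDispatcher {
    static final class Dir {
        final Dir parent;
        final boolean quotaSet; // stand-in for a configured DirectoryWithQuotaFeature
        Dir(Dir parent, boolean quotaSet) { this.parent = parent; this.quotaSet = quotaSet; }
    }

    /** True if any ancestor of the subtree root, excluding "/", has a quota configured. */
    static boolean ancestorHasQuota(Dir subtree) {
        for (Dir d = subtree.parent; d != null && d.parent != null; d = d.parent) {
            if (d.quotaSet) return true;
        }
        return false; // the root's DirectoryWithQuotaFeature is deliberately ignored
    }

    static String chooseCollection(Dir subtree, boolean quotaCheckEnabled) {
        if (quotaCheckEnabled && ancestorHasQuota(subtree)) {
            return "sync";  // quota must stay accurate: collect under the lock
        }
        return "async";     // safe to collect blocks without the write lock
    }
}
```

The `quotaCheckEnabled` flag models the proposed configuration switch that lets users disable the check entirely.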
[jira] [Created] (HDFS-17191) HDFS: Delete operation adds a thread to collect blocks asynchronously
Xiangyi Zhu created HDFS-17191:
-------------------------------
Summary: HDFS: Delete operation adds a thread to collect blocks asynchronously
Key: HDFS-17191
URL: https://issues.apache.org/jira/browse/HDFS-17191
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs
Affects Versions: 3.4.0
Reporter: Xiangyi Zhu
Assignee: Xiangyi Zhu

When we delete a large directory, collecting the blocks in the deleted subtree is time-consuming. Block collection is currently executed while holding the write lock, so deleting a large directory can block other RPCs for a long time. Asynchronous deletion of the collected blocks has already been implemented; we can refer to that work.

In fact, collecting the blocks does not require the lock: once the subtree has been detached, it can no longer be reached by other RPCs, so we can collect the deleted subtree asynchronously and without locking. There may, however, be some problems:
1. When an ancestor of the subtree has a quota configured, the quota update is no longer synchronous and lags slightly behind the deletion.
2. Because the root directory always has the DirectoryWithQuotaFeature attribute, we must update the root directory's quotaUsage in any case. Since no quota limit can be configured on the root directory, I think the delayed quota update can be ignored for the root.

To address the first problem, we can check whether any ancestor directory of the subtree has a quota configured; if none does, use asynchronous collection. A configuration option can also let users decide whether to enable this quota check.
[jira] [Updated] (HDFS-16000) HDFS: Rename performance optimization
[ https://issues.apache.org/jira/browse/HDFS-16000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangyi Zhu updated HDFS-16000:
-------------------------------
Description:
Moving a large directory with rename takes a long time; for example, moving a directory of 1000W (10 million) entries takes about 40 seconds. When a large amount of data is deleted to the trash, such a large-directory move happens when the trash checkpoint runs. A user may also trigger a large-directory move directly, which makes the NameNode hold the lock so long that it is killed by ZKFC. A flame graph shows that most of the time is spent creating EnumCounters objects.

h3. Rename logic optimization:
* A rename currently computes the quota counts three times, regardless of how the source and target directories are configured: the first time to check whether the moved directory exceeds the target directory's quota, the second time to compute the moved directory's counts in order to update the source directory's quota, and the third time to compute them again in order to update the target directory's quota.
* I think some of these three quota calculations are unnecessary. For example, if no ancestor of the source or target directory has a quota configured, there is no need to compute the quota counts at all. Even when both the source and the target use quotas, three calculations are not needed: the first and third compute the same thing and only need to be done once.

was: (the same introduction and "Rename logic optimization" section as above, plus the following section that was removed)

h3. I think the following two points can optimize the efficiency of rename execution

h3. QuotaCount calculation time-consuming optimization:
* Create one QuotaCounts object for the directory quota-count calculation and pass it to each subsequent calculation function as a parameter, so that a new EnumCounters object is not allocated for every calculation.
* In addition, the flame graph shows that modifying QuotaCounts through a lambda takes longer than a plain method call, so plain methods are used to update the QuotaCounts counters.

> HDFS: Rename performance optimization
> --------------------------------------
>
> Key: HDFS-16000
> URL: https://issues.apache.org/jira/browse/HDFS-16000
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs, namenode
> Affects Versions: 3.1.4, 3.3.1
> Reporter: Xiangyi Zhu
> Assignee: Xiangyi Zhu
> Priority: Major
> Labels: pull-request-available
> Attachments: 20210428-143238.svg, 20210428-171635-lambda.svg, HDFS-16000.patch
> Time Spent: 50m
> Remaining Estimate: 0h
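The allocation optimization in the earlier description can be illustrated with a small sketch: instead of allocating a fresh counters object for every directory visited, a single accumulator is passed down the recursion and updated with plain field writes (no lambda). The types below (`Node`, `Counts`) are stand-ins for the real QuotaCounts/EnumCounters machinery, not Hadoop classes.

```java
// Illustrative sketch of "pass the quota counts down as a parameter":
// one accumulator for the whole subtree walk, zero per-node allocations.
final class QuotaWalk {
    static final class Node {
        final long nsDelta;   // namespace contribution of this inode
        final long ssDelta;   // storagespace contribution of this inode
        final Node[] children;
        Node(long ns, long ss, Node... children) {
            this.nsDelta = ns; this.ssDelta = ss; this.children = children;
        }
    }

    static final class Counts { long namespace; long storagespace; }

    /** Accumulate into a caller-supplied Counts; no object created per directory. */
    static void computeQuota(Node node, Counts acc) {
        acc.namespace += node.nsDelta;     // plain field update, not a lambda
        acc.storagespace += node.ssDelta;
        for (Node child : node.children) {
            computeQuota(child, acc);      // same accumulator threaded down
        }
    }
}
```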
[jira] [Commented] (HDFS-16214) Asynchronously collect blocks and update quota when deleting
[ https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17506225#comment-17506225 ]

Xiangyi Zhu commented on HDFS-16214:
------------------------------------
[~hexiaoqiao] Thanks for your comment.
{quote}A. when clean inode being deleted in the set.{quote}
Adding inodes to and removing inodes from the set both happen inside the lock, so these operations are thread-safe.
{quote}B. it seems not only `create file` should be considered, other operations such as renameTo also need to check, right?{quote}
1. When /a/file is deleted, its inode is removed from the parent's children list inside the lock. If /a/file is renamed to /a/file1 while its blocks are being collected, the rename returns false, because the file is no longer among the children of the /a directory.
2. If /dir is renamed to /dir1 while /a/file is in the block-collection stage, the quota calculated for the rename does not include /a/file. When the block collection for /a/file finishes, the parent directory can be found via the IIP and its quota updated, so the quota information is eventually consistent.
{quote}C. what will happen if we delete parent inode during step 2?{quote}
The inodes whose blocks are to be collected are put into a queue, and a single dedicated thread collects the blocks, which guarantees the order of block collection and quota updates. Deleting the parent node during step 2 is therefore handled normally.
{quote}D. do you mean the delete logic will be the same at the Active and Standby side for an HA setup? If so, does checkpoint have to wait for every deletion to complete? In some corner cases, will that postpone the checkpoint, or will the checkpoint period no longer be under control?{quote}
In addition, to avoid deleted files still being able to allocate blocks across active/standby switchovers, the active node waits for all deletions to complete when it transitions to standby.

> Asynchronously collect blocks and update quota when deleting
> ------------------------------------------------------------
>
> Key: HDFS-16214
> URL: https://issues.apache.org/jira/browse/HDFS-16214
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: Xiangyi Zhu
> Assignee: Xiangyi Zhu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> The cost of deletion mainly comes from three pieces of logic: collecting blocks, deleting inodes from the InodeMap, and deleting blocks. The current deletion is divided into two major steps: step 1 acquires the lock, collects the blocks and inodes, deletes the inodes, and releases the lock; step 2 acquires the lock, deletes the blocks, and releases the lock.
> Step 2 already deletes blocks in batches, which bounds the lock hold time; the blocks could also be deleted asynchronously. Step 1, however, still holds the lock for a long time.
> For step 1, we can collect the blocks without holding the lock. The process is as follows. Step 1: acquire the lock, call parent.removeChild, write the editLog, release the lock. Step 2: collect the blocks, with no lock held. Step 3: acquire the lock, update the quota, release the lease, release the lock. Step 4: acquire the lock, delete the inodes from the InodeMap, release the lock. Step 5: acquire the lock, delete the blocks, release the lock.
> This process may cause some problems:
> 1. Suppose the file /a/b/c is being written and the directory /a/b is deleted. If the deletion has reached the block-collection stage when the client issues complete or addBlock for /a/b/c, that step is not locked and the delete of /a/b has already been written to the editLog, so the editLog order is delete /a/b followed by complete /a/b/c. When the standby node replays the editLog, /a/b/c has already been deleted, and replaying complete /a/b/c fails.
> *The process is as follows:*
> *write editLog order: delete /a/b/c -> delete /a/b -> complete /a/b/c*
> *replay editLog order: delete /a/b/c -> delete /a/b -> complete /a/b/c (not found)*
> 2. If a delete operation has reached the block-collection stage when the administrator runs saveNamespace and then restarts the NameNode, inodes that have already been removed from their parent's children list may remain in the InodeMap.
> To solve these problems, in step 1 we add the inodes being deleted to a set. When a write op is logged for a file (logAllocateBlockId/logCloseFile editLog), we check whether the file or one of its ancestor inodes is in the set, and throw a FileNotFoundException if it is.
> In addition, the execution of saveNamespace needs to wait for all inodes in the set to be removed before it proceeds.
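The set-based guard proposed in the issue (and discussed in answers A and B above) can be sketched as follows. This is an illustrative model only: the method names, the use of inode ids, and the leaf-first ancestor array are assumptions, not the real FSNamesystem API; in HDFS the set would be maintained under the namesystem lock, so the concurrent set here is just for self-containment.

```java
// Hypothetical sketch: inodes under asynchronous deletion are tracked in a
// set; write-path edit-log ops (allocate block / close file) fail with
// FileNotFoundException if the file or any ancestor is in the set, and
// saveNamespace waits until the set is empty.
import java.io.FileNotFoundException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class DeletingInodeGuard {
    private final Set<Long> deleting = ConcurrentHashMap.newKeySet();

    void markDeleting(long inodeId) { deleting.add(inodeId); }     // step 1, under lock
    void doneDeleting(long inodeId) { deleting.remove(inodeId); }  // after collection
    boolean isQuiescent()           { return deleting.isEmpty(); } // saveNamespace gate

    /** Called before logging allocateBlock/closeFile; ids are the file and its ancestors. */
    void checkNotDeleting(long[] fileAndAncestorIds) throws FileNotFoundException {
        for (long id : fileAndAncestorIds) {
            if (deleting.contains(id)) {
                throw new FileNotFoundException("inode " + id + " is being deleted");
            }
        }
    }
}
```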
[jira] [Updated] (HDFS-16214) Asynchronously collect blocks and update quota when deleting
[ https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangyi Zhu updated HDFS-16214:
-------------------------------
Summary: Asynchronously collect blocks and update quota when deleting (was: Lock optimization for large deletes, no locks on the collection block)

> Asynchronously collect blocks and update quota when deleting
> ------------------------------------------------------------
>
> Key: HDFS-16214
> URL: https://issues.apache.org/jira/browse/HDFS-16214
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: Xiangyi Zhu
> Assignee: Xiangyi Zhu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 40m
> Remaining Estimate: 0h
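The five-step delete flow proposed above (short locked phases around one unlocked block-collection phase) can be outlined roughly as below. The lock and the phase bodies are stand-ins; real code would also have to handle the editLog-ordering and saveNamespace issues the description raises.

```java
// Hypothetical outline of the proposed five-step delete: only steps 1, 3, 4
// and 5 hold the write lock; step 2 (the expensive block collection) does not.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class FivePhaseDelete {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    final List<String> phases = new ArrayList<>(); // records what ran, for illustration

    void delete(String path) {
        lock.writeLock().lock();               // step 1: detach subtree, log the edit
        try { phases.add("removeChild+editLog"); } finally { lock.writeLock().unlock(); }

        phases.add("collectBlocks (no lock)"); // step 2: walk the detached subtree

        lock.writeLock().lock();               // step 3: quota and lease cleanup
        try { phases.add("updateQuota+releaseLease"); } finally { lock.writeLock().unlock(); }

        lock.writeLock().lock();               // step 4: remove inodes from InodeMap
        try { phases.add("removeFromInodeMap"); } finally { lock.writeLock().unlock(); }

        lock.writeLock().lock();               // step 5: delete blocks (batched in HDFS)
        try { phases.add("deleteBlocks"); } finally { lock.writeLock().unlock(); }
    }
}
```

Breaking one long critical section into four short ones is the whole point: each locked phase is O(1) or batched, so no single lock hold grows with the size of the deleted subtree.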
[jira] [Commented] (HDFS-16214) Lock optimization for large deletes, no locks on the collection block
[ https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17478544#comment-17478544 ]

Xiangyi Zhu commented on HDFS-16214:
------------------------------------
[~John Smith] This issue aims to solve the long lock hold time while collecting blocks when deleting large directories. [HDFS-16043|https://issues.apache.org/jira/browse/HDFS-16043] implements asynchronous deletion of the blocks themselves. The two issues are not the same.

> Lock optimization for large deletes, no locks on the collection block
> ----------------------------------------------------------------------
>
> Key: HDFS-16214
> URL: https://issues.apache.org/jira/browse/HDFS-16214
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: Xiangyi Zhu
> Assignee: Xiangyi Zhu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangyi Zhu updated HDFS-16043:
-------------------------------
Description: Add markedDeleteBlockScrubberThread to delete blocks asynchronously.

was: Deleting a large directory caused the NameNode to hold the lock for too long, so our NameNode was killed by ZKFC. A flame graph shows that the main cost is the QuotaCount calculation when deleting inodes and the removeBlocks(toRemovedBlocks) call, with removeBlocks(toRemovedBlocks) taking the larger share of the time.

h3. Solution:
1. Process removeBlocks asynchronously: start a thread in the BlockManager to process the deleted blocks and bound the lock hold time.
2. Optimize the QuotaCount calculation; this is similar to the optimization in HDFS-16000.

h3. Comparison before and after optimization (deleting 1000W (10 million) inodes and 1000W blocks):
*before:* remove inode elapsed time: 7691 ms; remove block elapsed time: 11107 ms
*after:* remove inode elapsed time: 4149 ms; remove block elapsed time: 0 ms

> Add markedDeleteBlockScrubberThread to delete blocks asynchronously
> -------------------------------------------------------------------
>
> Key: HDFS-16043
> URL: https://issues.apache.org/jira/browse/HDFS-16043
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs, namenode
> Affects Versions: 3.4.0
> Reporter: Xiangyi Zhu
> Assignee: Xiangyi Zhu
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0, 3.3.2
> Attachments: 20210527-after.svg, 20210527-before.svg
> Time Spent: 12.5h
> Remaining Estimate: 0h
>
> Add markedDeleteBlockScrubberThread to delete blocks asynchronously.
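The scrubber-thread idea above can be sketched as a background consumer of a queue of collected block lists, deleting in bounded batches so that no single lock hold grows with directory size. The names here (`markedDeleteQueue`, the batch size, the `removed` list standing in for BlockManager removal) are assumptions for illustration, not the actual HDFS-16043 code.

```java
// Minimal sketch of a "marked delete" scrubber: delete RPCs only enqueue the
// collected block list; a dedicated thread removes blocks in bounded batches.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class MarkedDeleteScrubber implements Runnable {
    private final BlockingQueue<List<Long>> markedDeleteQueue = new LinkedBlockingQueue<>();
    private final int batchSize;
    private volatile boolean running = true;
    final List<Long> removed = new ArrayList<>(); // stand-in for BlockManager removal

    MarkedDeleteScrubber(int batchSize) { this.batchSize = batchSize; }

    void enqueue(List<Long> collectedBlocks) { markedDeleteQueue.add(collectedBlocks); }
    void shutdown() { running = false; }

    @Override public void run() {
        // Drain until shut down AND the queue is empty, so nothing is lost.
        while (running || !markedDeleteQueue.isEmpty()) {
            List<Long> blocks = markedDeleteQueue.poll();
            if (blocks == null) continue;
            for (int i = 0; i < blocks.size(); i += batchSize) {
                int end = Math.min(i + batchSize, blocks.size());
                // real code would take/release the namesystem write lock per batch
                removed.addAll(blocks.subList(i, end));
            }
        }
    }
}
```

In a real deployment the scrubber would run on its own thread (`new Thread(scrubber).start()`); the batching is what turns one long lock hold into many short ones.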
[jira] [Commented] (HDFS-16214) Lock optimization for large deletes, no locks on the collection block
[ https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17476944#comment-17476944 ]

Xiangyi Zhu commented on HDFS-16214:
------------------------------------
[~hexiaoqiao] [~weichiu] [~sodonnell] If the block collection is done without the lock, or processed asynchronously, the quota update will not be accurate in real time, but it will be eventually consistent. I think it is acceptable to sacrifice real-time quota accuracy for the performance improvement. I look forward to your reply.

> Lock optimization for large deletes, no locks on the collection block
> ----------------------------------------------------------------------
>
> Key: HDFS-16214
> URL: https://issues.apache.org/jira/browse/HDFS-16214
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: Xiangyi Zhu
> Assignee: Xiangyi Zhu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
[jira] [Updated] (HDFS-16214) Lock optimization for large deletes, no locks on the collection block
[ https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiangyi Zhu updated HDFS-16214:
-------------------------------
Description:
The cost of deletion mainly comes from three pieces of logic: collecting blocks, deleting inodes from the InodeMap, and deleting blocks. The current deletion is divided into two major steps: step 1 acquires the lock, collects the blocks and inodes, deletes the inodes, and releases the lock; step 2 acquires the lock, deletes the blocks, and releases the lock.

Step 2 already deletes blocks in batches, which bounds the lock hold time; the blocks could also be deleted asynchronously. Step 1, however, still holds the lock for a long time.

For step 1, we can collect the blocks without holding the lock. The process is as follows. Step 1: acquire the lock, call parent.removeChild, write the editLog, release the lock. Step 2: collect the blocks, with no lock held. Step 3: acquire the lock, update the quota, release the lease, release the lock. Step 4: acquire the lock, delete the inodes from the InodeMap, release the lock. Step 5: acquire the lock, delete the blocks, release the lock.

This process may cause some problems:

1. Suppose the file /a/b/c is being written and the directory /a/b is deleted. If the deletion has reached the block-collection stage when the client issues complete or addBlock for /a/b/c, that step is not locked and the delete of /a/b has already been written to the editLog, so the editLog order is delete /a/b followed by complete /a/b/c. When the standby node replays the editLog, /a/b/c has already been deleted, and replaying complete /a/b/c fails.

*The process is as follows:*
*write editLog order: delete /a/b/c -> delete /a/b -> complete /a/b/c*
*replay editLog order: delete /a/b/c -> delete /a/b -> complete /a/b/c (not found)*

2. If a delete operation has reached the block-collection stage when the administrator runs saveNamespace and then restarts the NameNode, inodes that have already been removed from their parent's children list may remain in the InodeMap.

To solve these problems, in step 1 we add the inodes being deleted to a set. When a write op is logged for a file (logAllocateBlockId/logCloseFile editLog), we check whether the file or one of its ancestor inodes is in the set, and throw a FileNotFoundException if it is.

In addition, the execution of saveNamespace needs to wait for all inodes in the set to be removed before it proceeds.

was: (the same description, with garbled bold/color markup around "(not found)" in the replay-order line)
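The five-step, segmented-lock flow proposed for HDFS-16214 above can be sketched as follows. This is a minimal illustration only: the `Subtree` type, the helper comments, and the lock structure are stand-ins for the NameNode internals, not the actual HDFS APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Sketch of the proposed delete flow: only steps 1 and 3-5 take the
// namesystem write lock; block collection (step 2) runs unlocked because
// the detached subtree is no longer reachable by other RPCs.
public class SegmentedDelete {
    // Stand-in for a detached directory subtree and the block ids under it.
    static class Subtree {
        final List<Long> blockIds = new ArrayList<>();
    }

    private final ReentrantReadWriteLock fsLock = new ReentrantReadWriteLock();
    final List<Long> removedBlocks = new ArrayList<>();

    public void delete(Subtree subtree) {
        // Step 1: under the write lock, detach the subtree from its parent
        // and persist the delete to the editLog.
        underWriteLock(() -> {
            // parent.removeChild(subtree); editLog.logDelete(path);
        });

        // Step 2: NO lock held. The subtree is unreachable, so its blocks
        // can be collected without blocking other RPCs.
        List<Long> collected = new ArrayList<>(subtree.blockIds);

        // Step 3: re-acquire the lock to update quota and release leases.
        underWriteLock(() -> { /* updateQuota(); releaseLeases(); */ });
        // Step 4: remove the subtree's inodes from the InodeMap.
        underWriteLock(() -> { /* inodeMap.remove(...); */ });
        // Step 5: delete the collected blocks (batched in practice).
        underWriteLock(() -> removedBlocks.addAll(collected));
    }

    private void underWriteLock(Runnable body) {
        fsLock.writeLock().lock();
        try {
            body.run();
        } finally {
            fsLock.writeLock().unlock();
        }
    }
}
```

The point of the segmentation is that the potentially huge traversal in step 2 no longer sits inside any critical section; each locked step is short and bounded.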
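The "Set of inodes being deleted" guard described in the HDFS-16214 proposal could look roughly like the sketch below. The class and method names are hypothetical; the real change would hook into the editLog write path and saveNamespace.

```java
import java.io.FileNotFoundException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative guard: while a subtree is in the unlocked block-collection
// phase, its inode ids live in this set. Write ops against any file whose
// path crosses the set are rejected, and saveNamespace waits for the set
// to drain before taking an image.
public class DeletionGuard {
    private final Set<Long> deleting = ConcurrentHashMap.newKeySet();

    void markDeleting(long inodeId) {
        deleting.add(inodeId);
    }

    void unmark(long inodeId) {
        deleting.remove(inodeId);
    }

    // Called before logging logAllocateBlockId/logCloseFile: reject the op
    // if the file or any of its ancestors is currently being deleted.
    void checkWriteAllowed(long[] inodeIdsOnPath) throws FileNotFoundException {
        for (long id : inodeIdsOnPath) {
            if (deleting.contains(id)) {
                throw new FileNotFoundException(
                    "inode " + id + " is being deleted");
            }
        }
    }

    // saveNamespace must not run while deletions are in flight.
    void awaitQuiescent() throws InterruptedException {
        while (!deleting.isEmpty()) {
            Thread.sleep(10);
        }
    }
}
```

Rejecting the write before the edit is logged is what restores a consistent editLog order for the standby: a `complete /a/b/c` can no longer be logged after a `delete /a/b`.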
[jira] [Updated] (HDFS-16421) Remove RouterRpcFairnessPolicyController ConcurrentNS to avoid renewLease being unavailable
[ https://issues.apache.org/jira/browse/HDFS-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu updated HDFS-16421: --- Summary: Remove RouterRpcFairnessPolicyController ConcurrentNS to avoid renewLease being unavailable (was: RouterRpcFairnessPolicyController remove ConcurrentNS ) > Remove RouterRpcFairnessPolicyController ConcurrentNS to avoid renewLease > being unavailable > --- > > Key: HDFS-16421 > URL: https://issues.apache.org/jira/browse/HDFS-16421 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > > When using the RouterRpcFairnessConstants strategy, if a NameNode's rpc is > slow or unresponsive, it is easy to exhaust the available concurrent > handlers, and the client will no longer be able to renewLease normally. > I think CONCURRENT_NS can be removed. For a CONCURRENT rpc, we can > traverse each NS and acquire a handler from that NS's own pool, instead of > acquiring just one handler from the CONCURRENT pool. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16421) RouterRpcFairnessPolicyController remove ConcurrentNS
[ https://issues.apache.org/jira/browse/HDFS-16421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu updated HDFS-16421: --- Summary: RouterRpcFairnessPolicyController remove ConcurrentNS (was: RouterRpcFairnessConstants remove ConcurrentNS ) > RouterRpcFairnessPolicyController remove ConcurrentNS > -- > > Key: HDFS-16421 > URL: https://issues.apache.org/jira/browse/HDFS-16421 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > > When using the RouterRpcFairnessConstants strategy, if a NameNode's rpc is > slow or unresponsive, it is easy to exhaust the available concurrent > handlers, and the client will no longer be able to renewLease normally. > I think CONCURRENT_NS can be removed. For a CONCURRENT rpc, we can > traverse each NS and acquire a handler from that NS's own pool, instead of > acquiring just one handler from the CONCURRENT pool. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16421) RouterRpcFairnessConstants remove ConcurrentNS
Xiangyi Zhu created HDFS-16421: -- Summary: RouterRpcFairnessConstants remove ConcurrentNS Key: HDFS-16421 URL: https://issues.apache.org/jira/browse/HDFS-16421 Project: Hadoop HDFS Issue Type: Improvement Components: rbf Affects Versions: 3.4.0 Reporter: Xiangyi Zhu Assignee: Xiangyi Zhu When using the RouterRpcFairnessConstants strategy, if a NameNode's rpc is slow or unresponsive, it is easy to exhaust the available concurrent handlers, and the client will no longer be able to renewLease normally. I think CONCURRENT_NS can be removed. For a CONCURRENT rpc, we can traverse each NS and acquire a handler from that NS's own pool, instead of acquiring just one handler from the CONCURRENT pool. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
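The per-NS acquisition idea in HDFS-16421 can be sketched with plain semaphores. This is an illustration of the scheme, not the Router's actual fairness controller API: a fan-out (CONCURRENT) call acquires one permit from each target nameservice's own pool, backing out cleanly if any pool is exhausted, so no shared CONCURRENT_NS pool is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Sketch: one handler-permit pool per nameservice. A concurrent (fan-out)
// rpc must hold a permit from every NS it touches; if any NS's pool is
// empty, already-acquired permits are released and the call is rejected.
public class PerNsPermits {
    private final Map<String, Semaphore> perNs = new ConcurrentHashMap<>();

    public PerNsPermits(Map<String, Integer> handlersPerNs) {
        handlersPerNs.forEach((ns, n) -> perNs.put(ns, new Semaphore(n)));
    }

    public boolean tryAcquireAll(List<String> nameservices) {
        List<Semaphore> held = new ArrayList<>();
        for (String ns : nameservices) {
            Semaphore s = perNs.get(ns);
            if (s == null || !s.tryAcquire()) {
                held.forEach(Semaphore::release); // back out on failure
                return false;
            }
            held.add(s);
        }
        return true;
    }

    public void releaseAll(List<String> nameservices) {
        nameservices.forEach(ns -> perNs.get(ns).release());
    }
}
```

With this shape, a slow NS only drains its own pool, so rpcs that touch other nameservices (such as renewLease fan-outs to healthy NSs) are not starved by a shared concurrent pool.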
[jira] [Resolved] (HDFS-16019) HDFS: Inode CheckPoint
[ https://issues.apache.org/jira/browse/HDFS-16019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu resolved HDFS-16019. Resolution: Later > HDFS: Inode CheckPoint > --- > > Key: HDFS-16019 > URL: https://issues.apache.org/jira/browse/HDFS-16019 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namanode >Affects Versions: 3.3.1 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > > *Background* > The OIV IMAGE analysis tool has brought us many benefits, such as file size > distribution, cold and hot data, and abnormal-growth directory analysis. But in > my opinion it is too slow, especially for a big IMAGE. > After Hadoop 2.3, the format of the IMAGE changed. The OIV tool must > load the entire IMAGE into memory to output the inode > information in text format. For a large IMAGE, this process takes a long > time, consumes significant resources, and requires a machine with a large amount of memory. > HDFS does provide the dfs.namenode.legacy-oiv-image.dir parameter to > produce the old IMAGE format through CheckPoint, and parsing the old IMAGE does > not require many resources, but we still need to parse the IMAGE again with > the hdfs oiv_legacy command to get the text form of the inodes, which > is relatively time-consuming. > *Solution* > We can have the standby node periodically checkpoint the inodes and serialize > them in text form. For output, different FileSystems can be used according > to the configuration, such as the local file system or the HDFS file system. > The advantage of the HDFS file system is that we can analyze the inodes > directly with Spark/Hive. I think the block information for each inode is > not of much use; the size of the file and the number of > replicas are more useful to us. > In addition, sequential output of the inodes is not necessary. We can > speed up the inode CheckPoint by partitioning the serialized inodes into > different output files: a producer thread puts inodes into a queue, and > multiple consumer threads drain the queue and write to > different partition files. The output files can also be compressed to > reduce disk IO. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
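The producer/consumer partitioning described in HDFS-16019 can be sketched as below. This is a simplified illustration under stated assumptions: records are plain "id,size" strings, partitions are in-memory lists chosen by inode id, and a real implementation would write per-partition (optionally compressed) files instead.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch: a producer enqueues serialized inode records; several consumer
// threads drain the queue and append each record to the partition selected
// by inode id. A poison pill per consumer signals end of input.
public class PartitionedInodeDump {
    private static final String POISON = "__END__";
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    final List<List<String>> partitions = new ArrayList<>();

    public PartitionedInodeDump(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(Collections.synchronizedList(new ArrayList<>()));
        }
    }

    // Producer side: called while walking the inode table.
    public void produce(long inodeId, long fileSize) {
        queue.add(inodeId + "," + fileSize);
    }

    // Consumer side: run after producing finishes in this simplified sketch.
    public void drain(int numConsumers) throws InterruptedException {
        for (int i = 0; i < numConsumers; i++) {
            queue.add(POISON); // one pill per consumer, after all records
        }
        List<Thread> consumers = new ArrayList<>();
        for (int i = 0; i < numConsumers; i++) {
            Thread t = new Thread(() -> {
                try {
                    for (String rec = queue.take(); !rec.equals(POISON); rec = queue.take()) {
                        long id = Long.parseLong(rec.split(",")[0]);
                        partitions.get((int) (id % partitions.size())).add(rec);
                    }
                } catch (InterruptedException ignored) {
                }
            });
            t.start();
            consumers.add(t);
        }
        for (Thread t : consumers) {
            t.join();
        }
    }
}
```

Because partition assignment depends only on the inode id, output order within a partition does not matter, which is exactly why the sequential dump can be parallelized.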
[jira] [Commented] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470561#comment-17470561 ] Xiangyi Zhu commented on HDFS-16043: [~mofei] I think even if block collection is made asynchronous, as long as it still holds the lock, the big-lock problem remains. The idea in HDFS-16214 is to collect blocks without holding the lock. I will submit the code for this next week. > Add markedDeleteBlockScrubberThread to delete blocks asynchronously > --- > > Key: HDFS-16043 > URL: https://issues.apache.org/jira/browse/HDFS-16043 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namanode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Attachments: 20210527-after.svg, 20210527-before.svg > > Time Spent: 6h 50m > Remaining Estimate: 0h > > Deleting a large directory caused the NN to hold the lock for too long, > which caused our NameNode to be killed by ZKFC. > The flame graph shows that the main time cost is the QuotaCount > calculation when removing blocks (toRemovedBlocks) and deleting > inodes, with removeBlocks(toRemovedBlocks) taking the larger share. > h3. Solution: > 1. RemoveBlocks is processed asynchronously. A thread is started in the > BlockManager to process the deleted blocks and bound the lock time. > 2. QuotaCount calculation optimization, similar to the optimization > in HDFS-16000. > h3. Comparison before and after optimization: > Test: delete 10 million inodes and 10 million blocks. > *before:* > remove inode elapsed time: 7691 ms > remove block elapsed time: 11107 ms > *after:* > remove inode elapsed time: 4149 ms > remove block elapsed time: 0 ms -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
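The asynchronous deletion that HDFS-16043 describes can be sketched as follows. This is an illustrative stand-in for the real `markedDeleteBlockScrubberThread`: `scrubOnce()` is one iteration of the loop such a thread would run, and the lock/BlockManager interaction is reduced to comments.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Sketch: the delete RPC only enqueues its collected block list (cheap,
// no long lock hold); a background scrubber removes blocks in bounded
// batches so each lock acquisition stays short.
public class MarkedDeleteScrubber {
    private static final int BATCH = 1000;
    private final BlockingQueue<List<Long>> markedDeleteQueue = new LinkedBlockingQueue<>();
    final AtomicLong removed = new AtomicLong();

    // Called by the delete path after collecting blocks.
    public void markBlocksAsDeleted(List<Long> blocks) {
        markedDeleteQueue.add(blocks);
    }

    // One iteration of the scrubber thread's loop.
    public void scrubOnce() {
        List<Long> blocks = markedDeleteQueue.poll();
        if (blocks == null) {
            return;
        }
        for (int from = 0; from < blocks.size(); from += BATCH) {
            int to = Math.min(from + BATCH, blocks.size());
            // In the real NameNode: acquire the write lock, remove
            // blocks[from..to) from the BlockManager, release the lock.
            removed.addAndGet(to - from);
        }
    }
}
```

This is why the "remove block elapsed time" in the benchmark drops to ~0 ms from the RPC's point of view: the caller only pays for the enqueue, and the actual removal happens off the RPC path.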
[jira] [Commented] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17470363#comment-17470363 ] Xiangyi Zhu commented on HDFS-16043: [~mofei] Thanks a lot for your feedback. > Add markedDeleteBlockScrubberThread to delete blocks asynchronously > --- > > Key: HDFS-16043 > URL: https://issues.apache.org/jira/browse/HDFS-16043 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namanode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Attachments: 20210527-after.svg, 20210527-before.svg > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Deleting a large directory caused the NN to hold the lock for too long, > which caused our NameNode to be killed by ZKFC. > The flame graph shows that the main time cost is the QuotaCount > calculation when removing blocks (toRemovedBlocks) and deleting > inodes, with removeBlocks(toRemovedBlocks) taking the larger share. > h3. Solution: > 1. RemoveBlocks is processed asynchronously. A thread is started in the > BlockManager to process the deleted blocks and bound the lock time. > 2. QuotaCount calculation optimization, similar to the optimization > in HDFS-16000. > h3. Comparison before and after optimization: > Test: delete 10 million inodes and 10 million blocks. > *before:* > remove inode elapsed time: 7691 ms > remove block elapsed time: 11107 ms > *after:* > remove inode elapsed time: 4149 ms > remove block elapsed time: 0 ms -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16412) Add metrics to support obtaining file size distribution
[ https://issues.apache.org/jira/browse/HDFS-16412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu updated HDFS-16412: --- Description: Use a RangeMap (fileSizeRange) to store a counter per file-size interval: each key of the RangeMap is an interval, and its value is the counter for that interval. *Counter update:* When a file's size changes or the file is deleted, the file size is obtained and the counter for the corresponding interval is updated. *Interval division:* By default the following intervals are initialized at startup; they can also be initialized through the configuration file. 0MB 0-16MB 16-32MB 32-64MB 64-128MB 128-256MB 256-512MB >512MB was: Use a RangeMap (fileSizeRange) to store a counter per file-size interval: each key of the RangeMap is an interval, and its value is the counter for that interval. *Counter update:* When a file's size changes or the file is deleted, the file size is obtained and the counter for the corresponding interval is updated. *Interval division:* By default the following intervals are initialized at startup; they can also be initialized through the configuration file. 0MB 0-16MB 16-32MB 32-64MB 64-128MB 128-256MB 256-512MB >512MB > Add metrics to support obtaining file size distribution > --- > > Key: HDFS-16412 > URL: https://issues.apache.org/jira/browse/HDFS-16412 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Minor > > Use a RangeMap (fileSizeRange) to store a counter per file-size interval: each key > is an interval, and its value is the counter for that interval. > *Counter update:* > When a file's size changes or the file is deleted, the file size is obtained, > and the counter for the corresponding interval is updated. > *Interval division:* > By default the following intervals are initialized at startup; they can also > be initialized through the configuration file. > 0MB > 0-16MB > 16-32MB > 32-64MB > 64-128MB > 128-256MB > 256-512MB > >512MB -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
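The interval counters described in HDFS-16412 can be sketched with a sorted map instead of Guava's RangeMap; all names here are illustrative. Each bucket is keyed by its lower bound, and floorEntry() finds the bucket for a given size in O(log n).

```java
import java.util.TreeMap;
import java.util.concurrent.atomic.LongAdder;

// Sketch of the file-size distribution metric: buckets keyed by lower
// bound, matching the default intervals 0MB, 0-16MB, ..., >512MB from the
// issue. Buckets here are half-open except the exact-0 and >512MB ones.
public class FileSizeDistribution {
    private static final long MB = 1L << 20;
    private final TreeMap<Long, LongAdder> buckets = new TreeMap<>();

    public FileSizeDistribution() {
        // lower bounds for: 0MB, 0-16MB, 16-32MB, ..., 256-512MB, >512MB
        for (long lo : new long[] {0, 1, 16 * MB, 32 * MB, 64 * MB,
                                   128 * MB, 256 * MB, 512 * MB + 1}) {
            buckets.put(lo, new LongAdder());
        }
    }

    // Called when a file's size changes; pass oldSize < 0 for a new file.
    public void onSizeChanged(long oldSize, long newSize) {
        if (oldSize >= 0) {
            buckets.floorEntry(oldSize).getValue().decrement();
        }
        buckets.floorEntry(newSize).getValue().increment();
    }

    // Called when a file is deleted.
    public void onDeleted(long size) {
        buckets.floorEntry(size).getValue().decrement();
    }

    // Current count of the bucket that `size` falls into.
    public long countFor(long size) {
        return buckets.floorEntry(size).getValue().sum();
    }
}
```

LongAdder keeps the update path cheap under concurrent RPC handlers, which matters since every size change and delete touches a counter.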
[jira] [Created] (HDFS-16412) Add metrics to support obtaining file size distribution
Xiangyi Zhu created HDFS-16412: -- Summary: Add metrics to support obtaining file size distribution Key: HDFS-16412 URL: https://issues.apache.org/jira/browse/HDFS-16412 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.4.0 Reporter: Xiangyi Zhu Assignee: Xiangyi Zhu Use RangeMapRange "Map fileSizeRange" to store counters at different intervals. RangeMap key is a specific interval, and value is the counter corresponding to the interval. ** *Counter update:* When the file size changes or the file is deleted, the file size is obtained, and the counter in the corresponding interval is called to update the counter. ** *Interval division:* The default is to initialize the startup according to the following interval, or it can be initialized through the configuration file. 0MB 0-16MB 16-32MB 32-64MB 64-128MB 128-256MB 256-512MB >512MB -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-16276) RBF: Remove the useless configuration of rpc isolation in md
[ https://issues.apache.org/jira/browse/HDFS-16276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu reassigned HDFS-16276: -- Assignee: Xiangyi Zhu > RBF: Remove the useless configuration of rpc isolation in md > - > > Key: HDFS-16276 > URL: https://issues.apache.org/jira/browse/HDFS-16276 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > > The *dfs.federation.router.fairness.enable* configuration is not used in the > code, but it still appears in the md documentation, so we should delete it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16276) RBF: Remove the useless configuration of rpc isolation in md
Xiangyi Zhu created HDFS-16276: -- Summary: RBF: Remove the useless configuration of rpc isolation in md Key: HDFS-16276 URL: https://issues.apache.org/jira/browse/HDFS-16276 Project: Hadoop HDFS Issue Type: Improvement Components: rbf Affects Versions: 3.4.0 Reporter: Xiangyi Zhu The *dfs.federation.router.fairness.enable* configuration is not used in the code, but it still appears in the md documentation, so we should delete it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16273) RBF: RouterRpcFairnessPolicyController add availableHandleOnPerNs metrics
Xiangyi Zhu created HDFS-16273: -- Summary: RBF: RouterRpcFairnessPolicyController add availableHandleOnPerNs metrics Key: HDFS-16273 URL: https://issues.apache.org/jira/browse/HDFS-16273 Project: Hadoop HDFS Issue Type: Improvement Components: rbf Affects Versions: 3.4.0 Reporter: Xiangyi Zhu Add the availableHandlerOnPerNs metrics to monitor whether the number of handlers configured for each NS is reasonable when using RouterRpcFairnessPolicyController. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Assigned] (HDFS-16273) RBF: RouterRpcFairnessPolicyController add availableHandleOnPerNs metrics
[ https://issues.apache.org/jira/browse/HDFS-16273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu reassigned HDFS-16273: -- Assignee: Xiangyi Zhu > RBF: RouterRpcFairnessPolicyController add availableHandleOnPerNs metrics > - > > Key: HDFS-16273 > URL: https://issues.apache.org/jira/browse/HDFS-16273 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > > Add the availableHandlerOnPerNs metrics to monitor whether the number of > handlers configured for each NS is reasonable when using > RouterRpcFairnessPolicyController. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16043) Add markedDeleteBlockScrubberThread to delete blocks asynchronously
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu updated HDFS-16043: --- Summary: Add markedDeleteBlockScrubberThread to delete blocks asynchronously (was: HDFS : Add markedDeleteBlockScrubberThread to delete blcoks asynchronously) > Add markedDeleteBlockScrubberThread to delete blocks asynchronously > --- > > Key: HDFS-16043 > URL: https://issues.apache.org/jira/browse/HDFS-16043 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namanode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Attachments: 20210527-after.svg, 20210527-before.svg > > Time Spent: 5h 10m > Remaining Estimate: 0h > > Deleting a large directory caused the NN to hold the lock for too long, > which caused our NameNode to be killed by ZKFC. > The flame graph shows that the main time cost is the QuotaCount > calculation when removing blocks (toRemovedBlocks) and deleting > inodes, with removeBlocks(toRemovedBlocks) taking the larger share. > h3. Solution: > 1. RemoveBlocks is processed asynchronously. A thread is started in the > BlockManager to process the deleted blocks and bound the lock time. > 2. QuotaCount calculation optimization, similar to the optimization > in HDFS-16000. > h3. Comparison before and after optimization: > Test: delete 10 million inodes and 10 million blocks. > *before:* > remove inode elapsed time: 7691 ms > remove block elapsed time: 11107 ms > *after:* > remove inode elapsed time: 4149 ms > remove block elapsed time: 0 ms -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16043) HDFS : Add markedDeleteBlockScrubberThread to delete blcoks asynchronously
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu updated HDFS-16043: --- Summary: HDFS : Add markedDeleteBlockScrubberThread to delete blcoks asynchronously (was: HDFS : Delete performance optimization) > HDFS : Add markedDeleteBlockScrubberThread to delete blcoks asynchronously > -- > > Key: HDFS-16043 > URL: https://issues.apache.org/jira/browse/HDFS-16043 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namanode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Attachments: 20210527-after.svg, 20210527-before.svg > > Time Spent: 5h > Remaining Estimate: 0h > > Deleting a large directory caused the NN to hold the lock for too long, > which caused our NameNode to be killed by ZKFC. > The flame graph shows that the main time cost is the QuotaCount > calculation when removing blocks (toRemovedBlocks) and deleting > inodes, with removeBlocks(toRemovedBlocks) taking the larger share. > h3. Solution: > 1. RemoveBlocks is processed asynchronously. A thread is started in the > BlockManager to process the deleted blocks and bound the lock time. > 2. QuotaCount calculation optimization, similar to the optimization > in HDFS-16000. > h3. Comparison before and after optimization: > Test: delete 10 million inodes and 10 million blocks. > *before:* > remove inode elapsed time: 7691 ms > remove block elapsed time: 11107 ms > *after:* > remove inode elapsed time: 4149 ms > remove block elapsed time: 0 ms -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16214) Lock optimization for large deleteing, no locks on the collection block
Xiangyi Zhu created HDFS-16214: -- Summary: Lock optimization for large deleteing, no locks on the collection block Key: HDFS-16214 URL: https://issues.apache.org/jira/browse/HDFS-16214 Project: Hadoop HDFS Issue Type: Improvement Components: namenode Affects Versions: 3.4.0 Reporter: Xiangyi Zhu The time-consuming deletion is mainly reflected in three logics , collecting blocks, deleting Inode from InodeMap, and deleting blocks. The current deletion is divided into two major steps. Step 1 acquires the lock, collects the block and inode, deletes the inode, and releases the lock. Step 2 Acquire the lock and delete the block to release the lock. Phase 2 is currently deleting blocks in batches, which can control the lock holding time. Here we can also delete blocks asynchronously. Now step 1 still has the problem of holding the lock for a long time. For stage 1, we can make the collection block not hold the lock. The process is as follows, step 1 obtains the lock, parent.removeChild, writes to editLog, releases the lock. Step 2 no lock, collects the block. Step 3 acquire lock, update quota, release lease, release lock. Step 4 acquire lock, delete Inode from InodeMap, release lock. Step 5 acquire lock, delete block to release lock. There may be some problems following the above process: 1. When the /a/b/c file is open, then delete the /a/b directory. If the deletion is performed to the collecting block stage, the client writes complete or addBlock to the /a/b/c file at this time. This step is not locked and delete /a/b and editLog has been written successfully. In this case, the order of editLog is delete /a/c and complete /a/b/c. In this case, the standby node playback editLog /a/b/c file has been deleted, and then go to complete /a/b/c file will be abnormal. 2. If a delete operation is executed to the stage of collecting block, then the administrator executes saveNameSpace, and then restarts Namenode. 
This can leave inodes that were already removed from the parent's childList still present in the InodeMap. To solve these problems, step 1 adds each inode being deleted to a Set. When a file write op is logged (logAllocateBlockId/logCloseFile editLog), check whether the file or one of its parent inodes is in the Set, and if so throw a FileNotFoundException. In addition, saveNamespace must wait until all inodes have been removed from the Set before it executes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
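The guard this proposal describes — a set of inodes under deletion that write ops consult before logging, and that saveNamespace drains before running — could look roughly like the following. This is an illustrative sketch only; the class and method names are hypothetical, not the actual NameNode code:

```java
import java.io.FileNotFoundException;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch of the "deleting inode" guard: subtree roots detached
 * in step 1 are registered here while their blocks are collected without the
 * namesystem lock. Write ops check the set; saveNamespace waits for it to drain.
 */
public class DeletingInodeGuard {
    private final Set<Long> deleting = ConcurrentHashMap.newKeySet();

    /** Step 1: after parent.removeChild and the editLog write, mark the subtree root. */
    public void markDeleting(long inodeId) {
        deleting.add(inodeId);
    }

    /** After step 4, once the inodes are gone from the InodeMap. */
    public void doneDeleting(long inodeId) {
        deleting.remove(inodeId);
    }

    /**
     * Called before logging a file write op (logAllocateBlockId/logCloseFile):
     * if the file or any of its ancestors is being deleted, fail the op so the
     * editLog never records a write after the delete.
     */
    public void checkNotDeleting(List<Long> pathInodeIds) throws FileNotFoundException {
        for (long id : pathInodeIds) {
            if (deleting.contains(id)) {
                throw new FileNotFoundException("inode " + id + " is being deleted");
            }
        }
    }

    /** saveNamespace must wait until every pending delete has finished. */
    public void awaitEmpty() throws InterruptedException {
        while (!deleting.isEmpty()) {
            Thread.sleep(10); // simple polling keeps the sketch short
        }
    }
}
```

In the real NameNode, the check would run while holding the lock that serializes edit-log writes, so a write op can never slip in between the check and the log record.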
[jira] [Assigned] (HDFS-16214) Lock optimization for large deleting, no locks on the collection block
[ https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu reassigned HDFS-16214: -- Assignee: Xiangyi Zhu > Lock optimization for large deleteing, no locks on the collection block > --- > > Key: HDFS-16214 > URL: https://issues.apache.org/jira/browse/HDFS-16214 > Project: Hadoop HDFS > Issue Type: Improvement > Components: namenode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > > The time-consuming deletion is mainly reflected in three logics , collecting > blocks, deleting Inode from InodeMap, and deleting blocks. The current > deletion is divided into two major steps. Step 1 acquires the lock, collects > the block and inode, deletes the inode, and releases the lock. Step 2 Acquire > the lock and delete the block to release the lock. > Phase 2 is currently deleting blocks in batches, which can control the lock > holding time. Here we can also delete blocks asynchronously. > Now step 1 still has the problem of holding the lock for a long time. > For stage 1, we can make the collection block not hold the lock. The process > is as follows, step 1 obtains the lock, parent.removeChild, writes to > editLog, releases the lock. Step 2 no lock, collects the block. Step 3 > acquire lock, update quota, release lease, release lock. Step 4 acquire lock, > delete Inode from InodeMap, release lock. Step 5 acquire lock, delete block > to release lock. > There may be some problems following the above process: > 1. When the /a/b/c file is open, then delete the /a/b directory. If the > deletion is performed to the collecting block stage, the client writes > complete or addBlock to the /a/b/c file at this time. This step is not locked > and delete /a/b and editLog has been written successfully. In this case, the > order of editLog is delete /a/c and complete /a/b/c. 
In this case, the > standby node playback editLog /a/b/c file has been deleted, and then go to > complete /a/b/c file will be abnormal. > 2. If a delete operation is executed to the stage of collecting block, then > the administrator executes saveNameSpace, and then restarts Namenode. This > situation may cause the Inode that has been deleted from the parent childList > to remain in the InodeMap. > To solve the above problem, in step 1, add the inode being deleted to the > Set. When there is a file WriteFileOp (logAllocateBlockId/logCloseFile > EditLog), check whether there is this file and one of its parent Inodes in > the Set, and throw it if there is. An exception FileNotFoundException > occurred. > In addition, the execution of saveNamespace needs to wait for all iNodes in > Set to be removed before execution.
[jira] [Commented] (HDFS-16095) Add lsQuotaList command and getQuotaListing api for hdfs quota
[ https://issues.apache.org/jira/browse/HDFS-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17375286#comment-17375286 ] Xiangyi Zhu commented on HDFS-16095: [~weichiu],[~ayushtkn],[~hexiaoqiao],[~kihwal] Looking forward to your comments. > Add lsQuotaList command and getQuotaListing api for hdfs quota > -- > > Key: HDFS-16095 > URL: https://issues.apache.org/jira/browse/HDFS-16095 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Currently hdfs does not support obtaining all quota information. The > administrator may need to check which quotas have been added to a certain > directory, or the quotas of the entire cluster.
[jira] [Created] (HDFS-16096) Delete useless method DirectoryWithQuotaFeature#setQuota
Xiangyi Zhu created HDFS-16096: -- Summary: Delete useless method DirectoryWithQuotaFeature#setQuota Key: HDFS-16096 URL: https://issues.apache.org/jira/browse/HDFS-16096 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Reporter: Xiangyi Zhu Fix For: 3.4.0 Delete useless method DirectoryWithQuotaFeature#setQuota.
[jira] [Created] (HDFS-16095) Add lsQuotaList command and getQuotaListing api for hdfs quota
Xiangyi Zhu created HDFS-16095: -- Summary: Add lsQuotaList command and getQuotaListing api for hdfs quota Key: HDFS-16095 URL: https://issues.apache.org/jira/browse/HDFS-16095 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs Affects Versions: 3.4.0 Reporter: Xiangyi Zhu Currently, HDFS does not support listing all quota information. An administrator may need to check which quotas have been set on a certain directory, or the quotas across the entire cluster.
[jira] [Commented] (HDFS-16043) HDFS : Delete performance optimization
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17356302#comment-17356302 ] Xiangyi Zhu commented on HDFS-16043: [~hexiaoqiao] Thanks for your comment. Correct: the modification here does not affect block cleanup on the SBN, which works normally in the HA scenario. I will address those check errors and add unit tests. The second optimization overlaps with HDFS-16000; I want to open an issue to optimize the time-consuming QuotaCount calculation. > HDFS : Delete performance optimization > -- > > Key: HDFS-16043 > URL: https://issues.apache.org/jira/browse/HDFS-16043 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namanode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Attachments: 20210527-after.svg, 20210527-before.svg > > Time Spent: 20m > Remaining Estimate: 0h > > The deletion of the large directory caused NN to hold the lock for too long, > which caused our NameNode to be killed by ZKFC. > Through the flame graph, it is found that its main time-consuming > calculation is QuotaCount when removingBlocks(toRemovedBlocks) and deleting > inodes, and removeBlocks(toRemovedBlocks) takes a higher proportion of time. > h3. solution: > 1. RemoveBlocks is processed asynchronously. A thread is started in the > BlockManager to process the deleted blocks and control the lock time. > 2. QuotaCount calculation optimization, this is similar to the optimization > of this Issue HDFS-16000. > h3. Comparison before and after optimization: > Delete 1000w Inode and 1000w block test.
> *before:* > remove inode elapsed time: 7691 ms > remove block elapsed time :11107 ms > *after:* > remove inode elapsed time: 4149 ms > remove block elapsed time :0 ms
[jira] [Commented] (HDFS-16043) HDFS : Delete performance optimization
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17354754#comment-17354754 ] Xiangyi Zhu commented on HDFS-16043: [~hexiaoqiao],[~vjasani] Looking forward to your comments. > HDFS : Delete performance optimization > -- > > Key: HDFS-16043 > URL: https://issues.apache.org/jira/browse/HDFS-16043 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namanode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Attachments: 20210527-after.svg, 20210527-before.svg > > Time Spent: 20m > Remaining Estimate: 0h > > The deletion of the large directory caused NN to hold the lock for too long, > which caused our NameNode to be killed by ZKFC. > Through the flame graph, it is found that its main time-consuming > calculation is QuotaCount when removingBlocks(toRemovedBlocks) and deleting > inodes, and removeBlocks(toRemovedBlocks) takes a higher proportion of time. > h3. solution: > 1. RemoveBlocks is processed asynchronously. A thread is started in the > BlockManager to process the deleted blocks and control the lock time. > 2. QuotaCount calculation optimization, this is similar to the optimization > of this Issue HDFS-16000. > h3. Comparison before and after optimization: > Delete 1000w Inode and 1000w block test. > *before:* > remove inode elapsed time: 7691 ms > remove block elapsed time :11107 ms > *after:* > remove inode elapsed time: 4149 ms > remove block elapsed time :0 ms
[jira] [Commented] (HDFS-16045) FileSystem.CACHE memory leak
[ https://issues.apache.org/jira/browse/HDFS-16045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17352937#comment-17352937 ] Xiangyi Zhu commented on HDFS-16045: [~hexiaoqiao] Thank you very much for your comment. Presto creates UGIs with the "UserGroupInformation#createProxyUser" method, which requires the real superuser information (this should be related to Kerberos), while the "FileSystem#get" API creates UGIs with "UserGroupInformation#createRemoteUser". The latter does not require real user information; the UGI it creates contains only username-related information, so UGIs created for the same user are effectively interchangeable. I think they can share the same FileSystem instance. This consideration may not be comprehensive; discussion is welcome.
{code:java}
public static UserGroupInformation createRemoteUser(String user, AuthMethod authMethod) {
  if (user == null || user.isEmpty()) {
    throw new IllegalArgumentException("Null user");
  }
  Subject subject = new Subject();
  subject.getPrincipals().add(new User(user));
  UserGroupInformation result = new UserGroupInformation(subject);
  result.setAuthenticationMethod(authMethod);
  return result;
}{code}
> FileSystem.CACHE memory leak > > > Key: HDFS-16045 > URL: https://issues.apache.org/jira/browse/HDFS-16045 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Priority: Major > > {code:java} > FileSystem get(final URI uri, final Configuration conf, > final String user){code} > When the client turns on the cache and uses the above API to specify the user > to create a Filesystem instance, the cache will be invalid. > The specified user creates a new UGI every time he creates a Filesystem > instance, and cache compares it according to UGI.
> {code:java} > public int hashCode() { > return (scheme + authority).hashCode() + ugi.hashCode() + (int)unique; > }{code} > Whether you can use username to replace UGI to make a comparison, and whether > there are other risks. >
[jira] [Created] (HDFS-16045) FileSystem.CACHE memory leak
Xiangyi Zhu created HDFS-16045: -- Summary: FileSystem.CACHE memory leak Key: HDFS-16045 URL: https://issues.apache.org/jira/browse/HDFS-16045 Project: Hadoop HDFS Issue Type: Bug Components: hdfs Affects Versions: 3.4.0 Reporter: Xiangyi Zhu
{code:java}
FileSystem get(final URI uri, final Configuration conf, final String user){code}
When the client has the cache enabled and uses the above API to create a FileSystem instance for a specified user, the cache is ineffective: every call creates a new UGI for that user, and the cache compares entries by UGI.
{code:java}
public int hashCode() {
  return (scheme + authority).hashCode() + ugi.hashCode() + (int)unique;
}{code}
Could the username be used for the comparison instead of the UGI, and would that introduce other risks?
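To illustrate why the cache misses, here is a minimal, self-contained model of the behavior described above. The classes are stand-ins, not the actual org.apache.hadoop.fs.FileSystem code: two lookups for the same user build distinct UGI objects, so the keys never compare equal and entries accumulate.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Minimal model of the reported FileSystem.CACHE behavior. Each call to the
 * user-string API builds a fresh UGI, and the cache key compares by UGI
 * equality (identity by default), so "the same user" never hits the cache.
 */
public class CacheLeakDemo {
    /** Stand-in for UserGroupInformation: no equals/hashCode override, so
     *  two UGIs for the same username are never equal. */
    static final class Ugi {
        final String user;
        Ugi(String user) { this.user = user; }
    }

    /** Stand-in for FileSystem.Cache.Key (scheme+authority plus ugi). */
    static final class Key {
        final String schemeAuthority;
        final Ugi ugi;
        Key(String schemeAuthority, Ugi ugi) {
            this.schemeAuthority = schemeAuthority;
            this.ugi = ugi;
        }
        @Override public int hashCode() {
            return schemeAuthority.hashCode() + ugi.hashCode();
        }
        @Override public boolean equals(Object o) {
            return o instanceof Key
                && ((Key) o).schemeAuthority.equals(schemeAuthority)
                && ((Key) o).ugi.equals(ugi);
        }
    }

    static final Map<Key, Object> cache = new HashMap<>();

    /** Models FileSystem.get(uri, conf, user): a brand-new UGI on every call. */
    static Object get(String schemeAuthority, String user) {
        return cache.computeIfAbsent(new Key(schemeAuthority, new Ugi(user)),
                                     k -> new Object());
    }
}
```

Keying on the username instead of the UGI would make the two lookups collide, which is exactly the trade-off the issue asks about.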
[jira] [Updated] (HDFS-16043) HDFS : Delete performance optimization
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu updated HDFS-16043: --- Description: The deletion of the large directory caused NN to hold the lock for too long, which caused our NameNode to be killed by ZKFC. Through the flame graph, it is found that its main time-consuming calculation is QuotaCount when removingBlocks(toRemovedBlocks) and deleting inodes, and removeBlocks(toRemovedBlocks) takes a higher proportion of time. h3. solution: 1. RemoveBlocks is processed asynchronously. A thread is started in the BlockManager to process the deleted blocks and control the lock time. 2. QuotaCount calculation optimization, this is similar to the optimization of this Issue HDFS-16000. h3. Comparison before and after optimization: Delete 1000w Inode and 1000w block test. *before:* remove inode elapsed time: 7691 ms remove block elapsed time :11107 ms *after:* remove inode elapsed time: 4149 ms remove block elapsed time :0 ms was: The deletion of the large directory caused NN to hold the lock for too long, which caused our NameNode to be killed by ZKFC. Through the flame graph, it is found that its main time-consuming calculation is QuotaCount when removingBlocks(toRemovedBlocks) and deleting inodes, and removeBlocks(toRemovedBlocks) takes a higher proportion of time. h3. solution: 1. RemoveBlocks is processed asynchronously. A thread is started in the BlockManager to process the deleted blocks and control the lock time. 2. QuotaCount calculation optimization, this is similar to the optimization of this Issue HDFS-16000. h3. Comparison before and after optimization: Delete 1000w Inode and 1000w block test. 
*before:* Before optimization: remove inode elapsed time: 7691 ms remove block elapsed time :11107 ms *after:* remove inode elapsed time: 4149 ms remove block elapsed time :0 ms > HDFS : Delete performance optimization > -- > > Key: HDFS-16043 > URL: https://issues.apache.org/jira/browse/HDFS-16043 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namanode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Attachments: 20210527-after.svg, 20210527-before.svg > > > The deletion of the large directory caused NN to hold the lock for too long, > which caused our NameNode to be killed by ZKFC. > Through the flame graph, it is found that its main time-consuming > calculation is QuotaCount when removingBlocks(toRemovedBlocks) and deleting > inodes, and removeBlocks(toRemovedBlocks) takes a higher proportion of time. > h3. solution: > 1. RemoveBlocks is processed asynchronously. A thread is started in the > BlockManager to process the deleted blocks and control the lock time. > 2. QuotaCount calculation optimization, this is similar to the optimization > of this Issue HDFS-16000. > h3. Comparison before and after optimization: > Delete 1000w Inode and 1000w block test. > *before:* > remove inode elapsed time: 7691 ms > remove block elapsed time :11107 ms > *after:* > remove inode elapsed time: 4149 ms > remove block elapsed time :0 ms
[jira] [Updated] (HDFS-16043) HDFS : Delete performance optimization
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu updated HDFS-16043: --- Attachment: 20210527-before.svg 20210527-after.svg > HDFS : Delete performance optimization > -- > > Key: HDFS-16043 > URL: https://issues.apache.org/jira/browse/HDFS-16043 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namanode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Priority: Major > Attachments: 20210527-after.svg, 20210527-before.svg > > > The deletion of the large directory caused NN to hold the lock for too long, > which caused our NameNode to be killed by ZKFC. > Through the flame graph, it is found that its main time-consuming > calculation is QuotaCount when removingBlocks(toRemovedBlocks) and deleting > inodes, and removeBlocks(toRemovedBlocks) takes a higher proportion of time. > h3. solution: > 1. RemoveBlocks is processed asynchronously. A thread is started in the > BlockManager to process the deleted blocks and control the lock time. > 2. QuotaCount calculation optimization, this is similar to the optimization > of this Issue HDFS-16000. > h3. Comparison before and after optimization: > Delete 1000w Inode and 1000w block test. > *before:* > Before optimization: remove inode elapsed time: 7691 ms > remove block elapsed time :11107 ms > *after:* > remove inode elapsed time: 4149 ms > remove block elapsed time :0 ms
[jira] [Assigned] (HDFS-16043) HDFS : Delete performance optimization
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu reassigned HDFS-16043: -- Assignee: Xiangyi Zhu > HDFS : Delete performance optimization > -- > > Key: HDFS-16043 > URL: https://issues.apache.org/jira/browse/HDFS-16043 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namanode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > Attachments: 20210527-after.svg, 20210527-before.svg > > > The deletion of the large directory caused NN to hold the lock for too long, > which caused our NameNode to be killed by ZKFC. > Through the flame graph, it is found that its main time-consuming > calculation is QuotaCount when removingBlocks(toRemovedBlocks) and deleting > inodes, and removeBlocks(toRemovedBlocks) takes a higher proportion of time. > h3. solution: > 1. RemoveBlocks is processed asynchronously. A thread is started in the > BlockManager to process the deleted blocks and control the lock time. > 2. QuotaCount calculation optimization, this is similar to the optimization > of this Issue HDFS-16000. > h3. Comparison before and after optimization: > Delete 1000w Inode and 1000w block test. > *before:* > Before optimization: remove inode elapsed time: 7691 ms > remove block elapsed time :11107 ms > *after:* > remove inode elapsed time: 4149 ms > remove block elapsed time :0 ms
[jira] [Created] (HDFS-16043) HDFS : Delete performance optimization
Xiangyi Zhu created HDFS-16043: -- Summary: HDFS : Delete performance optimization Key: HDFS-16043 URL: https://issues.apache.org/jira/browse/HDFS-16043 Project: Hadoop HDFS Issue Type: Improvement Components: hdfs, namanode Affects Versions: 3.4.0 Reporter: Xiangyi Zhu Deleting a large directory caused the NN to hold the lock for too long, which caused our NameNode to be killed by ZKFC. The flame graph shows that most of the time is spent calculating QuotaCounts while removing blocks (removeBlocks(toRemovedBlocks)) and deleting inodes, with removeBlocks(toRemovedBlocks) taking the larger share. h3. solution: 1. Process removeBlocks asynchronously: start a thread in the BlockManager to process the deleted blocks and bound the lock hold time. 2. Optimize the QuotaCount calculation, similar to the optimization in [HDFS-16000|https://issues.apache.org/jira/browse/HDFS-16000]. h3. Comparison before and after optimization: Test: delete 10,000,000 inodes and 10,000,000 blocks. *before:* remove inode elapsed time: 7691 ms remove block elapsed time: 11107 ms *after:* remove inode elapsed time: 4149 ms remove block elapsed time: 0 ms
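The first optimization — handing the collected blocks to a background thread so the delete RPC returns without walking the whole list under the lock — can be sketched as below. The queue, batch size, and thread are illustrative, not the actual BlockManager code:

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/**
 * Illustrative sketch of asynchronous block removal: the delete path enqueues
 * the collected blocks and returns, and a daemon thread drains the queue in
 * bounded batches so no single lock acquisition covers a huge block list.
 */
public class AsyncBlockRemover {
    private final BlockingQueue<List<Long>> toRemove = new LinkedBlockingQueue<>();
    private final int batchSize;
    private volatile long removed; // total blocks processed, for observation

    public AsyncBlockRemover(int batchSize) {
        this.batchSize = batchSize;
        Thread worker = new Thread(this::run, "async-block-remover");
        worker.setDaemon(true);
        worker.start();
    }

    /** Called by the delete path after block collection; does not block. */
    public void enqueue(List<Long> collectedBlocks) {
        toRemove.add(collectedBlocks);
    }

    private void run() {
        try {
            while (true) {
                List<Long> blocks = toRemove.take();
                for (int i = 0; i < blocks.size(); i += batchSize) {
                    int end = Math.min(i + batchSize, blocks.size());
                    // The real code would acquire the namesystem write lock here,
                    // remove only this batch from the BlocksMap, then release it,
                    // so other RPCs can run between batches.
                    removed += end - i;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public long removedCount() { return removed; }
}
```

The batching mirrors what step 2 of the existing delete already does; the new part is only that it happens off the RPC thread.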
[jira] [Updated] (HDFS-16043) HDFS : Delete performance optimization
[ https://issues.apache.org/jira/browse/HDFS-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangyi Zhu updated HDFS-16043: --- Description: The deletion of the large directory caused NN to hold the lock for too long, which caused our NameNode to be killed by ZKFC. Through the flame graph, it is found that its main time-consuming calculation is QuotaCount when removingBlocks(toRemovedBlocks) and deleting inodes, and removeBlocks(toRemovedBlocks) takes a higher proportion of time. h3. solution: 1. RemoveBlocks is processed asynchronously. A thread is started in the BlockManager to process the deleted blocks and control the lock time. 2. QuotaCount calculation optimization, this is similar to the optimization of this Issue HDFS-16000. h3. Comparison before and after optimization: Delete 1000w Inode and 1000w block test. *before:* Before optimization: remove inode elapsed time: 7691 ms remove block elapsed time :11107 ms *after:* remove inode elapsed time: 4149 ms remove block elapsed time :0 ms was: The deletion of the large directory caused NN to hold the lock for too long, which caused our NameNode to be killed by ZKFC. Through the flame graph, it is found that its main time-consuming calculation is QuotaCount when removingBlocks(toRemovedBlocks) and deleting inodes, and removeBlocks(toRemovedBlocks) takes a higher proportion of time. h3. solution: 1. RemoveBlocks is processed asynchronously. A thread is started in the BlockManager to process the deleted blocks and control the lock time. 2. QuotaCount calculation optimization, this is similar to the optimization of this Issue [HDFS-16000|https://issues.apache.org/jira/browse/HDFS-16000]. h3. Comparison before and after optimization: Delete 1000w Inode and 1000w block test. 
*before:* Before optimization: remove inode elapsed time: 7691 ms remove block elapsed time :11107 ms *after:* remove inode elapsed time: 4149 ms remove block elapsed time :0 ms > HDFS : Delete performance optimization > -- > > Key: HDFS-16043 > URL: https://issues.apache.org/jira/browse/HDFS-16043 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs, namanode >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Priority: Major > > The deletion of the large directory caused NN to hold the lock for too long, > which caused our NameNode to be killed by ZKFC. > Through the flame graph, it is found that its main time-consuming > calculation is QuotaCount when removingBlocks(toRemovedBlocks) and deleting > inodes, and removeBlocks(toRemovedBlocks) takes a higher proportion of time. > h3. solution: > 1. RemoveBlocks is processed asynchronously. A thread is started in the > BlockManager to process the deleted blocks and control the lock time. > 2. QuotaCount calculation optimization, this is similar to the optimization > of this Issue HDFS-16000. > h3. Comparison before and after optimization: > Delete 1000w Inode and 1000w block test. > *before:* > Before optimization: remove inode elapsed time: 7691 ms > remove block elapsed time :11107 ms > *after:* > remove inode elapsed time: 4149 ms > remove block elapsed time :0 ms
[jira] [Commented] (HDFS-16032) DFSClient#delete supports Trash
[ https://issues.apache.org/jira/browse/HDFS-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351669#comment-17351669 ] Xiangyi Zhu commented on HDFS-16032: [~ayushtkn],[~sodonnell] Thanks a lot for your comments; I will use your suggestions to improve it. > DFSClient#delete supports Trash > > > Key: HDFS-16032 > URL: https://issues.apache.org/jira/browse/HDFS-16032 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hadoop-client, hdfs >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Currently, HDFS can only move deleted data to Trash through Shell commands. > In actual scenarios, most of the data is deleted through DFSClient Api. I > think it should support Trash.
[jira] [Commented] (HDFS-16032) DFSClient#delete supports Trash
[ https://issues.apache.org/jira/browse/HDFS-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17351454#comment-17351454 ] Xiangyi Zhu commented on HDFS-16032: [~hexiaoqiao],[~ayushtkn],[Stephen O'Donnell|https://issues.apache.org/jira/secure/ViewProfile.jspa?name=sodonnell] Looking forward to your comments. > DFSClient#delete supports Trash > > > Key: HDFS-16032 > URL: https://issues.apache.org/jira/browse/HDFS-16032 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hadoop-client, hdfs >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 40m > Remaining Estimate: 0h > > Currently, HDFS can only move deleted data to Trash through Shell commands. > In actual scenarios, most of the data is deleted through DFSClient Api. I > think it should support Trash.
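A trash-aware client delete amounts to renaming the path into the caller's trash directory instead of deleting it (in Hadoop, the shell does this via org.apache.hadoop.fs.Trash). A minimal, self-contained model of the target-path computation — the method below is illustrative, not the real Trash implementation:

```java
/**
 * Illustrative model of where a trashed path lands: HDFS trash places a
 * deleted path under /user/<user>/.Trash/Current/<original absolute path>.
 * A trash-aware client delete would rename() the path to this target
 * instead of calling delete() directly.
 */
public class TrashPath {
    public static String trashTarget(String user, String absolutePath) {
        if (!absolutePath.startsWith("/")) {
            throw new IllegalArgumentException("path must be absolute: " + absolutePath);
        }
        // Concatenating keeps the full original hierarchy under Current/,
        // which is what lets restores preserve the directory structure.
        return "/user/" + user + "/.Trash/Current" + absolutePath;
    }
}
```

Real client code would check fs.trash.interval > 0 before taking the trash path, falling back to a plain delete when trash is disabled.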
[jira] [Commented] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately
[ https://issues.apache.org/jira/browse/HDFS-16039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17350835#comment-17350835 ] Xiangyi Zhu commented on HDFS-16039: [~elgoiri] Looking forward to your comments. > RBF: Some indicators of RBFMetrics count inaccurately > -- > > Key: HDFS-16039 > URL: https://issues.apache.org/jira/browse/HDFS-16039 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Affects Versions: 3.4.0 >Reporter: Xiangyi Zhu >Assignee: Xiangyi Zhu >Priority: Major > > RBFMetrics#getNumLiveNodes, getNumNamenodes, getTotalCapacity > The current statistical algorithm is to accumulate all Nn indicators, which > will lead to inaccurate counting. I think that the same ClusterID only needs > to take one Max and then do the accumulation.
[jira] [Created] (HDFS-16039) RBF: Some indicators of RBFMetrics count inaccurately
Xiangyi Zhu created HDFS-16039: -- Summary: RBF: Some indicators of RBFMetrics count inaccurately Key: HDFS-16039 URL: https://issues.apache.org/jira/browse/HDFS-16039 Project: Hadoop HDFS Issue Type: Bug Components: rbf Affects Versions: 3.4.0 Reporter: Xiangyi Zhu Assignee: Xiangyi Zhu RBFMetrics#getNumLiveNodes, getNumNamenodes, and getTotalCapacity currently accumulate the metrics reported by every NameNode, which leads to inaccurate counts. I think that for each ClusterID we should take the maximum value among its NameNodes and then accumulate across clusters.
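The proposed fix — deduplicate per ClusterID by taking the maximum reported value, then sum across clusters — can be sketched as follows. The class and record shape are illustrative, not the actual RBFMetrics code:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Illustrative sketch of the proposed aggregation: multiple NameNodes of the
 * same cluster report the same total capacity, so summing every report
 * double-counts. Taking the max per ClusterID and summing across clusters
 * counts each cluster exactly once.
 */
public class ClusterMetricAggregator {
    /** One membership record: the reporting cluster and its reported value. */
    static final class Report {
        final String clusterId;
        final long totalCapacity;
        Report(String clusterId, long totalCapacity) {
            this.clusterId = clusterId;
            this.totalCapacity = totalCapacity;
        }
    }

    public static long totalCapacity(List<Report> reports) {
        // Keep only the largest value reported for each cluster...
        Map<String, Long> maxPerCluster = new HashMap<>();
        for (Report r : reports) {
            maxPerCluster.merge(r.clusterId, r.totalCapacity, Math::max);
        }
        // ...then accumulate across distinct clusters.
        long total = 0;
        for (long v : maxPerCluster.values()) {
            total += v;
        }
        return total;
    }
}
```

The same pattern would apply to getNumLiveNodes and getNumNamenodes, with the per-cluster reduction chosen to match each metric's semantics.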