[ https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Xiangyi Zhu updated HDFS-16214:
-------------------------------
    Summary: Asynchronously collect blocks and update quota when deleting  (was: Lock optimization for large deleteing, no locks on the collection block)

> Asynchronously collect blocks and update quota when deleting
> ------------------------------------------------------------
>
>                 Key: HDFS-16214
>                 URL: https://issues.apache.org/jira/browse/HDFS-16214
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>    Affects Versions: 3.4.0
>            Reporter: Xiangyi Zhu
>            Assignee: Xiangyi Zhu
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> Most of the time spent in a large delete goes to three operations:
> collecting blocks, deleting inodes from the InodeMap, and deleting blocks.
> The current deletion is divided into two major steps. Step 1 acquires the
> lock, collects the blocks and inodes, deletes the inodes, and releases the
> lock. Step 2 acquires the lock, deletes the blocks, and releases the lock.
> Step 2 already deletes blocks in batches, which bounds the lock holding
> time. The blocks could also be deleted asynchronously.
> Step 1, however, still holds the lock for a long time.
> For step 1, we can collect the blocks without holding the lock. The proposed
> process is as follows:
> Step 1: acquire lock, parent.removeChild, write to editLog, release lock.
> Step 2: no lock, collect the blocks.
> Step 3: acquire lock, update quota, release lease, release lock.
> Step 4: acquire lock, delete the inodes from the InodeMap, release lock.
> Step 5: acquire lock, delete the blocks, release lock.
> This process may introduce some problems:
> 1. A client is writing the file /a/b/c when the directory /a/b is deleted.
> While the deletion is in the block-collecting stage, the client calls
> complete or addBlock on /a/b/c. The collecting stage holds no lock, and
> the delete of /a/b has already been written to the editLog successfully.
> In this case, the editLog records delete /a/b followed by complete /a/b/c.
> When the standby NameNode replays the editLog, /a/b/c has already been
> deleted, so replaying complete /a/b/c fails.
> *The process is as follows:*
> *write editLog order: delete /a/b/c -> delete /a/b -> complete /a/b/c*
> *replay editLog order:* *delete /a/b/c ->* *delete /a/b ->* *complete /a/b/c
> {color:#ff0000}(not found){color}*
> 2. If a delete operation has reached the block-collecting stage when the
> administrator runs saveNamespace and then restarts the NameNode, an inode
> that has already been removed from its parent's child list may remain in
> the InodeMap.
> To solve these problems, step 1 adds the inode being deleted to a Set.
> When a file WriteFileOp is logged (logAllocateBlockId/logCloseFile
> EditLog), check whether the file or any of its parent inodes is in the
> Set, and throw a FileNotFoundException if so.
> In addition, saveNamespace must wait until all inodes have been removed
> from the Set before it executes.
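The Set-based guard described above could be sketched roughly as follows. This is only an illustration, not the code from the patch: the class DeletingInodeGuard and all of its method names are hypothetical, paths stand in for inodes, and the object's own monitor stands in for the namesystem lock. The point at which an entry leaves the Set (here, after the InodeMap removal) is also an assumption, since the description does not say.

```java
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical guard for paths whose delete has been logged but whose
 * blocks are still being collected without the namesystem lock.
 */
class DeletingInodeGuard {
    // Paths currently being deleted (stand-in for the Set of inodes).
    private final Set<String> deleting = new HashSet<>();

    /** Step 1: record the path right after parent.removeChild. */
    synchronized void beginDelete(String path) {
        deleting.add(path);
    }

    /** Assumed to run once the inode has left the InodeMap. */
    synchronized void endDelete(String path) {
        deleting.remove(path);
        notifyAll(); // wake any thread blocked in awaitNoPendingDeletes
    }

    /** Rejects a write op (addBlock/complete) on a path being deleted. */
    synchronized void checkWritable(String path) throws FileNotFoundException {
        // Walk from the file itself up through each ancestor directory.
        for (String p = path; p != null && !p.isEmpty(); p = parentOf(p)) {
            if (deleting.contains(p)) {
                throw new FileNotFoundException(
                    path + " is being deleted (via " + p + ")");
            }
        }
    }

    /** A saveNamespace caller would block here until the Set is empty. */
    synchronized void awaitNoPendingDeletes() throws InterruptedException {
        while (!deleting.isEmpty()) {
            wait();
        }
    }

    // Returns the parent path, or null once the top-level entry is reached.
    private static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? null : path.substring(0, i);
    }
}
```

With this sketch, after beginDelete("/a/b") a call to checkWritable("/a/b/c") throws FileNotFoundException, so the complete for /a/b/c is rejected before it can be logged after the delete, while a write to an unrelated path such as /a/x still passes.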