[
https://issues.apache.org/jira/browse/HDFS-16214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18039497#comment-18039497
]
ASF GitHub Bot commented on HDFS-16214:
---------------------------------------
github-actions[bot] closed pull request #3885: HDFS-16214. Asynchronously
collect blocks and update quota when deleting
URL: https://github.com/apache/hadoop/pull/3885
> Asynchronously collect blocks and update quota when deleting
> ------------------------------------------------------------
>
> Key: HDFS-16214
> URL: https://issues.apache.org/jira/browse/HDFS-16214
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Affects Versions: 3.4.0
> Reporter: Xiangyi Zhu
> Assignee: Xiangyi Zhu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> Deletion is slow mainly because of three operations: collecting blocks,
> removing inodes from the InodeMap, and deleting blocks. The current
> deletion has two major steps. Step 1 acquires the lock, collects the blocks
> and inodes, deletes the inodes, and releases the lock. Step 2 acquires the
> lock, deletes the blocks, and releases the lock.
> Step 2 already deletes blocks in batches, which bounds the lock holding
> time. The blocks could also be deleted asynchronously.
> Step 1, however, still holds the lock for a long time.
> For step 1, we can collect the blocks without holding the lock. The proposed
> process is as follows. Step 1: acquire the lock, call parent.removeChild,
> write to the editLog, release the lock. Step 2: without the lock, collect
> the blocks. Step 3: acquire the lock, update the quota, release the lease,
> release the lock. Step 4: acquire the lock, delete the inodes from the
> InodeMap, release the lock. Step 5: acquire the lock, delete the blocks,
> release the lock.
> This process can cause two problems:
> 1. Suppose the /a/b/c file is being written when the /a/b directory is
> deleted. If the deletion has reached the block-collection stage and the
> client then calls complete or addBlock on /a/b/c, that call succeeds because
> this stage holds no lock, even though the delete of /a/b has already been
> written to the editLog. The editLog order is then delete /a/b followed by
> complete /a/b/c. When the standby node replays the editLog, /a/b/c has
> already been deleted, so the complete of /a/b/c fails.
> *The process is as follows:*
> *write editLog order: delete /a/b/c -> delete /a/b -> complete /a/b/c*
> *replay editLog order: delete /a/b/c -> delete /a/b -> complete /a/b/c
> (not found)*
> 2. If a delete operation has reached the block-collection stage when the
> administrator runs saveNamespace and then restarts the NameNode, an inode
> that has already been removed from its parent's child list may remain in
> the InodeMap.
> To solve these problems, step 1 adds the inode being deleted to a Set. When
> a file write operation is about to be logged (the
> logAllocateBlockId/logCloseFile editLog calls), check whether the file or
> any of its ancestor inodes is in the Set, and throw a FileNotFoundException
> if so.
> In addition, saveNamespace must wait until all inodes have been removed
> from the Set before it runs.
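
The proposal in the description can be illustrated with a small toy model. The sketch below is not the patch from the pull request and uses no NameNode code: the class AsyncDeleteSketch and every method and field in it are hypothetical names chosen for illustration, a flat path-to-blocks map stands in for the inode tree, and steps 3 to 5 are collapsed into a single locked phase. It only shows the idea of marking a subtree as "being deleted" under the lock, collecting blocks without the lock, and rejecting writes to that subtree in the meantime.

```java
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model only; none of these names are NameNode APIs. */
class AsyncDeleteSketch {
  // Stands in for the namesystem lock.
  private final ReentrantLock lock = new ReentrantLock();
  // Flat path -> block IDs map, standing in for the inode tree.
  private final Map<String, List<String>> files = new ConcurrentHashMap<>();
  // Roots of subtrees whose deletion has started but not finished.
  private final Set<String> deleting = ConcurrentHashMap.newKeySet();
  final List<String> editLog = new ArrayList<>();

  void create(String path, List<String> blocks) {
    lock.lock();
    try { files.put(path, new ArrayList<>(blocks)); } finally { lock.unlock(); }
  }

  /** Step 1 (locked): mark the subtree as being deleted and log the delete. */
  void beginDelete(String root) {
    lock.lock();
    try {
      deleting.add(root);
      editLog.add("delete " + root);
    } finally { lock.unlock(); }
  }

  /** Step 2 (no lock): collect the blocks under the subtree. */
  List<String> collectBlocks(String root) {
    List<String> blocks = new ArrayList<>();
    for (Map.Entry<String, List<String>> e : files.entrySet()) {
      if (isUnder(e.getKey(), root)) blocks.addAll(e.getValue());
    }
    return blocks;
  }

  /** Steps 3-5, collapsed (locked): drop the entries, then clear the marker. */
  void finishDelete(String root) {
    lock.lock();
    try {
      files.keySet().removeIf(p -> isUnder(p, root));
      deleting.remove(root);
    } finally { lock.unlock(); }
  }

  /** Write op guarded by the Set: fails if the file or an ancestor is being deleted. */
  void completeFile(String path) throws FileNotFoundException {
    lock.lock();
    try {
      for (String p = path; p != null; p = parent(p)) {
        if (deleting.contains(p)) {
          throw new FileNotFoundException(path + " is being deleted");
        }
      }
      if (!files.containsKey(path)) throw new FileNotFoundException(path);
      editLog.add("complete " + path);
    } finally { lock.unlock(); }
  }

  /** saveNamespace must not run while deletes are in flight; a real version would wait. */
  boolean canSaveNamespace() { return deleting.isEmpty(); }

  private static boolean isUnder(String path, String root) {
    return path.equals(root) || path.startsWith(root + "/");
  }

  private static String parent(String path) {
    int i = path.lastIndexOf('/');
    return i <= 0 ? null : path.substring(0, i);
  }
}
```

In this model, calling beginDelete("/a/b") and then completeFile("/a/b/c") throws FileNotFoundException, so no "complete /a/b/c" record can follow "delete /a/b" in the edit log, and canSaveNamespace() stays false until finishDelete("/a/b") has run.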