[ https://issues.apache.org/jira/browse/HDFS-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17627461#comment-17627461 ]
ASF GitHub Bot commented on HDFS-16804:
---------------------------------------

MingXiangLi commented on code in PR #5033:
URL: https://github.com/apache/hadoop/pull/5033#discussion_r1011148137

##########
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##########
@@ -3220,7 +3220,7 @@ public static void setBlockPoolId(String bpid) {
   }

   @Override
-  public void shutdownBlockPool(String bpid) {
+  public synchronized void shutdownBlockPool(String bpid) {

Review Comment:
   should add synchronized to addBlockPool() too?

> AddVolume contains a race condition with shutdown block pool
> ------------------------------------------------------------
>
>                 Key: HDFS-16804
>                 URL: https://issues.apache.org/jira/browse/HDFS-16804
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: ZanderXu
>            Assignee: ZanderXu
>            Priority: Major
>              Labels: pull-request-available
>
> AddVolume contains a race condition with shutting down a block pool, causing
> the ReplicaMap to still contain blocks belonging to the removed block pool.
> And the new volume still contains one unused BlockPoolSlice belonging to the
> removed block pool, which causes problems such as an incorrect dfsUsed and an
> incorrect numBlocks for the volume.
> Let's review the logic of addVolume and shutdownBlockPool respectively.
>
> AddVolume logic:
>  * Step 1: Get all namespaceInfos from the blockPoolManager
>  * Step 2: Create one temporary FsVolumeImpl object
>  * Step 3: Create BlockPoolSlices according to the namespaceInfos and add
>    them to the temporary FsVolumeImpl object
>  * Step 4: Scan all blocks of the namespaceInfos from the volume and store
>    them in one temporary ReplicaMap
>  * Step 5: Activate the temporary FsVolumeImpl created above (with the
>    FsDatasetImpl synchronized lock)
>    ** Step 5.1: Merge all blocks of the temporary ReplicaMap into the
>       global ReplicaMap
>    ** Step 5.2: Add the FsVolumeImpl to the volumes
>
> ShutdownBlockPool logic (with the blockPool write lock):
>  * Step 1: Clean up the blockPool from the global ReplicaMap
>  * Step 2: Shut down the block pool on all the volumes
>    ** Step 2.1: Do some cleanup operations for the block pool, such as
>       saveReplicas, saveDfsUsed, etc.
>    ** Step 2.2: Remove the blockPool from bpSlices
>
> The race condition can be reproduced by the following interleaving:
>  * AddVolume Step 1: Get all namespaceInfos from the blockPoolManager
>  * ShutdownBlockPool Step 1: Clean up the blockPool from the global ReplicaMap
>  * ShutdownBlockPool Step 2: Shut down the block pool on all the volumes
>  * AddVolume Steps 2-5
>
> Actual result:
>  * The global ReplicaMap contains blocks belonging to the removed blockPool
>  * The bpSlices of the FsVolumeImpl contains one BlockPoolSlice belonging
>    to the removed blockPool
>
> Expected result:
>  * The global ReplicaMap shouldn't contain any blocks belonging to the
>    removed blockPool
>  * The bpSlices of any FsVolumeImpl shouldn't contain any BlockPoolSlice
>    belonging to the removed blockPool

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
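The interleaving described in the issue can be sketched with a minimal, hypothetical Java model. `MiniDataset` and its fields are illustrative stand-ins, not the real Hadoop classes; the point is only that when `shutdownBlockPool()` and the snapshot-and-merge inside `addVolume()` take the same dataset-wide lock (as the PR's `synchronized` change does), shutdown can no longer slip in between AddVolume's Step 1 snapshot and its Step 5 merge.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, stripped-down model of the dataset state involved in the race.
// Names echo FsDatasetImpl, but this is an illustration, not the Hadoop code.
class MiniDataset {

    // bpid -> block ids (stands in for the global ReplicaMap)
    private final Map<String, Set<String>> replicaMap = new HashMap<>();
    // block pools currently served (stands in for the blockPoolManager view)
    private final Set<String> activePools = new HashSet<>();

    // The reviewer's question: this plausibly needs the same lock as well.
    synchronized void addBlockPool(String bpid) {
        activePools.add(bpid);
        replicaMap.putIfAbsent(bpid, new HashSet<>());
    }

    // Mirrors the PR's fix: shutdown takes the dataset-wide lock, so it cannot
    // run between addVolume's snapshot of the pools and its merge below.
    synchronized void shutdownBlockPool(String bpid) {
        activePools.remove(bpid);
        replicaMap.remove(bpid);
    }

    // AddVolume Steps 1-5 collapsed: snapshot the active pools and merge the
    // scanned blocks under one lock. If the snapshot were taken outside the
    // lock, a concurrent shutdownBlockPool() could interleave, and the merge
    // would resurrect blocks of the removed pool -- the reported bug.
    synchronized void addVolume(Map<String, Set<String>> scannedBlocks) {
        for (String bpid : new ArrayList<>(activePools)) {
            Set<String> blocks = scannedBlocks.get(bpid);
            if (blocks != null) {
                replicaMap.get(bpid).addAll(blocks);
            }
        }
    }

    synchronized boolean hasPool(String bpid) {
        return replicaMap.containsKey(bpid);
    }
}

public class RaceSketch {
    public static void main(String[] args) {
        MiniDataset ds = new MiniDataset();
        ds.addBlockPool("BP-1");
        ds.addVolume(Map.of("BP-1", Set.of("blk_1")));
        ds.shutdownBlockPool("BP-1");
        // With every step under the same lock, the removed pool leaves no
        // residue in the map, matching the "Expected result" above.
        System.out.println("BP-1 present after shutdown: " + ds.hasPool("BP-1"));
    }
}
```

In this simplified form the lock is the intrinsic monitor of the dataset object, which is what adding `synchronized` to `shutdownBlockPool(String)` effectively does in the patch; extending it to `addBlockPool()` as the reviewer suggests would guard pool creation with the same monitor.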