[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643142#comment-17643142 ]

ASF GitHub Bot commented on HDFS-16855:
----------------------------------------

dingshun3016 commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1336857490

> > > Now that this case only happens when addBlockPool() is invoked and CachingGetSpaceUsed#used < 0, I have an idea: is it possible to add a switch so that the lock is not taken when ReplicaCachingGetSpaceUsed#init() runs for the first time, and is taken at other times?
> >
> > This makes sense to me; getting the replica usage does not need strong consistency. @Hexiaoqiao any suggestion?
>
> Thanks for the detailed discussions. +1, it seems good to me. BTW, I tried to dig up the PR that fixes this bug but could not find it. It is only in our internal branch, which does not refresh space used at the init stage. The refresh is a completely asynchronous thread (in CachingGetSpaceUsed), so it cannot deadlock when the DataNode instance restarts. Thanks.

Thanks for the reply. It looks like this PR, [HDFS-14986](https://issues.apache.org/jira/browse/HDFS-14986), forbids refresh() when ReplicaCachingGetSpaceUsed#init() runs for the first time. I prefer not to take the lock when ReplicaCachingGetSpaceUsed#init() runs for the first time.


> Remove the redundant write lock in addBlockPool
> ------------------------------------------------
>
>                 Key: HDFS-16855
>                 URL: https://issues.apache.org/jira/browse/HDFS-16855
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>            Reporter: dingshun
>            Priority: Major
>              Labels: pull-request-available
>
> When patching the datanode's fine-grained lock, we found that the datanode couldn't start; a deadlock may have occurred in addBlockPool, so we can remove it.
> {code:java}
> // getspaceused classname
> <property>
>   <name>fs.getspaceused.classname</name>
>   <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed</value>
> </property>
> {code}
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool
> // takes the writeLock
> @Override
> public void addBlockPool(String bpid, Configuration conf)
>     throws IOException {
>   LOG.info("Adding block pool " + bpid);
>   AddBlockPoolException volumeExceptions = new AddBlockPoolException();
>   try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, bpid)) {
>     try {
>       volumes.addBlockPool(bpid, conf);
>     } catch (AddBlockPoolException e) {
>       volumeExceptions.mergeException(e);
>     }
>     volumeMap.initBlockPool(bpid);
>     Set<String> vols = storageMap.keySet();
>     for (String v : vols) {
>       lockManager.addLock(LockLevel.VOLUME, bpid, v);
>     }
>   }
> }
> {code}
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica
> // needs the readLock
> void replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer) {
>   LightWeightResizableGSet<Block, ReplicaInfo> m = null;
>   try (AutoCloseDataSetLock l = lockManager.readLock(LockLevel.BLOCK_POOl, bpid)) {
>     m = map.get(bpid);
>     if (m != null) {
>       m.getIterator(consumer);
>     }
>   }
> }
> {code}
>
> Because it is not the same thread, the write lock cannot be downgraded to a read lock.
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList#addBlockPool
> void addBlockPool(final String bpid, final Configuration conf) throws IOException {
>   long totalStartTime = Time.monotonicNow();
>   final Map<FsVolumeSpi, IOException> unhealthyDataDirs =
>       new ConcurrentHashMap<FsVolumeSpi, IOException>();
>   List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>   for (final FsVolumeImpl v : volumes) {
>     Thread t = new Thread() {
>       public void run() {
>         try (FsVolumeReference ref = v.obtainReference()) {
>           FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
>               " on volume " + v + "...");
>           long startTime = Time.monotonicNow();
>           v.addBlockPool(bpid, conf);
>           long timeTaken = Time.monotonicNow() - startTime;
>           FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
>               " on " + v + ": " + timeTaken + "ms");
>         } catch (IOException ioe) {
>           FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
>               ". Will throw later.", ioe);
>           unhealthyDataDirs.put(v, ioe);
>         }
>       }
>     };
>     blockPoolAddingThreads.add(t);
>     t.start();
>   }
>   for (Thread t : blockPoolAddingThreads) {
>     try {
>       t.join();
>     } catch (InterruptedException ie) {
>       throw new IOException(ie);
>     }
>   }
> }
> {code}
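The last point above, that the write lock cannot be downgraded to a read lock from another thread, is the crux of the start-up hang. The following minimal sketch (plain Java with made-up names, not Hadoop code) reproduces the pattern: one thread takes the write lock of a ReentrantReadWriteLock and then joins a worker thread, while the worker needs the read lock of the same lock, so neither can make progress.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Minimal illustration of the hang described above (hypothetical names, not Hadoop code).
// The main thread plays the role of FsDatasetImpl#addBlockPool: it holds the write lock and
// joins a worker. The worker plays the role of the BlockPoolSlice -> ReplicaCachingGetSpaceUsed
// -> deepCopyReplica path: it needs the read lock of the same ReentrantReadWriteLock.
// ReentrantReadWriteLock only supports downgrading (write -> read) within the thread that owns
// the write lock, so the worker blocks and the join never returns.
public class CrossThreadLockDemo {
  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    lock.writeLock().lock();            // like taking the BLOCK_POOl write lock
    try {
      Thread worker = new Thread(() -> {
        lock.readLock().lock();         // blocks: another thread holds the write lock
        try {
          System.out.println("worker got the read lock");
        } finally {
          lock.readLock().unlock();
        }
      });
      worker.start();
      worker.join();                    // like FsVolumeList#addBlockPool joining its scan threads
    } finally {
      lock.writeLock().unlock();        // never reached, so start-up hangs
    }
  }
}
{code}

Running this program hangs at the join(), which mirrors why the DataNode never finishes addBlockPool() in the scenario described in the issue.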
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643012#comment-17643012 ]

ASF GitHub Bot commented on HDFS-16855:
----------------------------------------

Hexiaoqiao commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1336418882

> > Now that this case only happens when addBlockPool() is invoked and CachingGetSpaceUsed#used < 0, I have an idea: is it possible to add a switch so that the lock is not taken when ReplicaCachingGetSpaceUsed#init() runs for the first time, and is taken at other times?
>
> This makes sense to me; getting the replica usage does not need strong consistency. @Hexiaoqiao any suggestion?

Thanks for the detailed discussions. +1, it seems good to me. BTW, I tried to dig up the PR that fixes this bug but could not find it. It is only in our internal branch, which does not refresh space used at the init stage. The refresh is a completely asynchronous thread (in CachingGetSpaceUsed), so it cannot deadlock when the DataNode instance restarts. Thanks.
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642266#comment-17642266 ]

ASF GitHub Bot commented on HDFS-16855:
----------------------------------------

MingXiangLi commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1334745101

> Now that this case only happens when addBlockPool() is invoked and CachingGetSpaceUsed#used < 0, I have an idea: is it possible to add a switch so that the lock is not taken when ReplicaCachingGetSpaceUsed#init() runs for the first time, and is taken at other times?

This makes sense to me; getting the replica usage does not need strong consistency. @Hexiaoqiao any suggestion?
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642005#comment-17642005 ]

ASF GitHub Bot commented on HDFS-16855:
----------------------------------------

hadoop-yetus commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1333921434

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------|:--------|
| +0 :ok: | reexec | 1m 11s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 42m 16s | | trunk passed |
| +1 :green_heart: | compile | 1m 36s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 1m 30s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 17s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 39s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 18s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 41s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 51s | | trunk passed |
| +1 :green_heart: | shadedclient | 26m 27s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 23s | | the patch passed |
| +1 :green_heart: | compile | 1m 32s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 1m 32s | | the patch passed |
| +1 :green_heart: | compile | 1m 21s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 21s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 59s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 25s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 57s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 31s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 30s | | the patch passed |
| +1 :green_heart: | shadedclient | 26m 7s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 415m 49s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 5s | | The patch does not generate ASF License warnings. |
| | | 535m 40s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNode |
| | hadoop.hdfs.TestLeaseRecovery2 |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5170 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux c8aeace311b7 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 54a0786d1072f1440a297bb197887a83b941ba65 |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/2/testReport/ |
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641893#comment-17641893 ]

ASF GitHub Bot commented on HDFS-16855:
----------------------------------------

dingshun3016 commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1333710803

According to the discussion so far, there seem to be several ways to solve this problem:

- Remove the BLOCK_POOl level write lock in #addBlockPool
  > but we worry about replica consistency problems
- Forbid refresh() when ReplicaCachingGetSpaceUsed#init() runs for the first time
  > it will cause the value of dfsUsage to be 0 until the next refresh()
- Use the du or df command instead for the first time
  > du is very expensive and slow
  > df is inaccurate when the disk is shared with other servers, see [HDFS-14313](https://issues.apache.org/jira/browse/HDFS-14313)

Now that this case only happens when addBlockPool() is invoked and CachingGetSpaceUsed#used < 0, I have an idea: is it possible to add a switch so that the lock is not taken when ReplicaCachingGetSpaceUsed#init() runs for the first time, and is taken at other times? Do you think it's possible? @MingXiangLi
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641740#comment-17641740 ]

ASF GitHub Bot commented on HDFS-16855:
----------------------------------------

MingXiangLi commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-172902

The BLOCK_POOl level lock is there to protect replica consistency for FsDatasetImpl when read and write operations happen at the same time.

> forbid refresh() when ReplicaCachingGetSpaceUsed#init() at first time, it will cause the value of dfsUsage to be 0 until the next time refresh().

For example, we could use the df command instead for the first time, or some other way. On my side it is less risky to change the ReplicaCachingGetSpaceUsed logic than to remove the write lock. Or we can discuss further to make sure no case will lead to a consistency problem if we remove the write lock.
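To make the consistency concern concrete, here is a small generic illustration (plain Java, not Hadoop code) of what the block-pool lock protects against: a reader iterating shared replica-like state while a writer mutates it, with no lock between them, will typically fail with ConcurrentModificationException or observe a half-updated view.

{code:java}
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Generic illustration (not Hadoop code) of unprotected concurrent read/write access to a
// shared map -- the kind of inconsistency the BLOCK_POOl lock prevents for the replica map.
public class UnprotectedIterationDemo {
  public static void main(String[] args) throws InterruptedException {
    Map<Long, String> replicas = new HashMap<>();
    for (long i = 0; i < 100_000; i++) {
      replicas.put(i, "replica-" + i);
    }

    Thread writer = new Thread(() -> {
      for (long i = 100_000; i < 200_000; i++) {
        replicas.put(i, "replica-" + i);      // concurrent mutation, no write lock
      }
    });
    writer.start();

    try {
      long totalLength = 0;
      for (String r : replicas.values()) {    // concurrent iteration, no read lock
        totalLength += r.length();
      }
      System.out.println("total length = " + totalLength);
    } catch (ConcurrentModificationException e) {
      System.out.println("reader observed inconsistent state: " + e);
    }
    writer.join();
  }
}
{code}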
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641693#comment-17641693 ]

ASF GitHub Bot commented on HDFS-16855:
----------------------------------------

dingshun3016 commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1333269882

> @dingshun3016 This seems to only happen when addBlockPool() is invoked and CachingGetSpaceUsed#used < 0, so why not handle it, for example by forbidding refresh() when ReplicaCachingGetSpaceUsed#init() runs for the first time?

@MingXiangLi thanks for the reply.

Forbidding refresh() when ReplicaCachingGetSpaceUsed#init() runs for the first time will cause the value of dfsUsage to be 0 until the next refresh().

If we remove the BLOCK_POOl level write lock in the org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool(String bpid, Configuration conf) method, what would the impact be? Do you have any other suggestions?
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641690#comment-17641690 ]

ASF GitHub Bot commented on HDFS-16855:
----------------------------------------

dingshun3016 opened a new pull request, #5170:
URL: https://github.com/apache/hadoop/pull/5170

When patching the datanode's fine-grained lock, we found that the datanode couldn't start; a deadlock may have occurred in addBlockPool, so we can remove it.

org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool takes the writeLock.
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica needs the readLock.

Because they do not run in the same thread, the write lock cannot be downgraded to a read lock.
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641688#comment-17641688 ]

ASF GitHub Bot commented on HDFS-16855:
----------------------------------------

dingshun3016 commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1333268646

> @dingshun3016 This seems to only happen when addBlockPool() is invoked and CachingGetSpaceUsed#used < 0, so why not handle it, for example by forbidding refresh() when ReplicaCachingGetSpaceUsed#init() runs for the first time?
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641689#comment-17641689 ]

ASF GitHub Bot commented on HDFS-16855:
----------------------------------------

dingshun3016 closed pull request #5170: HDFS-16855. Remove the redundant write lock in addBlockPool.
URL: https://github.com/apache/hadoop/pull/5170
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641675#comment-17641675 ]

ASF GitHub Bot commented on HDFS-16855:
----------------------------------------

MingXiangLi commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1333200919

@dingshun3016 This seems to only happen when addBlockPool() is invoked and CachingGetSpaceUsed#used < 0, so why not handle it, for example by forbidding refresh() when CachingGetSpaceUsed#init() runs for the first time?
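For context on the "used < 0" condition: as described in this thread, CachingGetSpaceUsed performs one synchronous refresh() from init() when it has no usable initial value, and only later refreshes run on its asynchronous thread. The fragment below is a paraphrase assembled from the statements in this discussion, not a verbatim copy of the Hadoop source, and it shows where the start-up collision with the write lock comes from.

{code:java}
// Paraphrased sketch (not verbatim Hadoop source) of the start-up path discussed above.
void init() throws IOException {
  if (used.get() < 0) {   // "CachingGetSpaceUsed#used < 0": no usable initial value
    refresh();            // synchronous first refresh; ReplicaCachingGetSpaceUsed#refresh()
                          // reaches FsDatasetImpl#deepCopyReplica(bpid), which needs the
                          // BLOCK_POOl read lock while addBlockPool() still holds the
                          // write lock in another thread, so DataNode start-up hangs
  }
  // Subsequent refreshes are driven by CachingGetSpaceUsed's asynchronous refresh thread,
  // which is why they cannot deadlock once start-up has completed.
}
{code}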
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641299#comment-17641299 ]

dingshun commented on HDFS-16855:
----------------------------------

[~hexiaoqiao] I've looked at the logic of the latest trunk branch, but can't seem to find anything. If you later find the PR that fixes it, please post it. Thanks.

In addition, I would like to ask: if I remove the BLOCK_POOl level write lock in the org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool(String bpid, Configuration conf) method, what would the impact be?
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641171#comment-17641171 ]

Xiaoqiao He commented on HDFS-16855:
-------------------------------------

[~dingshun] Thanks for the detailed explanation. IIRC, at the beginning of the DataNode fine-grained lock work it indeed could cause a deadlock here, hence it was fixed in the following PRs. Sorry, I can't find the related PR now. Would you mind checking the logic on branch trunk? More feedback is welcome if you meet any issue. Thanks again. cc [~Aiphag0] Any more suggestions?
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641116#comment-17641116 ]

dingshun commented on HDFS-16855:
----------------------------------

[~hexiaoqiao] Thanks for your reply.

We found that when starting the datanode, the #addBlockPool(String bpid, Configuration conf) method of FsDatasetImpl is called and a BLOCK_POOl level write lock is taken. In the #addBlockPool(final String bpid, final Configuration conf) method of FsVolumeList, multiple threads are started to initialize the BlockPoolSlices, and the value of dfsUsage needs to be obtained when a BlockPoolSlice is initialized.

Because our fs.getspaceused.classname is configured with ReplicaCachingGetSpaceUsed, this calls #deepCopyReplica(String bpid) of FsDatasetImpl, which in turn calls #replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer) of ReplicaMap, and #replicas takes a read lock at the BLOCK_POOl level.

Since they are not the same thread, and they use the read/write lock of the same ReentrantReadWriteLock instance, the write lock cannot be downgraded to a read lock.
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641058#comment-17641058 ]

Xiaoqiao He commented on HDFS-16855:
-------------------------------------

[~dingshun] Thanks for your catches. Sorry, I didn't get the relationship between #addBlockPool and #replicas here. Would you mind adding some information about the deadlock? Thanks.
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640277#comment-17640277 ] ASF GitHub Bot commented on HDFS-16855: --- hadoop-yetus commented on PR #5170: URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1329816617

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:-------:|
| +0 :ok: | reexec | 0m 50s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 2s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 2s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 44m 10s | | trunk passed |
| +1 :green_heart: | compile | 1m 40s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | compile | 1m 34s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 28s | | trunk passed |
| +1 :green_heart: | mvnsite | 2m 2s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 34s | | trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 48s | | trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 4m 16s | | trunk passed |
| +1 :green_heart: | shadedclient | 28m 43s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 32s | | the patch passed |
| +1 :green_heart: | compile | 1m 34s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javac | 1m 34s | | the patch passed |
| +1 :green_heart: | compile | 1m 19s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 19s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 0s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 24s | | the patch passed |
| +1 :green_heart: | javadoc | 0m 58s | | the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 |
| +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 34s | | the patch passed |
| +1 :green_heart: | shadedclient | 26m 18s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| -1 :x: | unit | 388m 9s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 54s | | The patch does not generate ASF License warnings. |
| | | | 513m 3s | | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/1/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5170 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 0f9749a8f192 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / 54a0786d1072f1440a297bb197887a83b941ba65 |
| Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/1/testReport/ |
| Max. process+thread count | 2123 (vs. ulimit of 5500) |
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640039#comment-17640039 ] ASF GitHub Bot commented on HDFS-16855: --- dingshun3016 opened a new pull request, #5170: URL: https://github.com/apache/hadoop/pull/5170 While patching the DataNode's fine-grained lock, we found that the DataNode could not start; a deadlock can occur in addBlockPool, so we can remove the redundant write lock. org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool takes the write lock, while org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica needs the read lock. Because they run in different threads, the write lock cannot be downgraded to a read lock.
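To make the cross-thread interaction easier to picture, here is a minimal standalone sketch, not the HDFS code: the class CrossThreadDowngradeDemo and its thread are invented for illustration, and a plain java.util.concurrent ReentrantReadWriteLock stands in for the DataNode's dataset lock manager. One thread holds the write lock (as addBlockPool does) and then waits for a second thread that requests the read lock on the same lock (as deepCopyReplica does); since a read/write lock can only be downgraded within the thread that owns it, the join never returns.

{code:java}
// Illustration only (hypothetical names): a write lock held by one thread
// blocks a read-lock request from another thread, and joining that other
// thread while still holding the write lock deadlocks.
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CrossThreadDowngradeDemo {
  private static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

  public static void main(String[] args) throws InterruptedException {
    LOCK.writeLock().lock();                 // like addBlockPool taking the block-pool write lock
    try {
      Thread volumeScanner = new Thread(() -> {
        LOCK.readLock().lock();              // like deepCopyReplica needing the read lock: blocks,
        try {                                // because the write lock is held by another thread
          System.out.println("scanned replicas");
        } finally {
          LOCK.readLock().unlock();
        }
      });
      volumeScanner.start();
      volumeScanner.join();                  // parent waits for the child => neither can proceed
    } finally {
      LOCK.writeLock().unlock();             // never reached
    }
  }
}
{code}

Run as-is, this program hangs at volumeScanner.join(), which is the same shape as the reported DataNode startup hang.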
Will throw later.", ioe); > unhealthyDataDirs.put(v, ioe); > } > } > }; > blockPoolAddingThreads.add(t); > t.start(); > } > for (Thread t : blockPoolAddingThreads) { > try { > t.join(); > } catch (InterruptedException ie) { > throw new IOException(ie); > } > } > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org