[jira] [Resolved] (HDFS-16862) EC striped blocks support min blocks when doing in maintenance
[ https://issues.apache.org/jira/browse/HDFS-16862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dingshun resolved HDFS-16862.
-----------------------------
    Resolution: Won't Do

> EC striped blocks support min blocks when doing in maintenance
> --------------------------------------------------------------
>
>         Key: HDFS-16862
>         URL: https://issues.apache.org/jira/browse/HDFS-16862
>     Project: Hadoop HDFS
>  Issue Type: New Feature
>    Reporter: dingshun
>    Priority: Major

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16862) EC striped blocks support min blocks when doing in maintenance
dingshun created HDFS-16862:
-------------------------------

             Summary: EC striped blocks support min blocks when doing in maintenance
                 Key: HDFS-16862
                 URL: https://issues.apache.org/jira/browse/HDFS-16862
             Project: Hadoop HDFS
          Issue Type: New Feature
            Reporter: dingshun
[jira] [Created] (HDFS-16858) Dynamically adjust max slow disks to exclude
dingshun created HDFS-16858:
-------------------------------

             Summary: Dynamically adjust max slow disks to exclude
                 Key: HDFS-16858
                 URL: https://issues.apache.org/jira/browse/HDFS-16858
             Project: Hadoop HDFS
          Issue Type: New Feature
          Components: datanode
            Reporter: dingshun
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641299#comment-17641299 ]

dingshun commented on HDFS-16855:
---------------------------------

[~hexiaoqiao] I've looked at the logic of the latest trunk branch, but can't seem to find anything. If you later find the PR that fixes it, please post it. Thanks.

In addition, I would like to ask: if I remove the BLOCK_POOl-level write lock in the org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool(String bpid, Configuration conf) method, what would the impact be?

> Remove the redundant write lock in addBlockPool
> -----------------------------------------------
>
>         Key: HDFS-16855
>         URL: https://issues.apache.org/jira/browse/HDFS-16855
>     Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>    Reporter: dingshun
>    Priority: Major
>      Labels: pull-request-available
>
> When patching the datanode's fine-grained lock, we found that the datanode
> couldn't start; a deadlock may occur in addBlockPool, so we can remove the lock.
>
> {code:xml}
> <!-- getspaceused classname -->
> <property>
>   <name>fs.getspaceused.classname</name>
>   <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed</value>
> </property>
> {code}
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool
> // takes the write lock
> @Override
> public void addBlockPool(String bpid, Configuration conf)
>     throws IOException {
>   LOG.info("Adding block pool " + bpid);
>   AddBlockPoolException volumeExceptions = new AddBlockPoolException();
>   try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, bpid)) {
>     try {
>       volumes.addBlockPool(bpid, conf);
>     } catch (AddBlockPoolException e) {
>       volumeExceptions.mergeException(e);
>     }
>     volumeMap.initBlockPool(bpid);
>     Set<String> vols = storageMap.keySet();
>     for (String v : vols) {
>       lockManager.addLock(LockLevel.VOLUME, bpid, v);
>     }
>   }
> }
> {code}
> {code:java}
> // called from org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica
> // needs the read lock
> void replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer) {
>   LightWeightResizableGSet<String, ReplicaInfo> m = null;
>   try (AutoCloseDataSetLock l = lockManager.readLock(LockLevel.BLOCK_POOl, bpid)) {
>     m = map.get(bpid);
>     if (m != null) {
>       m.getIterator(consumer);
>     }
>   }
> }
> {code}
>
> Because it is not the same thread, the write lock cannot be downgraded to a read lock:
>
> {code:java}
> void addBlockPool(final String bpid, final Configuration conf) throws IOException {
>   long totalStartTime = Time.monotonicNow();
>   final Map<FsVolumeImpl, IOException> unhealthyDataDirs =
>       new ConcurrentHashMap<FsVolumeImpl, IOException>();
>   List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>   for (final FsVolumeImpl v : volumes) {
>     Thread t = new Thread() {
>       public void run() {
>         try (FsVolumeReference ref = v.obtainReference()) {
>           FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
>               " on volume " + v + "...");
>           long startTime = Time.monotonicNow();
>           v.addBlockPool(bpid, conf);
>           long timeTaken = Time.monotonicNow() - startTime;
>           FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
>               " on " + v + ": " + timeTaken + "ms");
>         } catch (IOException ioe) {
>           FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
>               ". Will throw later.", ioe);
>           unhealthyDataDirs.put(v, ioe);
>         }
>       }
>     };
>     blockPoolAddingThreads.add(t);
>     t.start();
>   }
>   for (Thread t : blockPoolAddingThreads) {
>     try {
>       t.join();
>     } catch (InterruptedException ie) {
>       throw new IOException(ie);
>     }
>   }
> }
> {code}
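The lock behavior the issue describes can be reproduced outside HDFS with a minimal sketch using only java.util.concurrent (the class and method names below are illustrative, not part of the Hadoop code base): a read lock requested from a second thread is not granted while another thread holds the write lock, whereas the thread that owns the write lock can still acquire the read lock, which is why downgrading works only within a single thread.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CrossThreadDowngrade {

    // Returns {workerGotReadLock, ownerGotReadLock}: the worker plays the
    // role of deepCopyReplica (reader) on a different thread, while the
    // calling thread plays addBlockPool (writer).
    static boolean[] demo() throws InterruptedException {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.writeLock().lock();  // parent thread: write lock held

        final boolean[] workerGot = {false};
        Thread worker = new Thread(() -> {
            try {
                // a different thread cannot "downgrade": the read lock is
                // refused while another thread holds the write lock
                workerGot[0] = rw.readLock().tryLock(200, TimeUnit.MILLISECONDS);
            } catch (InterruptedException ignored) {
            }
        });
        worker.start();
        worker.join();

        // the write-lock owner itself may take the read lock (downgrade)
        boolean ownerGot = rw.readLock().tryLock();
        if (ownerGot) {
            rw.readLock().unlock();
        }
        rw.writeLock().unlock();
        return new boolean[] {workerGot[0], ownerGot};
    }

    public static void main(String[] args) throws InterruptedException {
        boolean[] r = demo();
        System.out.println("worker thread acquired read lock: " + r[0]);  // false
        System.out.println("owner thread acquired read lock:  " + r[1]);  // true
    }
}
```
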
[jira] [Comment Edited] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641116#comment-17641116 ]

dingshun edited comment on HDFS-16855 at 11/30/22 7:39 AM:
----------------------------------------------------------

[~hexiaoqiao] Thanks for your reply.

We found that when the datanode starts, the org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool(String bpid, Configuration conf) method is called and takes a BLOCK_POOl-level write lock. In org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList#addBlockPool(final String bpid, final Configuration conf), multiple threads are started to initialize the BlockPoolSlices, and each BlockPoolSlice needs the value of dfsUsage during initialization.

Because our fs.getspaceused.classname is configured with ReplicaCachingGetSpaceUsed, this calls org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica(String bpid), which in turn calls org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap#replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer), and #replicas takes a read lock at the BLOCK_POOl level.

Since these are not the same thread but they use the same ReentrantReadWriteLock instance, the write lock cannot be downgraded to a read lock.
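The call chain in this comment can be condensed into a runnable sketch (hypothetical class and method names, plain java.util.concurrent, no Hadoop dependencies): a parent thread that joins its workers while still holding the write lock never sees them finish, because every worker needs the read lock. Timeouts stand in for the real code's unbounded waits so the sketch terminates instead of hanging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AddBlockPoolDeadlockSketch {

    // Models FsVolumeList#addBlockPool: spawn workers, join them.
    // Each worker models the BlockPoolSlice init -> ReplicaCachingGetSpaceUsed
    // -> deepCopyReplica path, which needs the read lock.
    // Returns true only if every worker got the read lock in time.
    static boolean workersFinished(ReentrantReadWriteLock lock) throws InterruptedException {
        final boolean[] done = new boolean[2];
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < done.length; i++) {
            final int id = i;
            Thread t = new Thread(() -> {
                try {
                    if (lock.readLock().tryLock(300, TimeUnit.MILLISECONDS)) {
                        try {
                            done[id] = true;  // "scanned the block pool"
                        } finally {
                            lock.readLock().unlock();
                        }
                    }
                } catch (InterruptedException ignored) {
                }
            });
            workers.add(t);
            t.start();
        }
        for (Thread t : workers) {
            t.join();  // the real code joins here while the caller holds the write lock
        }
        return done[0] && done[1];
    }

    public static void main(String[] args) throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

        lock.writeLock().lock();  // FsDatasetImpl#addBlockPool holds the write lock
        boolean withWriteLock = workersFinished(lock);
        lock.writeLock().unlock();

        boolean withoutWriteLock = workersFinished(lock);

        System.out.println("workers done while write lock held: " + withWriteLock);   // false
        System.out.println("workers done after lock released:   " + withoutWriteLock); // true
    }
}
```

With the write lock held the workers time out, which is the stuck state seen at datanode startup; once the write lock is dropped, the same workers complete.
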
[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641116#comment-17641116 ]

dingshun commented on HDFS-16855:
---------------------------------

[~hexiaoqiao] Thanks for your reply.

We found that when the datanode starts, the #addBlockPool(String bpid, Configuration conf) method of FsDatasetImpl is called and takes a BLOCK_POOl-level write lock. In the #addBlockPool(final String bpid, final Configuration conf) method of FsVolumeList, multiple threads are started to initialize the BlockPoolSlices, and each BlockPoolSlice needs the value of dfsUsage during initialization.

Because our fs.getspaceused.classname is configured with ReplicaCachingGetSpaceUsed, this calls #deepCopyReplica(String bpid) of FsDatasetImpl, which calls #replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer) of ReplicaMap, and #replicas takes a read lock at the BLOCK_POOl level.

Since these are not the same thread but they use the same ReentrantReadWriteLock instance, the write lock cannot be downgraded to a read lock.
[jira] [Updated] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dingshun updated HDFS-16855:
----------------------------
    Component/s: datanode
[jira] [Updated] (HDFS-16855) Remove the redundant lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dingshun updated HDFS-16855:
----------------------------
    Summary: Remove the redundant lock in addBlockPool  (was: addBlockPool will cause deadlock when datanode starts)
[jira] [Updated] (HDFS-16855) Remove the redundant write lock in addBlockPool
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dingshun updated HDFS-16855:
----------------------------
    Summary: Remove the redundant write lock in addBlockPool  (was: Remove the redundant lock in addBlockPool)
[jira] [Updated] (HDFS-16855) addBlockPool will cause deadlock when datanode starts
[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dingshun updated HDFS-16855:
----------------------------
    Description:
When patching the datanode's fine-grained lock, we found that the datanode couldn't start; a deadlock may occur in addBlockPool, so we can remove the lock.
[jira] [Created] (HDFS-16855) addBlockPool will cause deadlock when datanode starts
dingshun created HDFS-16855:
-------------------------------

             Summary: addBlockPool will cause deadlock when datanode starts
                 Key: HDFS-16855
                 URL: https://issues.apache.org/jira/browse/HDFS-16855
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: dingshun
[jira] [Created] (HDFS-16809) EC striped block is not sufficient when doing in maintenance
dingshun created HDFS-16809:
-------------------------------

             Summary: EC striped block is not sufficient when doing in maintenance
                 Key: HDFS-16809
                 URL: https://issues.apache.org/jira/browse/HDFS-16809
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: ec, hdfs
            Reporter: dingshun

When a datanode enters maintenance, the EC striped block group is not sufficiently replicated, which can lead to missing blocks.