[jira] [Resolved] (HDFS-16862) EC striped blocks support min blocks when doing in maintenance

2022-12-06 Thread dingshun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dingshun resolved HDFS-16862.
-
Resolution: Won't Do

> EC striped blocks support min blocks when doing in maintenance
> ---
>
> Key: HDFS-16862
> URL: https://issues.apache.org/jira/browse/HDFS-16862
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: dingshun
>Priority: Major
>







[jira] [Created] (HDFS-16862) EC striped blocks support min blocks when doing in maintenance

2022-12-06 Thread dingshun (Jira)
dingshun created HDFS-16862:
---

 Summary: EC striped blocks support min blocks when doing in maintenance
 Key: HDFS-16862
 URL: https://issues.apache.org/jira/browse/HDFS-16862
 Project: Hadoop HDFS
  Issue Type: New Feature
Reporter: dingshun









[jira] [Created] (HDFS-16858) Dynamically adjust max slow disks to exclude

2022-12-01 Thread dingshun (Jira)
dingshun created HDFS-16858:
---

 Summary: Dynamically adjust max slow disks to exclude
 Key: HDFS-16858
 URL: https://issues.apache.org/jira/browse/HDFS-16858
 Project: Hadoop HDFS
  Issue Type: New Feature
  Components: datanode
Reporter: dingshun









[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-30 Thread dingshun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641299#comment-17641299
 ] 

dingshun commented on HDFS-16855:
-

[~hexiaoqiao] 

I've looked through the logic on the latest trunk branch but can't find 
anything relevant. If you later find a PR that fixes it, please post it here. 
Thanks.

In addition, I would like to ask: if I remove the BLOCK_POOl-level write lock 
in the 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool(String 
bpid, Configuration conf) method, what would the impact be?
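
For reference, here is a self-contained model of the hang under discussion (a 
hypothetical demo class using only the JDK, not Hadoop code): the startup 
thread holds the write lock while it forks and joins per-volume workers, and 
each worker needs the read lock on the same ReentrantReadWriteLock.
{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-alone model of the reported startup hang (not Hadoop code).
public class AddBlockPoolHangModel {
  static final ReentrantReadWriteLock LOCK = new ReentrantReadWriteLock();

  public static void main(String[] args) throws InterruptedException {
    LOCK.writeLock().lock(); // models the BLOCK_POOl-level write lock
    List<Thread> workers = new ArrayList<>();
    for (int v = 0; v < 3; v++) {
      Thread t = new Thread(() -> {
        LOCK.readLock().lock(); // models the read lock taken in ReplicaMap#replicas
        try {
          // models deepCopyReplica(...) scanning the replica map
        } finally {
          LOCK.readLock().unlock();
        }
      });
      workers.add(t);
      t.start();
    }
    for (Thread t : workers) {
      t.join(); // never returns while the write lock is held: each worker
    }           // blocks on the read lock, and this thread blocks on join()
    LOCK.writeLock().unlock(); // unreachable
  }
}
{code}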

> Remove the redundant write lock in addBlockPool
> ---
>
> Key: HDFS-16855
> URL: https://issues.apache.org/jira/browse/HDFS-16855
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: dingshun
>Priority: Major
>  Labels: pull-request-available
>
> When applying the datanode's fine-grained lock patch, we found that the 
> datanode couldn't start; a deadlock appears to occur in addBlockPool, so we 
> can remove the redundant write lock.
> {code:xml}
> <!-- getspaceused classname -->
> <property>
>   <name>fs.getspaceused.classname</name>
>   <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed</value>
> </property>
> {code}
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool
> // takes the BLOCK_POOl-level write lock
> @Override
> public void addBlockPool(String bpid, Configuration conf)
>     throws IOException {
>   LOG.info("Adding block pool " + bpid);
>   AddBlockPoolException volumeExceptions = new AddBlockPoolException();
>   try (AutoCloseableLock lock =
>       lockManager.writeLock(LockLevel.BLOCK_POOl, bpid)) {
>     try {
>       volumes.addBlockPool(bpid, conf);
>     } catch (AddBlockPoolException e) {
>       volumeExceptions.mergeException(e);
>     }
>     volumeMap.initBlockPool(bpid);
>     Set<String> vols = storageMap.keySet();
>     for (String v : vols) {
>       lockManager.addLock(LockLevel.VOLUME, bpid, v);
>     }
>   }
>
> }
> {code}
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap#replicas,
> // called from FsDatasetImpl#deepCopyReplica; needs the BLOCK_POOl read lock
> void replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer) {
>   LightWeightResizableGSet<Block, ReplicaInfo> m = null;
>   try (AutoCloseDataSetLock l =
>       lockManager.readLock(LockLevel.BLOCK_POOl, bpid)) {
>     m = map.get(bpid);
>     if (m != null) {
>       m.getIterator(consumer);
>     }
>   }
> }
> {code}
> Because the read lock is requested from a different thread, the write lock 
> cannot be downgraded to a read lock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList#addBlockPool
> void addBlockPool(final String bpid, final Configuration conf)
>     throws IOException {
>   long totalStartTime = Time.monotonicNow();
>   final Map<FsVolumeSpi, IOException> unhealthyDataDirs =
>       new ConcurrentHashMap<FsVolumeSpi, IOException>();
>   List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>   for (final FsVolumeImpl v : volumes) {
>     Thread t = new Thread() {
>       public void run() {
>         try (FsVolumeReference ref = v.obtainReference()) {
>           FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
>               " on volume " + v + "...");
>           long startTime = Time.monotonicNow();
>           v.addBlockPool(bpid, conf);
>           long timeTaken = Time.monotonicNow() - startTime;
>           FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
>               " on " + v + ": " + timeTaken + "ms");
>         } catch (IOException ioe) {
>           FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
>               ". Will throw later.", ioe);
>           unhealthyDataDirs.put(v, ioe);
>         }
>       }
>     };
>     blockPoolAddingThreads.add(t);
>     t.start();
>   }
>   for (Thread t : blockPoolAddingThreads) {
>     try {
>       t.join();
>     } catch (InterruptedException ie) {
>       throw new IOException(ie);
>     }
>   }
> }
> {code}






[jira] [Comment Edited] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-29 Thread dingshun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641116#comment-17641116
 ] 

dingshun edited comment on HDFS-16855 at 11/30/22 7:39 AM:
---

[~hexiaoqiao] Thanks for your reply.

We found that when the datanode starts, the 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool(String 
bpid, Configuration conf) method is called and a BLOCK_POOl-level write lock 
is taken.

In 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList#addBlockPool(final 
String bpid, final Configuration conf), multiple threads are started to 
initialize the BlockPoolSlice instances, and the value of dfsUsage needs to be 
obtained while each BlockPoolSlice is initialized.

Because our fs.getspaceused.classname is configured with 
ReplicaCachingGetSpaceUsed, this calls 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica(String 
bpid), which in turn calls 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap#replicas(String 
bpid, Consumer<Iterator<ReplicaInfo>> consumer), and #replicas takes a read 
lock at the BLOCK_POOl level.

Since these are not the same thread yet they use the read-write lock of the 
same ReentrantReadWriteLock instance, the write lock cannot be downgraded to a 
read lock.
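
For contrast, the downgrade that ReentrantReadWriteLock does support happens 
entirely within a single thread: acquire the read lock before releasing the 
write lock. A minimal JDK-only sketch (hypothetical demo class, not Hadoop 
code):
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo of the single-thread downgrade pattern from the JDK docs.
public class DowngradeDemo {
  public static void main(String[] args) {
    ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    rwLock.writeLock().lock();
    try {
      // ... mutate shared state under the write lock ...
      rwLock.readLock().lock(); // downgrade: allowed within the same thread
    } finally {
      rwLock.writeLock().unlock(); // now holding only the read lock
    }
    try {
      // ... read the state just written, with no writer able to sneak in ...
    } finally {
      rwLock.readLock().unlock();
    }
  }
}
{code}
Because the per-volume scan here runs in freshly started threads rather than 
in the thread holding the write lock, this pattern is not available.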


was (Author: dingshun):
[~hexiaoqiao] Thanks for your reply.

We found that when the datanode starts, the #addBlockPool(String bpid, 
Configuration conf) method of FsDatasetImpl is called and a BLOCK_POOl-level 
write lock is taken.

In the #addBlockPool(final String bpid, final Configuration conf) method of 
FsVolumeList, multiple threads are started to initialize the BlockPoolSlice 
instances, and the value of dfsUsage needs to be obtained while each 
BlockPoolSlice is initialized.

Because our fs.getspaceused.classname is configured with 
ReplicaCachingGetSpaceUsed, this calls #deepCopyReplica(String bpid) of 
FsDatasetImpl, which in turn calls #replicas(String bpid, 
Consumer<Iterator<ReplicaInfo>> consumer) of ReplicaMap, and #replicas takes a 
read lock at the BLOCK_POOl level.

Since these are not the same thread yet they use the read-write lock of the 
same ReentrantReadWriteLock instance, the write lock cannot be downgraded to a 
read lock.


[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-29 Thread dingshun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641116#comment-17641116
 ] 

dingshun commented on HDFS-16855:
-

[~hexiaoqiao] Thanks for your reply.

We found that when the datanode starts, the #addBlockPool(String bpid, 
Configuration conf) method of FsDatasetImpl is called and a BLOCK_POOl-level 
write lock is taken.

In the #addBlockPool(final String bpid, final Configuration conf) method of 
FsVolumeList, multiple threads are started to initialize the BlockPoolSlice 
instances, and the value of dfsUsage needs to be obtained while each 
BlockPoolSlice is initialized.

Because our fs.getspaceused.classname is configured with 
ReplicaCachingGetSpaceUsed, this calls #deepCopyReplica(String bpid) of 
FsDatasetImpl, which in turn calls #replicas(String bpid, 
Consumer<Iterator<ReplicaInfo>> consumer) of ReplicaMap, and #replicas takes a 
read lock at the BLOCK_POOl level.

Since these are not the same thread yet they use the read-write lock of the 
same ReentrantReadWriteLock instance, the write lock cannot be downgraded to a 
read lock.
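
A minimal JDK-only sketch (hypothetical demo class, not Hadoop code) of the 
cross-thread behavior: a reader thread simply parks while another thread holds 
the write lock, and no downgrade is possible across threads:
{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical demo: a reader blocks while another thread holds the write lock.
public class CrossThreadReadBlockDemo {
  public static void main(String[] args) throws InterruptedException {
    ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();

    rwLock.writeLock().lock(); // main thread holds the write lock
    Thread reader = new Thread(() -> {
      rwLock.readLock().lock(); // blocks until the writer releases
      try {
        System.out.println("reader acquired the read lock");
      } finally {
        rwLock.readLock().unlock();
      }
    });
    reader.start();

    reader.join(1000); // give the reader a second; it makes no progress
    System.out.println("reader state: " + reader.getState()); // typically WAITING

    rwLock.writeLock().unlock(); // release; the reader can now proceed
    reader.join();
  }
}
{code}
This WAITING state is what one would expect the block-pool scanning threads to 
show in a thread dump during the hang.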




[jira] [Updated] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-29 Thread dingshun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dingshun updated HDFS-16855:

Component/s: datanode




[jira] [Updated] (HDFS-16855) Remove the redundant lock in addBlockPool

2022-11-28 Thread dingshun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dingshun updated HDFS-16855:

Summary: Remove the redundant lock in addBlockPool  (was: addBlockPool will 
cause deadlock when datanode starts)

For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-28 Thread dingshun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dingshun updated HDFS-16855:

Summary: Remove the redundant write lock in addBlockPool  (was: Remove the 
redundant lock in addBlockPool)

For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16855) addBlockPool will cause deadlock when datanode starts

2022-11-28 Thread dingshun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dingshun updated HDFS-16855:

Description: 
When applying the datanode's fine-grained lock patch, we found that the 
datanode couldn't start; a deadlock appears to occur in addBlockPool, so we 
can remove the redundant write lock.
{code:xml}
<!-- getspaceused classname -->
<property>
  <name>fs.getspaceused.classname</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed</value>
</property>
{code}
{code:java}
// org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool
// takes the BLOCK_POOl-level write lock
@Override
public void addBlockPool(String bpid, Configuration conf)
    throws IOException {
  LOG.info("Adding block pool " + bpid);
  AddBlockPoolException volumeExceptions = new AddBlockPoolException();
  try (AutoCloseableLock lock =
      lockManager.writeLock(LockLevel.BLOCK_POOl, bpid)) {
    try {
      volumes.addBlockPool(bpid, conf);
    } catch (AddBlockPoolException e) {
      volumeExceptions.mergeException(e);
    }
    volumeMap.initBlockPool(bpid);
    Set<String> vols = storageMap.keySet();
    for (String v : vols) {
      lockManager.addLock(LockLevel.VOLUME, bpid, v);
    }
  }

}
{code}
{code:java}
// org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaMap#replicas,
// called from FsDatasetImpl#deepCopyReplica; needs the BLOCK_POOl read lock
void replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer) {
  LightWeightResizableGSet<Block, ReplicaInfo> m = null;
  try (AutoCloseDataSetLock l =
      lockManager.readLock(LockLevel.BLOCK_POOl, bpid)) {
    m = map.get(bpid);
    if (m != null) {
      m.getIterator(consumer);
    }
  }
}
{code}
Because the read lock is requested from a different thread, the write lock 
cannot be downgraded to a read lock:
{code:java}
// org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsVolumeList#addBlockPool
void addBlockPool(final String bpid, final Configuration conf)
    throws IOException {
  long totalStartTime = Time.monotonicNow();
  final Map<FsVolumeSpi, IOException> unhealthyDataDirs =
      new ConcurrentHashMap<FsVolumeSpi, IOException>();
  List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
  for (final FsVolumeImpl v : volumes) {
    Thread t = new Thread() {
      public void run() {
        try (FsVolumeReference ref = v.obtainReference()) {
          FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
              " on volume " + v + "...");
          long startTime = Time.monotonicNow();
          v.addBlockPool(bpid, conf);
          long timeTaken = Time.monotonicNow() - startTime;
          FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
              " on " + v + ": " + timeTaken + "ms");
        } catch (IOException ioe) {
          FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
              ". Will throw later.", ioe);
          unhealthyDataDirs.put(v, ioe);
        }
      }
    };
    blockPoolAddingThreads.add(t);
    t.start();
  }
  for (Thread t : blockPoolAddingThreads) {
    try {
      t.join();
    } catch (InterruptedException ie) {
      throw new IOException(ie);
    }
  }
}
{code}


[jira] [Created] (HDFS-16855) addBlockPool will cause deadlock when datanode starts

2022-11-28 Thread dingshun (Jira)
dingshun created HDFS-16855:
---

 Summary: addBlockPool will cause deadlock when datanode starts
 Key: HDFS-16855
 URL: https://issues.apache.org/jira/browse/HDFS-16855
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: dingshun









[jira] [Created] (HDFS-16809) EC striped block is not sufficient when doing in maintenance

2022-10-20 Thread dingshun (Jira)
dingshun created HDFS-16809:
---

 Summary: EC striped block is not sufficient when doing in 
maintenance
 Key: HDFS-16809
 URL: https://issues.apache.org/jira/browse/HDFS-16809
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ec, hdfs
Reporter: dingshun


When datanodes are put into maintenance, EC striped blocks are not kept 
sufficiently available, which will lead to missing blocks.


