[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-12-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643142#comment-17643142
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

dingshun3016 commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1336857490

   > > > Since this case only happens when addBlockPool() is invoked while 
CachingGetSpaceUsed#used < 0, I have an idea: is it possible to add a switch 
that skips the lock the first time ReplicaCachingGetSpaceUsed#init() runs, 
and takes it at all other times?
   > > 
   > > 
   > > This makes sense to me; fetching replica usage information does not 
need strong consistency. @Hexiaoqiao any suggestions?
   > 
   > Thanks for the detailed discussion. +1, it seems good to me. BTW, I 
tried to dig up the PR that fixes this bug but could not find it; the fix 
exists only on our internal branch, which does not refresh space used at the 
init stage. The used-space refresh runs on a fully asynchronous thread (in 
CachingGetSpaceUsed), so it cannot deadlock when the DataNode instance 
restarts. Thanks.
   
   Thanks for the reply. It looks like 
[HDFS-14986](https://issues.apache.org/jira/browse/HDFS-14986) forbids 
refresh() the first time ReplicaCachingGetSpaceUsed#init() runs. 
   I would prefer not to take the lock the first time 
ReplicaCachingGetSpaceUsed#init() runs.




> Remove the redundant write lock in addBlockPool
> ---
>
> Key: HDFS-16855
> URL: https://issues.apache.org/jira/browse/HDFS-16855
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
> Reporter: dingshun
> Priority: Major
>  Labels: pull-request-available
>
> While applying the datanode's fine-grained lock patch, we found that the 
> datanode could not start; a deadlock appears to occur during addBlockPool, 
> so we can remove the redundant write lock.
> {code:xml}
> <!-- fs.getspaceused.classname in the DataNode configuration -->
> <property>
>   <name>fs.getspaceused.classname</name>
>   <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed</value>
> </property>
> {code}
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool
> // takes the BLOCK_POOl write lock
> @Override
> public void addBlockPool(String bpid, Configuration conf)
>     throws IOException {
>   LOG.info("Adding block pool " + bpid);
>   AddBlockPoolException volumeExceptions = new AddBlockPoolException();
>   try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, bpid)) {
>     try {
>       volumes.addBlockPool(bpid, conf);
>     } catch (AddBlockPoolException e) {
>       volumeExceptions.mergeException(e);
>     }
>     volumeMap.initBlockPool(bpid);
>     Set<String> vols = storageMap.keySet();
>     for (String v : vols) {
>       lockManager.addLock(LockLevel.VOLUME, bpid, v);
>     }
>   }
> } {code}
> {code:java}
> // called from org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica
> // needs the BLOCK_POOl read lock
> void replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer) {
>   LightWeightResizableGSet<Block, ReplicaInfo> m = null;
>   try (AutoCloseDataSetLock l = lockManager.readLock(LockLevel.BLOCK_POOl, bpid)) {
>     m = map.get(bpid);
>     if (m != null) {
>       m.getIterator(consumer);
>     }
>   }
> } {code}
>  
> Because the read lock is requested from a different thread, the write lock 
> cannot be downgraded to a read lock, and the read-lock attempt blocks forever:
> {code:java}
> void addBlockPool(final String bpid, final Configuration conf) throws IOException {
>   long totalStartTime = Time.monotonicNow();
>   final Map<FsVolumeImpl, IOException> unhealthyDataDirs =
>       new ConcurrentHashMap<FsVolumeImpl, IOException>();
>   List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>   for (final FsVolumeImpl v : volumes) {
>     Thread t = new Thread() {
>       public void run() {
>         try (FsVolumeReference ref = v.obtainReference()) {
>           FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
>               " on volume " + v + "...");
>           long startTime = Time.monotonicNow();
>           v.addBlockPool(bpid, conf);
>           long timeTaken = Time.monotonicNow() - startTime;
>           FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
>               " on " + v + ": " + timeTaken + "ms");
>         } catch (IOException ioe) {
>           FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
>               ". Will throw later.", ioe);
>           unhealthyDataDirs.put(v, ioe);
>         }
>       }
>     };
>     blockPoolAddingThreads.add(t);
>     t.start();
>   }
>   for (Thread t : blockPoolAddingThreads) {
>     try {
>       t.join();
>     } catch (InterruptedException ie) {
>       throw new IOException(ie);
>     }
>   }
> } {code}
>  
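The cross-thread hang described above can be illustrated with a minimal sketch (plain Java, not Hadoop code; all names below are invented for the demo). One thread holds a ReentrantReadWriteLock write lock, standing in for addBlockPool(); a second thread, standing in for the deepCopyReplica() path, tries to take the read lock. Because the threads differ, no lock downgrade is possible and the reader stays blocked; the sketch uses a timed tryLock so it can observe the blockage instead of hanging:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class CrossThreadLockDemo {

    // Returns whether a second thread managed to take the read lock while
    // this thread was holding the write lock. A real caller would block
    // forever; here the reader times out so the demo terminates.
    static boolean readLockAcquiredWhileWriterHolds() {
        ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
        rw.writeLock().lock();                       // plays addBlockPool()
        final boolean[] acquired = { false };
        CountDownLatch done = new CountDownLatch(1);
        Thread reader = new Thread(() -> {           // plays deepCopyReplica()
            try {
                acquired[0] = rw.readLock().tryLock(200, TimeUnit.MILLISECONDS);
                if (acquired[0]) {
                    rw.readLock().unlock();
                }
            } catch (InterruptedException ignored) {
                // treated as "not acquired"
            } finally {
                done.countDown();
            }
        });
        reader.start();
        try {
            done.await();
        } catch (InterruptedException ignored) {
        }
        rw.writeLock().unlock();
        return acquired[0];                          // false: reader was blocked
    }
}
```

Had the read lock been requested on the same thread that holds the write lock, ReentrantReadWriteLock would at least permit downgrading (acquire read before releasing write); across threads there is no such escape, which matches the hang seen at DataNode startup.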



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-12-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17643012#comment-17643012
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

Hexiaoqiao commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1336418882

   > > Since this case only happens when addBlockPool() is invoked while 
CachingGetSpaceUsed#used < 0, I have an idea: is it possible to add a switch 
that skips the lock the first time ReplicaCachingGetSpaceUsed#init() runs, 
and takes it at all other times?
   > 
   > This makes sense to me; fetching replica usage information does not 
need strong consistency. @Hexiaoqiao any suggestions?
   
   Thanks for the detailed discussion. +1, it seems good to me. 
   BTW, I tried to dig up the PR that fixes this bug but could not find it; 
the fix exists only on our internal branch, which does not refresh space 
used at the init stage. The used-space refresh runs on a fully asynchronous 
thread (in CachingGetSpaceUsed), so it cannot deadlock when the DataNode 
instance restarts. Thanks.
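The asynchronous refresh behaviour mentioned in this comment can be sketched as follows (plain Java, not the actual CachingGetSpaceUsed implementation; the class and method names are invented). A single daemon thread is the only writer of a cached atomic value, so readers obtain the used-space figure without taking any lock and never contend with the scan:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

class CachingUsedSketch implements AutoCloseable {
    private final AtomicLong used = new AtomicLong(0);
    private final Thread refreshThread;
    private volatile boolean running = true;

    CachingUsedSketch(LongSupplier scan, long intervalMs) {
        refreshThread = new Thread(() -> {
            while (running) {
                used.set(scan.getAsLong());   // sole writer of the cache
                try {
                    Thread.sleep(intervalMs);
                } catch (InterruptedException e) {
                    return;                   // close() interrupts us
                }
            }
        });
        refreshThread.setDaemon(true);        // never blocks JVM shutdown
        refreshThread.start();
    }

    long getUsed() {                          // readers never take a lock
        return used.get();
    }

    @Override
    public void close() {
        running = false;
        refreshThread.interrupt();
    }
}
```

Because the refresh loop only ever updates an AtomicLong, a restart of the consumer cannot deadlock against it; the danger in this issue arises only when the refresh path additionally reaches into the dataset lock, as ReplicaCachingGetSpaceUsed does via deepCopyReplica().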







[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-12-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642266#comment-17642266
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

MingXiangLi commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1334745101

   `Since this case only happens when addBlockPool() is invoked while 
CachingGetSpaceUsed#used < 0, I have an idea: is it possible to add a switch 
that skips the lock the first time ReplicaCachingGetSpaceUsed#init() runs, 
and takes it at all other times?`
   This makes sense to me; fetching replica usage information does not need 
strong consistency. @Hexiaoqiao any suggestions?
   







[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-12-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17642005#comment-17642005
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

hadoop-yetus commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1333921434

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   1m 11s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m 16s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 36s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 30s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 18s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 51s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 27s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 59s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 30s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m  7s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 415m 49s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  5s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 535m 40s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.qjournal.server.TestJournalNode |
   |   | hadoop.hdfs.TestLeaseRecovery2 |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5170 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c8aeace311b7 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 54a0786d1072f1440a297bb197887a83b941ba65 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/2/testReport/ |
   |

[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-12-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641893#comment-17641893
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

dingshun3016 commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1333710803

   According to the discussion so far, there seem to be several ways to 
solve this problem:
   - remove the BLOCK_POOl-level write lock in #addBlockPool
> but we worry this could cause replica consistency problems
   - forbid refresh() the first time ReplicaCachingGetSpaceUsed#init() runs
> this leaves the value of dfsUsage at 0 until the next refresh()
   - use the du or df command instead for the first estimate
> du is very expensive and slow
> df is inaccurate when the disk is shared with other servers; see 
[HDFS-14313](https://issues.apache.org/jira/browse/HDFS-14313)
   
   Since this case only happens when addBlockPool() is invoked while 
CachingGetSpaceUsed#used < 0, I have an idea: is it possible to add a switch 
that skips the lock the first time ReplicaCachingGetSpaceUsed#init() runs, 
and takes it at all other times?
   
   Do you think that is possible? @MingXiangLi 
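The proposed switch can be sketched roughly as follows (hypothetical names, not a patch against Hadoop): an atomic flag makes the very first refresh after init() skip the dataset read lock, accepting a possibly stale view, while every later refresh takes the lock as usual. The sketch returns whether the lock was taken so the two paths are observable:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RefreshSwitchSketch {
    private final AtomicBoolean firstRefresh = new AtomicBoolean(true);
    private final ReentrantReadWriteLock datasetLock = new ReentrantReadWriteLock();

    // Returns true when this refresh took the dataset read lock.
    boolean refreshUsage() {
        if (firstRefresh.compareAndSet(true, false)) {
            // init-time path: no lock, so startup cannot deadlock against a
            // thread already holding the write lock; a stale value is tolerable
            scanReplicas();
            return false;
        }
        // steady-state path: guarded scan, consistent with concurrent writers
        datasetLock.readLock().lock();
        try {
            scanReplicas();
            return true;
        } finally {
            datasetLock.readLock().unlock();
        }
    }

    private void scanReplicas() {
        // placeholder for the deepCopyReplica()-style walk over the replica map
    }
}
```

compareAndSet guarantees exactly one caller takes the unlocked path even if several refreshes race at startup, which is the property the "switch" idea relies on.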







[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-12-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641740#comment-17641740
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

MingXiangLi commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-172902

   The BLOCK_POOl-level lock protects replica consistency in FsDatasetImpl 
when read and write operations happen at the same time.
   
   > forbid refresh() the first time ReplicaCachingGetSpaceUsed#init() runs; 
it leaves the value of dfsUsage at 0 until the next refresh().
   
   For example, we could use the df command for the first estimate instead, 
or some other approach.
   
   From my side, it is less risky to change the ReplicaCachingGetSpaceUsed 
logic than to remove the write lock.
   Alternatively, we can discuss further to make sure no case leads to a 
consistency problem if we remove the write lock.







[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641693#comment-17641693
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

dingshun3016 commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1333269882

   > @dingshun3016 This seems to only happen when addBlockPool() is invoked 
and CachingGetSpaceUsed#used < 0, so why not handle it by, for example, 
forbidding refresh() the first time ReplicaCachingGetSpaceUsed#init() runs?
   
   @MingXiangLi thanks for the reply.
   
   Forbidding refresh() the first time ReplicaCachingGetSpaceUsed#init() 
runs will leave the value of dfsUsage at 0 until the next refresh(). 
   
   If we remove the BLOCK_POOl-level write lock in the 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool(String
 bpid, Configuration conf) method, what would the impact be?
   
   Do you have any other suggestions?







[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641690#comment-17641690
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

dingshun3016 opened a new pull request, #5170:
URL: https://github.com/apache/hadoop/pull/5170

   While applying the datanode's fine-grained lock patch, we found that the 
datanode could not start; a deadlock appears to occur during addBlockPool, 
so we can remove the redundant write lock.
   
   
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool
 takes the write lock.
   
   
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica
 needs the read lock.
   
   Because these run on different threads, the write lock cannot be 
downgraded to a read lock, and the DataNode deadlocks.







[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641688#comment-17641688
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

dingshun3016 commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1333268646

   > @dingshun3016 This seems to happen only when addBlockPool() is invoked and 
CachingGetSpaceUsed#used < 0, so why not handle it by, for example, forbidding 
refresh() during the first ReplicaCachingGetSpaceUsed#init()?
   
   







[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641689#comment-17641689
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

dingshun3016 closed pull request #5170: HDFS-16855. Remove the redundant write 
lock in addBlockPool. 
URL: https://github.com/apache/hadoop/pull/5170







[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-30 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641675#comment-17641675
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

MingXiangLi commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1333200919

   @dingshun3016 This seems to happen only when addBlockPool() is invoked and 
CachingGetSpaceUsed#used < 0, so why not handle it by, for example, forbidding 
refresh() during the first CachingGetSpaceUsed#init()? 
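
   The "forbid refresh() at first init" idea can be sketched with a guard 
flag. This is an illustrative sketch only, not the actual Hadoop code: the 
class, field, and method names below are hypothetical stand-ins, and 
deepCopyAndSum() merely represents the FsDatasetImpl#deepCopyReplica traversal 
that would take the BLOCK_POOl read lock.

   ```java
   // Sketch: skip the replica-walking refresh while the object is still being
   // constructed, so init() never takes the BLOCK_POOl read lock underneath
   // addBlockPool()'s write lock. All names are hypothetical.
   public class SkipRefreshOnInitSketch {
       private volatile boolean initialized = false;
       private long used = -1;

       public void init() {
           refresh();          // during construction this is now a cheap no-op
           initialized = true; // the async refresh thread takes over afterwards
       }

       public void refresh() {
           if (!initialized) {
               // addBlockPool() holds the write lock at this point; walking
               // replicas here would block forever on the read lock, so
               // publish a placeholder instead.
               used = 0;
               return;
           }
           used = deepCopyAndSum(); // takes the read lock, safe after startup
       }

       public long getUsed() {
           return used;
       }

       // Stand-in for the FsDatasetImpl#deepCopyReplica traversal.
       private long deepCopyAndSum() {
           return 42;
       }
   }
   ```

   The trade-off is a momentarily stale space-used value at startup, which 
matches the thread's observation that replica usage does not need strong 
consistency.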







[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-30 Thread dingshun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641299#comment-17641299
 ] 

dingshun commented on HDFS-16855:
-

[~hexiaoqiao] 

I've looked at the logic of the latest trunk branch, but can't seem to find 
anything. If you later find the PR that fixes it, please post it. Thanks.

In addition, I would like to ask: if I remove the BLOCK_POOl-level write lock 
in the 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool(String
 bpid, Configuration conf) method, what would the impact be?




[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-30 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641171#comment-17641171
 ] 

Xiaoqiao He commented on HDFS-16855:


[~dingshun] Thanks for the detailed explanation. IIRC, early versions of the 
DataNode's fine-grained lock could indeed deadlock here, and it was fixed in 
later PRs. Sorry, I can't find the related PR now. Would you mind checking the 
logic on the trunk branch? More feedback is welcome if you run into any issue. 
Thanks again. cc [~Aiphag0] Any more suggestions?




[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-29 Thread dingshun (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641116#comment-17641116
 ] 

dingshun commented on HDFS-16855:
-

[~hexiaoqiao] Thanks for your reply. 

We found that when the datanode starts, FsDatasetImpl#addBlockPool(String 
bpid, Configuration conf) is called and takes a BLOCK_POOl-level write lock.

FsVolumeList#addBlockPool(final String bpid, final Configuration conf) then 
starts multiple threads to initialize the BlockPoolSlices, and each 
BlockPoolSlice needs the value of dfsUsage during initialization.

Because our fs.getspaceused.classname is configured as 
ReplicaCachingGetSpaceUsed, computing dfsUsage calls 
FsDatasetImpl#deepCopyReplica(String bpid), which in turn calls 
ReplicaMap#replicas, and #replicas takes a BLOCK_POOl-level read lock.

Since the initialization threads are not the thread holding the write lock, 
yet both use the read-write locks of the same ReentrantReadWriteLock instance, 
the write lock cannot be downgraded to a read lock and the read lock blocks 
forever.
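
The cross-thread restriction can be reproduced outside HDFS with a plain 
java.util.concurrent.locks.ReentrantReadWriteLock. Below is a minimal sketch 
(the class and method names are mine, not Hadoop's): the thread holding the 
write lock may itself take the read lock (the documented lock-downgrade 
idiom), while a second thread, standing in for the BlockPoolSlice 
initialization threads, cannot.

{code:java}
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RwLockDowngradeDemo {

  // Holding the write lock, the SAME thread may take the read lock.
  static boolean canAcquireReadSameThread() {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    lock.writeLock().lock();
    try {
      boolean acquired = lock.readLock().tryLock();
      if (acquired) {
        lock.readLock().unlock();
      }
      return acquired;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Holding the write lock, a DIFFERENT thread blocks on the read lock.
  // This mirrors addBlockPool() (write lock) waiting on init threads that
  // call deepCopyReplica() (read lock) on the same lock instance.
  static boolean canAcquireReadOtherThread() throws InterruptedException {
    ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    AtomicBoolean acquired = new AtomicBoolean(false);
    lock.writeLock().lock();
    try {
      Thread t = new Thread(() -> {
        try {
          if (lock.readLock().tryLock(100, TimeUnit.MILLISECONDS)) {
            acquired.set(true);
            lock.readLock().unlock();
          }
        } catch (InterruptedException ignored) {
        }
      });
      t.start();
      t.join(); // in the real DataNode this join never returns: deadlock
      return acquired.get();
    } finally {
      lock.writeLock().unlock();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("same thread can read-lock:  " + canAcquireReadSameThread());  // true
    System.out.println("other thread can read-lock: " + canAcquireReadOtherThread()); // false
  }
}
{code}

In the demo the second thread's tryLock simply times out; in addBlockPool the 
read lock is taken unconditionally while the parent thread joins, so neither 
side can make progress.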




[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-29 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17641058#comment-17641058
 ] 

Xiaoqiao He commented on HDFS-16855:


[~dingshun] Thanks for your catches. Sorry, I didn't get the relationship 
between #addBlockPool and #replicas here. Would you mind adding some 
information about the deadlock? Thanks.




[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640277#comment-17640277
 ] 

ASF GitHub Bot commented on HDFS-16855:
---

hadoop-yetus commented on PR #5170:
URL: https://github.com/apache/hadoop/pull/5170#issuecomment-1329816617

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  2s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  2s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  44m 10s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 40s |  |  trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 34s |  |  trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 28s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   2m  2s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  trunk passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 48s |  |  trunk passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   4m 16s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  28m 43s |  |  branch has no errors when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 34s |  |  the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 34s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 18s |  |  patch has no errors when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 388m  9s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 54s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 513m  3s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5170 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 0f9749a8f192 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 54a0786d1072f1440a297bb197887a83b941ba65 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5170/1/testReport/ |
   | Max. process+thread count | 2123 (vs. ulimit of 5500) 

[jira] [Commented] (HDFS-16855) Remove the redundant write lock in addBlockPool

2022-11-28 Thread ASF GitHub Bot (Jira)


[ https://issues.apache.org/jira/browse/HDFS-16855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17640039#comment-17640039 ]

ASF GitHub Bot commented on HDFS-16855:
---

dingshun3016 opened a new pull request, #5170:
URL: https://github.com/apache/hadoop/pull/5170

   While applying the datanode's fine-grained lock patch, we found that the datanode couldn't start: a deadlock can occur in addBlockPool, so the redundant write lock can be removed.
   
   org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool acquires the write lock.
   
   org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica needs the read lock.
   
   Because deepCopyReplica is invoked from a different thread, the write lock cannot be downgraded to a read lock, so the read-lock acquisition blocks forever.
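   A minimal standalone sketch of the lock semantics at play (this is not Hadoop code; the class and variable names are illustrative). With a `java.util.concurrent.locks.ReentrantReadWriteLock`, the thread that holds the write lock may also acquire the read lock (downgrade), but any *other* thread requesting the read lock must wait until the writer releases, which never happens here because addBlockPool joins its worker threads while still holding the write lock:

   ```java
   import java.util.concurrent.CountDownLatch;
   import java.util.concurrent.TimeUnit;
   import java.util.concurrent.locks.ReentrantReadWriteLock;

   public class LockDowngradeDemo {
     public static void main(String[] args) throws Exception {
       ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

       // Like addBlockPool taking the block-pool write lock.
       lock.writeLock().lock();
       try {
         // Same thread: downgrade (write -> read) succeeds.
         boolean sameThread = lock.readLock().tryLock();
         System.out.println("same thread can read while writing: " + sameThread);
         if (sameThread) {
           lock.readLock().unlock();
         }

         // Other thread: like deepCopyReplica called from the refresh thread.
         // tryLock is used so the demo reports failure instead of deadlocking.
         final boolean[] otherThread = new boolean[1];
         CountDownLatch done = new CountDownLatch(1);
         Thread t = new Thread(() -> {
           otherThread[0] = lock.readLock().tryLock();
           if (otherThread[0]) {
             lock.readLock().unlock();
           }
           done.countDown();
         });
         t.start();
         done.await(5, TimeUnit.SECONDS);
         System.out.println("other thread can read while writing: " + otherThread[0]);
       } finally {
         lock.writeLock().unlock();
       }
     }
   }
   ```

   With a blocking `readLock().lock()` in place of `tryLock()`, the second thread would hang until the writer releases, which is exactly the hang observed at datanode startup.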




> Remove the redundant write lock in addBlockPool
> ---
>
> Key: HDFS-16855
> URL: https://issues.apache.org/jira/browse/HDFS-16855
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: dingshun
>Priority: Major
>
> While applying the datanode's fine-grained lock patch, we found that the 
> datanode couldn't start: a deadlock can occur in addBlockPool, so the 
> redundant write lock can be removed.
> {code:xml}
> <!-- getspaceused classname -->
> <property>
>   <name>fs.getspaceused.classname</name>
>   <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed</value>
> </property>
>    {code}
> {code:java}
> // 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#addBlockPool
>  
> // get writeLock
> @Override
> public void addBlockPool(String bpid, Configuration conf)
> throws IOException {
>   LOG.info("Adding block pool " + bpid);
>   AddBlockPoolException volumeExceptions = new AddBlockPoolException();
>   try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid)) {
> try {
>   volumes.addBlockPool(bpid, conf);
> } catch (AddBlockPoolException e) {
>   volumeExceptions.mergeException(e);
> }
> volumeMap.initBlockPool(bpid);
> Set<String> vols = storageMap.keySet();
> for (String v : vols) {
>   lockManager.addLock(LockLevel.VOLUME, bpid, v);
> }
>   }
>  
> } {code}
> {code:java}
> // 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl#deepCopyReplica
> // need readLock
> void replicas(String bpid, Consumer<Iterator<ReplicaInfo>> consumer) {
>   LightWeightResizableGSet<Block, ReplicaInfo> m = null;
>   try (AutoCloseDataSetLock l = lockManager.readLock(LockLevel.BLOCK_POOl, 
> bpid)) {
> m = map.get(bpid);
> if (m != null) {
>   m.getIterator(consumer);
> }
>   }
> } {code}
>  
> Because deepCopyReplica is invoked from a different thread, the write lock 
> cannot be downgraded to a read lock, so the read-lock acquisition blocks forever.
> {code:java}
> void addBlockPool(final String bpid, final Configuration conf) throws 
> IOException {
>   long totalStartTime = Time.monotonicNow();
>   final Map<FsVolumeImpl, IOException> unhealthyDataDirs =
>   new ConcurrentHashMap<FsVolumeImpl, IOException>();
>   List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
>   for (final FsVolumeImpl v : volumes) {
> Thread t = new Thread() {
>   public void run() {
> try (FsVolumeReference ref = v.obtainReference()) {
>   FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
>   " on volume " + v + "...");
>   long startTime = Time.monotonicNow();
>   v.addBlockPool(bpid, conf);
>   long timeTaken = Time.monotonicNow() - startTime;
>   FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
>   " on " + v + ": " + timeTaken + "ms");
> } catch (IOException ioe) {
>   FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
>   ". Will throw later.", ioe);
>   unhealthyDataDirs.put(v, ioe);
> }
>   }
> };
> blockPoolAddingThreads.add(t);
> t.start();
>   }
>   for (Thread t : blockPoolAddingThreads) {
> try {
>   t.join();
> } catch (InterruptedException ie) {
>   throw new IOException(ie);
> }
>   }
> } {code}
>  


