[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-05-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=775613&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775613
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 28/May/22 03:56
Start Date: 28/May/22 03:56
Worklog Time Spent: 10m 
  Work Description: ZanderXu opened a new pull request, #4367:
URL: https://github.com/apache/hadoop/pull/4367

   Detail info please refer to 
[HDFS-16600](https://issues.apache.org/jira/browse/HDFS-16600). 
   The UT 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
failed, because happened deadlock, which is introduced by 
[HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534).
   
   ```
   // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
need a read lock
   try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
   b.getBlockPoolId()))
   // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
3526 need a write lock
   try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
bpid))
   ```
   
   




Issue Time Tracking
---

Worklog Id: (was: 775613)
Remaining Estimate: 0h
Time Spent: 10m

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-05-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=775615&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775615
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 28/May/22 04:00
Start Date: 28/May/22 04:00
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1140164413

   @Hexiaoqiao @ayushtkn This bug was introduced by 
[HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534). Please help me 
review it, thanks




Issue Time Tracking
---

Worklog Id: (was: 775615)
Time Spent: 20m  (was: 10m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-05-27 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=775628&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775628
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 28/May/22 04:51
Start Date: 28/May/22 04:51
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1140172992

   Got it. I will gain a deep understanding of the scope of this BP lock and 
try to solve this case with ReadLock.




Issue Time Tracking
---

Worklog Id: (was: 775628)
Time Spent: 0.5h  (was: 20m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-05-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=775656&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-775656
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 28/May/22 09:36
Start Date: 28/May/22 09:36
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1140222126

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 47s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  36m 51s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 11s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 12s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m  8s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m  8s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 52s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 15s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  21m 41s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 236m 34s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 50s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 339m  3s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4367 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 8e8fbc25f7bf 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 
10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / d477636990330ffb7029eb22ec41f99d6dc67fdf |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/1/testReport/ |
   | Max. process+thread count | 2717 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/

[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-05-30 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=776103&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776103
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 31/May/22 06:30
Start Date: 31/May/22 06:30
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1141717958

   I learned from the implementation of **moveBlockAcrossStorage** and used BP 
ReadLock to fix this bug. 
   @ayushtkn please help me review this patch again, thanks.




Issue Time Tracking
---

Worklog Id: (was: 776103)
Time Spent: 50m  (was: 40m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=776281&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776281
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 31/May/22 12:15
Start Date: 31/May/22 12:15
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1142055562

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 44s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  37m 48s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 18s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 35s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  6s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 38s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 17s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 50s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 17s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 241m 46s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 51s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 346m 57s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4367 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 0c52856a8f1c 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 
10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 0100f4917d24e4db256e5032f75a8f64bb76391a |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-

[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-05-31 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=776340&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776340
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 31/May/22 14:05
Start Date: 31/May/22 14:05
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1142182281

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 55s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 41s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 17s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 42s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 18s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 45s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 54s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 12s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 29s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 38s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  25m 28s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 336m 31s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 452m 24s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.tools.TestDFSAdmin |
   |   | hadoop.hdfs.server.sps.TestExternalStoragePolicySatisfier |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4367 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 6d2952785abf 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 
17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f08e25d23aa96705511da6358769b81a4a711080 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubun

[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778436&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778436
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 05/Jun/22 09:27
Start Date: 05/Jun/22 09:27
Worklog Time Spent: 10m 
  Work Description: MingXiangLi commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146772397

   @ZanderXu  LGTM.The core logic in HDFS-16534 is change block pool write lock 
to read lock and add volume lock for each replica under this block pool.And we 
didn't change this method in HDFS-16534 because it's not a heavy call. So this 
commit is makes sense to me.




Issue Time Tracking
---

Worklog Id: (was: 778436)
Time Spent: 1h 20m  (was: 1h 10m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778440&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778440
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 05/Jun/22 09:51
Start Date: 05/Jun/22 09:51
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146775213

   Thanks @MingXiangLi for your review. 
[HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534) means a lot to 
me and I learned a lot from it.




Issue Time Tracking
---

Worklog Id: (was: 778440)
Time Spent: 1.5h  (was: 1h 20m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778447&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778447
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 05/Jun/22 12:47
Start Date: 05/Jun/22 12:47
Worklog Time Spent: 10m 
  Work Description: slfan1989 commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146798609

   @ZanderXu  I found that some tests of Junit Test of HDFS (DN) sometimes 
succeed and sometimes fail, but there is no way to judge whether it is related 
to DeadLock. How do you judge the occurrence of DeadLock?
   
   @MingXiangLi HDFS-16534 is a very big change, which will greatly help the 
performance improvement of DN, but ZanderXu has already proposed 2 Jiras for 
this change. Can you help to re-examine this HDFS-16534, if it is separate each 
time The commit fixes pr, worried that it will bring more problems.




Issue Time Tracking
---

Worklog Id: (was: 778447)
Time Spent: 1h 40m  (was: 1.5h)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778448&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778448
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 05/Jun/22 12:58
Start Date: 05/Jun/22 12:58
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146800123

   @slfan1989 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
sometimes succeed? I have run it several times locally and all fail.
   
   > How do you judge the occurrence of DeadLock?
   Deadlock is trigged when evictLazyPersistBlocks is required in createRbw. 
Because createRbw hold BLOCK_POOL read lock, but evictLazyPersistBlocks try to 
hold BLOCK_POOL write lock.




Issue Time Tracking
---

Worklog Id: (was: 778448)
Time Spent: 1h 50m  (was: 1h 40m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-05 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=778479&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-778479
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 06/Jun/22 00:36
Start Date: 06/Jun/22 00:36
Worklog Time Spent: 10m 
  Work Description: slfan1989 commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1146920391

   Thank you for your contribution, but I still have some concerns about 
HDFS-16534. I feel that for a new feature, multiple prs should not be used to 
fix the problem separately, which makes the code very difficult to read. I 
recommend creating it under HDFS-15382 A subtask to fix HDFS-16598 and 
HDFS-16600 together, @ZanderXu @MingXiangLi.




Issue Time Tracking
---

Worklog Id: (was: 778479)
Time Spent: 2h  (was: 1h 50m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=779077&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779077
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 07/Jun/22 13:06
Start Date: 07/Jun/22 13:06
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1148648304

   @ZanderXu Thanks for the great catch here.
   It is indeed missed method which need to improve. cc @MingXiangLi @ZanderXu 
would you mind to check if other methods also leave this issues?
   
   > Thank you for your contribution, but I still have some concerns about 
[HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534). I feel that for 
a new feature, multiple prs should not be used to fix the problem separately, 
which makes the code very difficult to read. I recommend creating it under 
[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) A subtask to fix 
[HDFS-16598](https://issues.apache.org/jira/browse/HDFS-16598) and 
[HDFS-16600](https://issues.apache.org/jira/browse/HDFS-16600) together.
   @slfan1989 Thansk for your suggestions. IMO, this is not the blocker issue. 
Any tickets will be collected to subtask of HDFS-16534 before checkin for 
committers. -1 to combine HDFS-16598 and HDFS-16600 together. IIUC, it is 
recommended to add/fix one issue for one ticket. Welcome to any more 
discussions.
   @ZanderXu @slfan1989 Thanks again.




Issue Time Tracking
---

Worklog Id: (was: 779077)
Time Spent: 2h 10m  (was: 2h)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=779080&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779080
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 07/Jun/22 13:14
Start Date: 07/Jun/22 13:14
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1148657754

   Thanks @Hexiaoqiao for your review, and I will check other methods.




Issue Time Tracking
---

Worklog Id: (was: 779080)
Time Spent: 2h 20m  (was: 2h 10m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=779130&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779130
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 07/Jun/22 14:31
Start Date: 07/Jun/22 14:31
Worklog Time Spent: 10m 
  Work Description: slfan1989 commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1148756359

   @ZanderXu @Hexiaoqiao Thank you very much everyone, I learned a lot from the 
discussion, I didn't pay attention to this pr, because the description 
information is too short, especially for me who just started reading hdfs code. 
I will summarize the calling process in the comment area of ​​HDFS-16600 Leave 
a message (ASAP), and I hope @ZanderXu @Hexiaoqiao you can help me check it too.




Issue Time Tracking
---

Worklog Id: (was: 779130)
Time Spent: 2.5h  (was: 2h 20m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-07 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=779134&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779134
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 07/Jun/22 14:39
Start Date: 07/Jun/22 14:39
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1148766297

   > Another suggestion, can you write the junit test? 
   
   you can see the UT 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction.




Issue Time Tracking
---

Worklog Id: (was: 779134)
Time Spent: 2h 40m  (was: 2.5h)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-08 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=779725&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779725
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 09/Jun/22 00:42
Start Date: 09/Jun/22 00:42
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1150551125

   Thanks @ayushtkn for your comment.
   > If possible would be good if a test can be added.
   
   I found this issues because the UT 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
failed, so I thinks it's a nice UT to verify this issue.




Issue Time Tracking
---

Worklog Id: (was: 779725)
Time Spent: 2h 50m  (was: 2h 40m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=779971&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779971
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 09/Jun/22 14:07
Start Date: 09/Jun/22 14:07
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1151166917

   Thanks all for the further discussion. About the unit test, I did not 
retrieve this one, 
`org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction`.
 Anything I missed? @ZanderXu Would mind to check if this test really located 
at hadoop-hdfs module now? Please correct me if I am wrong. Thanks again.




Issue Time Tracking
---

Worklog Id: (was: 779971)
Time Spent: 3h  (was: 2h 50m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-09 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=779985&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-779985
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 09/Jun/22 14:35
Start Date: 09/Jun/22 14:35
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1151202018

   Oh, I'm sorry, the failed UT is 
`org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement.testSynchronousEviction`.




Issue Time Tracking
---

Worklog Id: (was: 779985)
Time Spent: 3h 10m  (was: 3h)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=780274&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780274
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 10/Jun/22 10:14
Start Date: 10/Jun/22 10:14
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1152205735

   > Oh, I'm sorry, the failed UT is 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement.testSynchronousEviction.
   
   Thanks @ZanderXu for your information. I think it can cover this case, let's 
wait what Ayush think about it.
   Just find the latest build not clean. Try to trigger jenkins again, let's 
wait what it will say.




Issue Time Tracking
---

Worklog Id: (was: 780274)
Time Spent: 3h 20m  (was: 3h 10m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=780345&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780345
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 10/Jun/22 15:40
Start Date: 10/Jun/22 15:40
Worklog Time Spent: 10m 
  Work Description: ayushtkn commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1152493868

   Makes sense to me. Thanx everyone 




Issue Time Tracking
---

Worklog Id: (was: 780345)
Time Spent: 3.5h  (was: 3h 20m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-10 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=780461&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780461
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 11/Jun/22 01:23
Start Date: 11/Jun/22 01:23
Worklog Time Spent: 10m 
  Work Description: slfan1989 commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1152828770

   @Hexiaoqiao @ZanderXu @tomscut 
   
   I still have some doubts about this.
   
   1. I still hope ZanderXu Can provide deadlock exception stack error 
information, I will continue to try to reproduce this problem in this part.
   
   2. I read the code of testSynchronousEviction carefully, this code uses the 
special storage strategy LAZY_PERSIST, This strategy will asynchronously flush 
memory blocks to disk. LazyWriter takes care of this work.
   Part of the code is as follows
   ```
   private boolean saveNextReplica() {
 RamDiskReplica block = null;
 FsVolumeReference targetReference;
 FsVolumeImpl targetVolume;
 ReplicaInfo replicaInfo;
 boolean succeeded = false;
   
 try {
   block = ramDiskReplicaTracker.dequeueNextReplicaToPersist();
   if (block != null) {
 try (AutoCloseableLock lock = 
lockManager.writeLock(LockLevel.BLOCK_POOl,
 block.getBlockPoolId())) {
   replicaInfo = volumeMap.get(block.getBlockPoolId(), 
block.getBlockId());
 .
   ```
   If ZanderXu's judgment is correct, will this code also deadlock?
   
   3.I always have a question, why we first add blockpool readlock, and then 
add volume write lock, how is the order of this lock derived?
   
   4.I checked lockManager.writeLock(LockLevel.BLOCK_POOl, 
block.getBlockPoolId()), and I found that when adding volume, the writeLock of 
BLOCK_POOl is also used, so will it also deadlock?
   
   > in conclusion
   
   I don't think this is a deadlock. Is it because createRow got the read lock, 
which caused evictBlocks to get the write lock for a long time, and then 
exceeded the waiting time of the junit test, which eventually led to an error.
   
   I think to solve this problem completely, we also need to look at the 
processing logic of LazyWriter. It should not be enough to just modify 
evictBlocks.
   




Issue Time Tracking
---

Worklog Id: (was: 780461)
Time Spent: 3h 40m  (was: 3.5h)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=780495&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780495
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 11/Jun/22 10:45
Start Date: 11/Jun/22 10:45
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1152897335

   Thanks @slfan1989 for your comment.
   I'm sorry and I feel that you don't get the root cause of  the failure of 
`org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.TestLazyPersistReplicaPlacement.testSynchronousEviction`.
   
   Please refer to the stack, and 
   ```
   "DataXceiver for client DFSClient_NONMAPREDUCE_-1350116008_11 at 
/127.0.0.1:51273 [Receiving block BP-1502139676-192.168
   .3.4-1654943490123:blk_1073741826_1002]" #146 daemon prio=5 os_prio=31 
tid=0x7fb5cee2d800 nid=0x11507 waiting on con
   dition [0x7c8ed000]
  java.lang.Thread.State: WAITING (parking)
   at sun.misc.Unsafe.park(Native Method)
   - parking to wait for  <0x0007a14b6330> (a 
java.util.concurrent.locks.ReentrantReadWriteLock$FairSync)
   at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
   at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:8
   36)
   at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
   at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
   at 
java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(ReentrantReadWriteLock.java:943)
   at 
org.apache.hadoop.hdfs.server.common.AutoCloseDataSetLock.lock(AutoCloseDataSetLock.java:62)
   at 
org.apache.hadoop.hdfs.server.datanode.DataSetLockManager.getWriteLock(DataSetLockManager.java:214)
   at 
org.apache.hadoop.hdfs.server.datanode.DataSetLockManager.writeLock(DataSetLockManager.java:170)
   at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl$LazyWriter.evictBlocks(FsDatasetImpl.java:3526)
   at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.evictLazyPersistBlocks(FsDatasetImpl.java:3656)
   at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.reserveLockedMemory(FsDatasetImpl.java:3675)
   at 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createRbw(FsDatasetImpl.java:1606)
   at 
org.apache.hadoop.hdfs.server.datanode.BlockReceiver.(BlockReceiver.java:219)
   at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.getBlockReceiver(DataXceiver.java:1319)
   at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:767)
   at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:176)
   at 
org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:110)
   at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:293)
   at java.lang.Thread.run(Thread.java:748)
   ```
   
   > Is it because createRbw got the read lock, which caused evictBlocks to get 
the write lock for a long time
   evictBlocks is impossible to acquire the write lock, since 
[createRBW_logic](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1588)
 holds the read lock of this block pool. And 
[createRBW_logic](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1588)
 is waiting for evictBlocks to finish. so it's deadlock.
   
   > so will it also deadlock(When createRbw And addVolume are done at the same 
time)?
   I'm interested in this deadlock, can you provide a reproduction process?  
thanks~ 
   
   




Issue Time Tracking
---

Worklog Id: (was: 780495)
Time Spent: 3h 50m  (was: 3h 40m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.had

[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=780507&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780507
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 11/Jun/22 12:48
Start Date: 11/Jun/22 12:48
Worklog Time Spent: 10m 
  Work Description: slfan1989 commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1152922067

   > Please refer to the stack:
   
   Thank you very much, I have understood that this is indeed a deadlock, 
because the same thread needs to use both a read lock and a write lock.
   
   > evictBlocks could not successfully acquire the write lock, since 
[createRBW_logic](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1588)
 holds the read lock of this block pool. And 
[createRBW_logic](https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java#L1588)
 is waiting for evictBlocks to finish. so it's deadlock.
   
   very good explanation.
   
   > I'm interested in this deadlock, can you provide a reproduction process? 
thanks~
   
   Thanks for your patience in explaining, this is my guess, now it looks like 
this won't happen(deadlock) because createRbw And addVolume won't be executed 
in the same thread, and createRbw And LazyWriter won't deadlock because they're 
not executed in one thread.
   
   LGTM +1
   
   > The last question is
   why we first add blockpool readlock, and then add volume write lock, how is 
the order of this lock derived?




Issue Time Tracking
---

Worklog Id: (was: 780507)
Time Spent: 4h  (was: 3h 50m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=780510&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780510
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 11/Jun/22 13:02
Start Date: 11/Jun/22 13:02
Worklog Time Spent: 10m 
  Work Description: MingXiangLi commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1152924424

   @slfan1989 You can refer this 
https://issues.apache.org/jira/browse/HDFS-15382 jira which may answer your 
question。




Issue Time Tracking
---

Worklog Id: (was: 780510)
Time Spent: 4h 10m  (was: 4h)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=780511&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-780511
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 11/Jun/22 14:12
Start Date: 11/Jun/22 14:12
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1152934755

   Yeah, about last question, @slfan1989 you can prefer to 
[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382). If you have any 
questions, feel free to concat me and discuss it.




Issue Time Tracking
---

Worklog Id: (was: 780511)
Time Spent: 4h 20m  (was: 4h 10m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=781652&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781652
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 15/Jun/22 13:31
Start Date: 15/Jun/22 13:31
Worklog Time Spent: 10m 
  Work Description: Hexiaoqiao commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-115642

   Retrigger jenkins and wait to another build result. 
   Thanks everyone's helpful discussion. I would like to checkin for a while if 
no more other comments and build clean.




Issue Time Tracking
---

Worklog Id: (was: 781652)
Time Spent: 4.5h  (was: 4h 20m)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {code:java}
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.createRbw line 1588 
> need a read lock
> try (AutoCloseableLock lock = lockManager.readLock(LockLevel.BLOCK_POOl,
> b.getBlockPoolId()))
> // org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.evictBlocks line 
> 3526 need a write lock
> try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid))
> {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=781994&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781994
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 10:11
Start Date: 16/Jun/22 10:11
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1157483601

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  39m 57s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  40m 26s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   1m  8s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 32s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed  |
   | +1 :green_heart: |  spotbugs  |   3m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 52s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m  2s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 258m  9s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 52s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 400m 10s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/8/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4367 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c020b276eba7 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 
10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f08e25d23aa96705511da6358769b81a4a711080 |
   | Default Java | Red Hat, Inc.-1.8.0_332-b09 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/8/testReport/ |
   | Max. process+thread count | 3757 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/8/console |
   | versions | git=2.9.5 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




Issue Time Tracking
---

Worklog Id: (was: 781994)
Time Spent: 4h 40m  (was: 4.5h)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed, because happened deadlock, which  is introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> {cod