[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625967#comment-17625967
 ] 

ASF GitHub Bot commented on HDFS-15654:
---

hadoop-yetus commented on PR #5089:
URL: https://github.com/apache/hadoop/pull/5089#issuecomment-1295671859

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  10m  8s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ branch-3.3 Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m 29s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  spotbugs  |   3m 39s |  |  branch-3.3 passed  |
   | +1 :green_heart: |  shadedclient  |  28m 27s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 45s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   3m 28s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  27m 51s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 218m 51s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 55s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 341m 42s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.balancer.TestBalancerWithHANameNodes |
   |   | hadoop.hdfs.TestDecommissionWithStripedBackoffMonitor |
   |   | hadoop.hdfs.TestRollingUpgrade |
   |   | hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5089 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 635fa4db048e 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | branch-3.3 / d5d56167ae6ede0f5c7dfe3c7af0e48e6f754a76 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~18.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/1/testReport/ |
   | Max. process+thread count | 2200 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5089/1/console |
   | versions | git=2.17.1 maven=3.6.0 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> TestBPOfferService#testMissBlocksWhenReregister fails intermittently
> 
>
> Key: HDFS-15654
> URL: https://issues.apache.org/jira/browse/HDFS-15654
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>   

[jira] [Commented] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfered to observer state

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625965#comment-17625965
 ] 

ASF GitHub Bot commented on HDFS-16547:
---

tomscut commented on PR #4201:
URL: https://github.com/apache/hadoop/pull/4201#issuecomment-1295658813

   The failed unit test is unrelated to this change; it is a separate issue.




> [SBN read] Namenode in safe mode should not be transfered to observer state
> ---
>
> Key: HDFS-16547
> URL: https://issues.apache.org/jira/browse/HDFS-16547
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Tao Li
>Assignee: Tao Li
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently, when a NameNode is in safe mode (either during startup or after 
> entering safe mode manually), we can transition it to the Observer state by 
> command. Such an Observer node may receive many requests and then throw a 
> SafeModeException, which causes unnecessary failover on the client.
> Therefore, a NameNode in safe mode should not be transitioned to the observer 
> state.
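A minimal sketch of the guard being proposed, assuming a hypothetical `canTransitionToObserver` check (the class and method names below are illustrative, not the actual Hadoop HA code):

```java
// Hypothetical sketch of refusing an observer transition while in safe mode.
// Class and method names are illustrative, not the real Hadoop classes.
public class ObserverTransitionGuard {

    /** Returns true only when the transition to observer is allowed. */
    public static boolean canTransitionToObserver(boolean inSafeMode) {
        // A NameNode still in safe mode would answer reads from incomplete
        // block state, so the transition is refused.
        return !inSafeMode;
    }

    public static void transitionToObserver(boolean inSafeMode) {
        if (!canTransitionToObserver(inSafeMode)) {
            throw new IllegalStateException(
                "NameNode is in safe mode; refusing transition to observer");
        }
        // ... proceed with the normal HA state transition ...
    }
}
```

Rejecting the transition up front keeps the failover logic on the client side untouched: clients never see a SafeModeException from an observer, because a safe-mode node never becomes one.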



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16828) Fsck doesn't count orphaned missing blocks.

2022-10-28 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16828:

Description: 
Fsck uses corruptedFiles.size instead of blockManager.getMissingBlockCount() to 
get missing blocks. This creates an inconsistency with metasave.

When an orphaned block is present, metasave and fsck show different results.

metasave shows:
{code:java}
Metasave: Blocks currently missing: 1
[orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 e: 
0){code}

but fsck -list-corruptfileblocks shows a different count:
{noformat}
The filesystem under path '/' has 0 CORRUPT files
{noformat}
This also creates an inconsistency in the dfshealth UI:
 * The missing blocks count comes from blockManager.getMissingBlockCount(), 
which is 1.
 * The corrupted files list comes from fsck (NamenodeFsck.listCorruptFileBlocks()), 
which doesn't count orphaned blocks, so it is empty.

!image-2022-10-28-15-18-24-944.png!

  was:
Fsck use corrputedFIles.size instead of blockManager.getMissinngBlockCount() to 
get missing blocks. This creates inconsistency with metasave.

In the case where orphaned block is present, metasave and fsck show different 
result:

metasave shows:

 
{code:java}
Metasave: Blocks currently missing: 1
[orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 e: 
0){code}
 

but fsck -list-corruptfileblocks  shows different count:
{noformat}
The filesystem under path '/' has 0 CORRUPT files
{noformat}
This also created inconsistency between dfshealth ui. In dfshealth UI:
 * Missing blocks count comes from blockManager.getMissinngBlockCount()
 * Corrupted file comes from fsck: NamenodeFsck.listCorruptFileBlocks() which 
doesn't count orphaned blocks.

!image-2022-10-28-15-18-24-944.png!


> Fsck doesn't count orphaned missing blocks.
> ---
>
> Key: HDFS-16828
> URL: https://issues.apache.org/jira/browse/HDFS-16828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Minor
> Attachments: image-2022-10-28-15-18-24-944.png
>
>
> Fsck uses corruptedFiles.size instead of blockManager.getMissingBlockCount() 
> to get missing blocks. This creates an inconsistency with metasave.
> When an orphaned block is present, metasave and fsck show different results.
> metasave shows:
> {code:java}
> Metasave: Blocks currently missing: 1
> [orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 
> e: 0){code}
> but fsck -list-corruptfileblocks shows a different count:
> {noformat}
> The filesystem under path '/' has 0 CORRUPT files
> {noformat}
> This also creates an inconsistency in the dfshealth UI:
>  * The missing blocks count comes from blockManager.getMissingBlockCount(), 
> which is 1.
>  * The corrupted files list comes from fsck (NamenodeFsck.listCorruptFileBlocks()), 
> which doesn't count orphaned blocks, so it is empty.
> !image-2022-10-28-15-18-24-944.png!






[jira] [Updated] (HDFS-16828) Fsck doesn't count orphaned missing blocks.

2022-10-28 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16828:

Description: 
Fsck uses corruptedFiles.size instead of blockManager.getMissingBlockCount() to 
get missing blocks. This creates an inconsistency with metasave.

When an orphaned block is present, metasave and fsck show different results.

metasave shows:
{code:java}
Metasave: Blocks currently missing: 1
[orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 e: 
0){code}

but fsck -list-corruptfileblocks shows a different count:
{noformat}
The filesystem under path '/' has 0 CORRUPT files
{noformat}
This also creates an inconsistency in the dfshealth UI:
 * The missing blocks count comes from blockManager.getMissingBlockCount().
 * The corrupted files list comes from fsck (NamenodeFsck.listCorruptFileBlocks()), 
which doesn't count orphaned blocks.

!image-2022-10-28-15-18-24-944.png!

  was:
Fsck use corrputedFIles.size instead of blockManager.getMissinngBlockCount() to 
get missing blocks. This creates inconsistency with metasave.

In the case where orphaned block is present, metasave and fsck show different 
result:

metasave shows:

 
{code:java}
Metasave: Blocks currently missing: 1
[orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 e: 
0){code}
 

but fsck -list-corruptfileblocks  shows different count:
{noformat}
The filesystem under path '/' has 0 CORRUPT files
{noformat}
 


> Fsck doesn't count orphaned missing blocks.
> ---
>
> Key: HDFS-16828
> URL: https://issues.apache.org/jira/browse/HDFS-16828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Minor
> Attachments: image-2022-10-28-15-18-24-944.png
>
>
> Fsck uses corruptedFiles.size instead of blockManager.getMissingBlockCount() 
> to get missing blocks. This creates an inconsistency with metasave.
> When an orphaned block is present, metasave and fsck show different results.
> metasave shows:
> {code:java}
> Metasave: Blocks currently missing: 1
> [orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 
> e: 0){code}
> but fsck -list-corruptfileblocks shows a different count:
> {noformat}
> The filesystem under path '/' has 0 CORRUPT files
> {noformat}
> This also creates an inconsistency in the dfshealth UI:
>  * The missing blocks count comes from blockManager.getMissingBlockCount().
>  * The corrupted files list comes from fsck (NamenodeFsck.listCorruptFileBlocks()), 
> which doesn't count orphaned blocks.
> !image-2022-10-28-15-18-24-944.png!






[jira] [Updated] (HDFS-16828) Fsck doesn't count orphaned missing blocks.

2022-10-28 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16828:

Attachment: image-2022-10-28-15-18-24-944.png

> Fsck doesn't count orphaned missing blocks.
> ---
>
> Key: HDFS-16828
> URL: https://issues.apache.org/jira/browse/HDFS-16828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Minor
> Attachments: image-2022-10-28-15-18-24-944.png
>
>
> Fsck uses corruptedFiles.size instead of blockManager.getMissingBlockCount() 
> to get missing blocks. This creates an inconsistency with metasave.
> When an orphaned block is present, metasave and fsck show different results.
> metasave shows:
> {code:java}
> Metasave: Blocks currently missing: 1
> [orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 
> e: 0){code}
> but fsck -list-corruptfileblocks shows a different count:
> {noformat}
> The filesystem under path '/' has 0 CORRUPT files
> {noformat}
>  






[jira] [Updated] (HDFS-16828) Fsck doesn't count orphaned missing blocks.

2022-10-28 Thread Lei Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lei Yang updated HDFS-16828:

Affects Version/s: 2.10.0

> Fsck doesn't count orphaned missing blocks.
> ---
>
> Key: HDFS-16828
> URL: https://issues.apache.org/jira/browse/HDFS-16828
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.0
>Reporter: Lei Yang
>Priority: Minor
>
> Fsck uses corruptedFiles.size instead of blockManager.getMissingBlockCount() 
> to get missing blocks. This creates an inconsistency with metasave.
> When an orphaned block is present, metasave and fsck show different results.
> metasave shows:
> {code:java}
> Metasave: Blocks currently missing: 1
> [orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 
> e: 0){code}
> but fsck -list-corruptfileblocks shows a different count:
> {noformat}
> The filesystem under path '/' has 0 CORRUPT files
> {noformat}
>  






[jira] [Created] (HDFS-16828) Fsck doesn't count orphaned missing blocks.

2022-10-28 Thread Lei Yang (Jira)
Lei Yang created HDFS-16828:
---

 Summary: Fsck doesn't count orphaned missing blocks.
 Key: HDFS-16828
 URL: https://issues.apache.org/jira/browse/HDFS-16828
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Lei Yang


Fsck uses corruptedFiles.size instead of blockManager.getMissingBlockCount() to 
get missing blocks. This creates an inconsistency with metasave.

When an orphaned block is present, metasave and fsck show different results.

metasave shows:
{code:java}
Metasave: Blocks currently missing: 1
[orphaned]: blk_106452613228_105447711565 MISSING (replicas: l: 0 d: 0 c: 0 e: 
0){code}

but fsck -list-corruptfileblocks shows a different count:
{noformat}
The filesystem under path '/' has 0 CORRUPT files
{noformat}
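The discrepancy above can be sketched in isolation: a per-file corrupt count misses a block that no longer belongs to any file, while a global missing-block count does not. The classes below are illustrative stand-ins, not the real BlockManager or NamenodeFsck code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of the two counters diverging; not the real HDFS code.
public class MissingBlockCountDemo {

    static class Block {
        final String owningFile; // null means the block is orphaned
        Block(String owningFile) { this.owningFile = owningFile; }
    }

    /** fsck-style count: only blocks that belong to some file are reported. */
    static long corruptFileBlockCount(List<Block> missing) {
        return missing.stream().filter(b -> b.owningFile != null).count();
    }

    /** metasave-style count: every missing block is reported, orphaned or not. */
    static long missingBlockCount(List<Block> missing) {
        return missing.size();
    }

    public static void main(String[] args) {
        List<Block> missing = new ArrayList<>();
        missing.add(new Block(null)); // an orphaned missing block
        System.out.println("fsck sees:     " + corruptFileBlockCount(missing)); // 0
        System.out.println("metasave sees: " + missingBlockCount(missing));     // 1
    }
}
```

With one orphaned missing block, the fsck-style counter reports 0 while the metasave-style counter reports 1, which is exactly the mismatch the issue describes.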
 






[jira] [Commented] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfered to observer state

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625936#comment-17625936
 ] 

ASF GitHub Bot commented on HDFS-16547:
---

hadoop-yetus commented on PR #4201:
URL: https://github.com/apache/hadoop/pull/4201#issuecomment-1295528563

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 55s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  0s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  43m 27s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 49s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 35s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 28s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 51s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 16s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 53s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 51s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 34s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 31s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 42s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 49s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 43s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 257m 50s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/6/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  5s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 380m 39s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestObserverNode |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4201 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint 
markdownlint |
   | uname | Linux 8a5cac2b57fc 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / cb9ac689e0d52cd42c8dfc5d1ba18583927e0ac2 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/6/testReport/ |
   | 

[jira] [Commented] (HDFS-15654) TestBPOfferService#testMissBlocksWhenReregister fails intermittently

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625862#comment-17625862
 ] 

ASF GitHub Bot commented on HDFS-15654:
---

xinglin opened a new pull request, #5089:
URL: https://github.com/apache/hadoop/pull/5089

   
   
   ### Description of PR
   
   Cherry-pick two PRs from trunk to branch-3.3 for fixing 
TestBPOfferService#testMissBlocksWhenReregister
   
   HDFS-15654: minor conflict due to `testCommandProcessingThreadExit` being 
backported to branch-3.3 before HDFS-15654.
   HDFS-15674: clean cherry-pick. 
   
   ### How was this patch tested?
   
   Ran the tests three times without error:
   
   `mvn test -Dtest="TestBPOfferService"`
   
   
   




> TestBPOfferService#testMissBlocksWhenReregister fails intermittently
> 
>
> Key: HDFS-15654
> URL: https://issues.apache.org/jira/browse/HDFS-15654
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: datanode
>Reporter: Ahmed Hussein
>Assignee: Ahmed Hussein
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> {{TestBPOfferService.testMissBlocksWhenReregister}} is flaky: it fails 
> randomly when the following expression is not true:
> {code:java}
>   assertTrue(fullBlockReportCount == totalTestBlocks ||
>   incrBlockReportCount == totalTestBlocks);
> {code}
> There is a race condition here that relies once more on "time" to synchronize 
> between concurrent threads. The code below is causing the non-deterministic 
> execution: on a slow server, {{addNewBlockThread}} may not be done by the time 
> the main thread reaches the assertion call.
> {code:java}
>   // Verify FBR/IBR count is equal to generate number.
>   assertTrue(fullBlockReportCount == totalTestBlocks ||
>   incrBlockReportCount == totalTestBlocks);
> } finally {
>   addNewBlockThread.join();
>   bpos.stop();
>   bpos.join();
> {code}
> Therefore, the correct implementation should wait for the thread to finish:
> {code:java}
>  // the thread finished execution.
>  addNewBlockThread.join();
>   // Verify FBR/IBR count is equal to generate number.
>   assertTrue(fullBlockReportCount == totalTestBlocks ||
>   incrBlockReportCount == totalTestBlocks);
> } finally {
>   bpos.stop();
>   bpos.join();
> {code}
> {{DataNodeFaultInjector}} needs a longer wait time too; 1 second is not 
> enough to satisfy the condition.
> {code:java}
>   DataNodeFaultInjector.set(new DataNodeFaultInjector() {
> public void blockUtilSendFullBlockReport() {
>   try {
> GenericTestUtils.waitFor(() -> {
>   if(count.get() > 2000) {
> return true;
>   }
>   return false;
> }, 100, 1); // increase that waiting time to 10 seconds.
>   } catch (Exception e) {
> e.printStackTrace();
>   }
> }
>   });
> {code}
> {code:bash}
> Stacktrace
> java.lang.AssertionError
>   at org.junit.Assert.fail(Assert.java:86)
>   at org.junit.Assert.assertTrue(Assert.java:41)
>   at org.junit.Assert.assertTrue(Assert.java:52)
>   at 
> org.apache.hadoop.hdfs.server.datanode.TestBPOfferService.testMissBlocksWhenReregister(TestBPOfferService.java:350)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at 
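The fix pattern described in this issue, joining the worker thread before asserting on counters it updates, can be sketched outside of Hadoop like this. `AtomicInteger` stands in for the block-report counters; the names are illustrative, not the actual TestBPOfferService code.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal illustration of the race fix: join the worker BEFORE asserting,
// instead of hoping it has finished. Not the actual TestBPOfferService code.
public class JoinBeforeAssertDemo {

    public static int runWorkerAndCount(int total) {
        AtomicInteger reported = new AtomicInteger();
        Thread addNewBlockThread = new Thread(() -> {
            for (int i = 0; i < total; i++) {
                reported.incrementAndGet(); // simulate per-block report work
            }
        });
        addNewBlockThread.start();
        try {
            // The crucial step: wait for the worker to finish before reading
            // the counter; reading first is exactly the flaky-test race.
            addNewBlockThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return reported.get();
    }

    public static void main(String[] args) {
        int totalTestBlocks = 368;
        int count = runWorkerAndCount(totalTestBlocks);
        if (count != totalTestBlocks) {
            throw new AssertionError(
                "expected " + totalTestBlocks + ", got " + count);
        }
        System.out.println("counts match: " + count);
    }
}
```

`Thread.join()` also establishes a happens-before edge, so the main thread is guaranteed to see the worker's final counter value, not just "probably" see it after a sleep.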

[jira] [Commented] (HDFS-16827) [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client doesn't use ObserverReadProxyProvider

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625799#comment-17625799
 ] 

ASF GitHub Bot commented on HDFS-16827:
---

hadoop-yetus commented on PR #5088:
URL: https://github.com/apache/hadoop/pull/5088#issuecomment-1295191522

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 55s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m 49s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  29m  6s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  25m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |  22m  5s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   4m 31s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   4m 20s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   3m 27s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   3m 15s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   7m 34s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m 52s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 25s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 34s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  24m 45s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |  24m 45s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 59s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |  21m 59s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   4m 16s | 
[/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5088/1/artifact/out/results-checkstyle-root.txt)
 |  root: The patch generated 1 new + 162 unchanged - 0 fixed = 163 total (was 
162)  |
   | +1 :green_heart: |  mvnsite  |   4m 12s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 17s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   3m 15s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   7m 57s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 35s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  18m 33s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  unit  |   2m 49s |  |  hadoop-hdfs-client in the patch 
passed.  |
   | -1 :x: |  unit  |  43m 26s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5088/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt)
 |  hadoop-hdfs-rbf in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 10s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 308m 29s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.rbfbalance.TestRouterDistCpProcedure |
   |   | hadoop.hdfs.server.federation.router.TestObserverWithRouter |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5088/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5088 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 180bba756c18 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 

[jira] [Commented] (HDFS-16785) DataNode hold BP write lock to scan disk

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625787#comment-17625787
 ] 

ASF GitHub Bot commented on HDFS-16785:
---

tomscut commented on PR #4945:
URL: https://github.com/apache/hadoop/pull/4945#issuecomment-1295157989

   > ```
   >   final FsVolumeImpl fsVolume =
   > createFsVolume(sd.getStorageUuid(), sd, location);
   > // no need to add lock
   > final ReplicaMap tempVolumeMap = new ReplicaMap();
   > ArrayList exceptions = Lists.newArrayList();
   > 
   > for (final NamespaceInfo nsInfo : nsInfos) {
   >   String bpid = nsInfo.getBlockPoolID();
   >   try (AutoCloseDataSetLock l = 
lockManager.writeLock(LockLevel.BLOCK_POOl, bpid)) {
   > fsVolume.addBlockPool(bpid, this.conf, this.timer);
   > fsVolume.getVolumeMap(bpid, tempVolumeMap, ramDiskReplicaTracker);
   >   } catch (IOException e) {
   > LOG.warn("Caught exception when adding " + fsVolume +
   > ". Will throw later.", e);
   > exceptions.add(e);
   >   }
   > }
   > ```
   > 
   > The `fsVolume` here is a local temporary variable that has not yet been 
added to `volumes`, and the add/remove block pool operations only use the 
volumes already in `volumes`, so there is no conflict. The lock for 
`BlockPoolSlice` is therefore not needed here.
   > 
   > @Hexiaoqiao Sir, could you check it again?
   
   I agree with @ZanderXu here. +1 from my side.




> DataNode hold BP write lock to scan disk
> 
>
> Key: HDFS-16785
> URL: https://issues.apache.org/jira/browse/HDFS-16785
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> While working on the fine-grained locking of the DataNode, I found that 
> `addVolume` holds the write side of the BP lock while it scans the new volume 
> to collect its blocks. If we add a full volume that was previously fixed 
> offline, it will hold the write lock for a long time.
> The related code is as follows:
> {code:java}
> for (final NamespaceInfo nsInfo : nsInfos) {
>   String bpid = nsInfo.getBlockPoolID();
>   try (AutoCloseDataSetLock l = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid)) {
> fsVolume.addBlockPool(bpid, this.conf, this.timer);
> fsVolume.getVolumeMap(bpid, tempVolumeMap, ramDiskReplicaTracker);
>   } catch (IOException e) {
> LOG.warn("Caught exception when adding " + fsVolume +
> ". Will throw later.", e);
> exceptions.add(e);
>   }
> } {code}
> I also noticed that this lock was added by HDFS-15382, which means this logic 
> was not under the lock before. 
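
The pattern under discussion — do the expensive disk scan on a volume that only the adding thread can see, and take the block-pool write lock only to publish the result — can be sketched in a self-contained way. Note this is a minimal illustration with hypothetical names, not the actual FsVolumeImpl/FsDatasetImpl code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AddVolumeSketch {
  static final ReentrantReadWriteLock BP_LOCK = new ReentrantReadWriteLock();
  static final List<String> VOLUMES = new ArrayList<>();

  // Expensive step: scan the not-yet-published volume for replicas.
  // Only the adding thread can see this volume, so no lock is needed.
  static List<String> scanVolume(String volume) {
    List<String> replicas = new ArrayList<>();
    replicas.add(volume + "/blk_1");
    replicas.add(volume + "/blk_2");
    return replicas;
  }

  // Cheap step: publish the scanned volume under a short write-lock window.
  static int addVolume(String volume) {
    List<String> replicas = scanVolume(volume); // outside the BP lock
    BP_LOCK.writeLock().lock();
    try {
      VOLUMES.add(volume); // publish under the lock
      return replicas.size();
    } finally {
      BP_LOCK.writeLock().unlock();
    }
  }

  public static void main(String[] args) {
    System.out.println(addVolume("/data1")); // prints 2
  }
}
```

The design point is that the write-lock hold time no longer depends on how full the new volume is.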



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16815) Error occurred in processing CacheManagerSection for xml parsing fsimage

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625741#comment-17625741
 ] 

ASF GitHub Bot commented on HDFS-16815:
---

hadoop-yetus commented on PR #5069:
URL: https://github.com/apache/hadoop/pull/5069#issuecomment-1295067235

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 53s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m 24s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 42s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 34s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 18s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 43s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   4m  2s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 46s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  0s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5069/4/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 23 unchanged - 
0 fixed = 26 total (was 23)  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 27s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 30s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 10s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 348m 12s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5069/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 57s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 468m 13s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.tools.offlineImageViewer.TestOfflineImageViewer |
   |   | hadoop.hdfs.server.namenode.ha.TestObserverNode |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5069/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5069 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux fe64d9c19205 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ecaf2fed0b83dd36178f4f5978330a3e4efddf67 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 

[jira] [Commented] (HDFS-16826) [RBF SBN] ConnectionManager should advance the client stateId for every request

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625647#comment-17625647
 ] 

ASF GitHub Bot commented on HDFS-16826:
---

hadoop-yetus commented on PR #5086:
URL: https://github.com/apache/hadoop/pull/5086#issuecomment-1294910617

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 55s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m 12s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 51s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   0m 46s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 40s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 50s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  5s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 51s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  27m 13s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 39s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   0m 51s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 37s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 37s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 21s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 41s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 37s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 30s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 58s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  34m 33s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 41s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 144m 17s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5086/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5086 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 2be1c664d25b 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / bc3421ded5007e81223a7d8406351a906364247c |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5086/1/testReport/ |
   | Max. process+thread count | 3810 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5086/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 

[jira] [Commented] (HDFS-16827) [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client doesn't use ObserverReadProxyProvider

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625615#comment-17625615
 ] 

ASF GitHub Bot commented on HDFS-16827:
---

ZanderXu opened a new pull request, #5088:
URL: https://github.com/apache/hadoop/pull/5088

   ### Description of PR
   
   [HDFS-16827](https://issues.apache.org/jira/browse/HDFS-16827)
   
   RouterStateIdContext shouldn't update the ResponseState if client doesn't 
use ObserverReadProxyProvider. 
   
   




> [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client 
> doesn't use ObserverReadProxyProvider
> -
>
> Key: HDFS-16827
> URL: https://issues.apache.org/jira/browse/HDFS-16827
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>
> RouterStateIdContext shouldn't update the ResponseState if client doesn't use 
> ObserverReadProxyProvider.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16827) [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client doesn't use ObserverReadProxyProvider

2022-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16827:
--
Labels: pull-request-available  (was: )

> [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client 
> doesn't use ObserverReadProxyProvider
> -
>
> Key: HDFS-16827
> URL: https://issues.apache.org/jira/browse/HDFS-16827
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> RouterStateIdContext shouldn't update the ResponseState if client doesn't use 
> ObserverReadProxyProvider.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16827) [RBF SBN] RouterStateIdContext shouldn't update the ResponseState if client doesn't use ObserverReadProxyProvider

2022-10-28 Thread ZanderXu (Jira)
ZanderXu created HDFS-16827:
---

 Summary: [RBF SBN] RouterStateIdContext shouldn't update the 
ResponseState if client doesn't use ObserverReadProxyProvider
 Key: HDFS-16827
 URL: https://issues.apache.org/jira/browse/HDFS-16827
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: ZanderXu
Assignee: ZanderXu


RouterStateIdContext shouldn't update the ResponseState if client doesn't use 
ObserverReadProxyProvider.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625560#comment-17625560
 ] 

ASF GitHub Bot commented on HDFS-16689:
---

ashutoshcipher commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1294777181

   Thanks @ZanderXu for involving me. I will look into it. 




> Standby NameNode crashes when transitioning to Active with in-progress tailer
> -
>
> Key: HDFS-16689
> URL: https://issues.apache.org/jira/browse/HDFS-16689
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standby NameNode crashes when transitioning to Active with an in-progress 
> tailer. The error message is like below:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X 
> when there is a stream available for read: ByteStringEditLog[X, Y], 
> ByteStringEditLog[X, 0]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
>   ... 36 more
> {code}
> After tracing, I found a critical bug in 
> *EditlogTailer#catchupDuringFailover()* when 
> *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true, because *catchupDuringFailover()* 
> tries to replay all missed edits from JournalNodes with *onlyDurableTxns=true*. 
> It may be unable to replay any edits when some JournalNodes are abnormal. 
> To reproduce, suppose:
> - There are 2 namenodes, namely NN0 and NN1, whose states are Active and 
> Standby respectively. And there are 3 JournalNodes, namely JN0, 
> JN1 and JN2. 
> - NN0 tries to sync 3 edits to the JNs starting at txid 3, but only 
> successfully syncs them to JN1 and JN2, because JN0 is abnormal (GC, bad 
> network, restart, etc.).
> - NN1's lastAppliedTxId is 2, and at this moment we try to fail over the 
> active role from NN0 to NN1. 
> - NN1 only gets two responses, from JN0 and JN1, when it tries to select 
> inputStreams with *fromTxnId=3* and *onlyDurableTxns=true*, and the txid 
> counts of the responses are 0 and 3 respectively. JN2 is abnormal (GC, bad 
> network, restart, etc.).
> - NN1 cannot replay any edits with *fromTxnId=3* from the JournalNodes 
> because *maxAllowedTxns* is 0.
> So I think the Standby NameNode should run *catchupDuringFailover()* with 
> *onlyDurableTxns=false*, so that it can replay all missed edits from the 
> JournalNodes.
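
The arithmetic behind the failure can be modeled in a few lines. This is a deliberately simplified stand-in for the quorum logic described above (the real QuorumJournalManager stream selection is more involved): with *onlyDurableTxns=true* the reader only trusts transactions present on every responding JournalNode, i.e. the minimum of the reported counts.

```java
public class DurableTxnsModel {
  // Simplified model: with onlyDurableTxns=true the reader only trusts
  // transactions present on every responding JournalNode (the minimum count);
  // with onlyDurableTxns=false it can read up to the largest response.
  static long maxAllowedTxns(long[] responseCounts, boolean onlyDurableTxns) {
    long min = Long.MAX_VALUE;
    long max = 0;
    for (long c : responseCounts) {
      min = Math.min(min, c);
      max = Math.max(max, c);
    }
    return onlyDurableTxns ? min : max;
  }

  public static void main(String[] args) {
    long[] responses = {0, 3}; // JN0 missed the sync, JN1 has all 3 edits
    System.out.println(maxAllowedTxns(responses, true));  // 0 -> nothing is replayed
    System.out.println(maxAllowedTxns(responses, false)); // 3 -> all edits are replayed
  }
}
```

Under this model the scenario above yields 0 replayable transactions, matching the reported crash.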



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16826) [RBF SBN] ConnectionManager should advance the client stateId for every request

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625556#comment-17625556
 ] 

ASF GitHub Bot commented on HDFS-16826:
---

ZanderXu opened a new pull request, #5086:
URL: https://github.com/apache/hadoop/pull/5086

   ### Description of PR
   [HDFS-16826](https://issues.apache.org/jira/browse/HDFS-16826)
   
   ConnectionManager should advance the client stateId for every request, 
whether `pool` is null or not.
   
   Related code as below:
   ```
   if (pool == null) {
 writeLock.lock();
 try {
   pool = this.pools.get(connectionId);
   if (pool == null) {
 pool = new ConnectionPool(
 this.conf, nnAddress, ugi, this.minSize, this.maxSize,
 this.minActiveRatio, protocol,
 new PoolAlignmentContext(this.routerStateIdContext, nsId));
 this.pools.put(connectionId, pool);
 this.connectionPoolToNamespaceMap.put(connectionId, nsId);
   }
   // BUG Here
   long clientStateId = 
RouterStateIdContext.getClientStateIdFromCurrentCall(nsId);
   pool.getPoolAlignmentContext().advanceClientStateId(clientStateId);
 } finally {
   writeLock.unlock();
 }
   } 
   ```
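
The fix amounts to hoisting the advance call out of the pool-creation branch so it runs on every request. A self-contained stand-in for the intended control flow (hypothetical class and method names, not the actual router code):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

public class ConnectionManagerSketch {
  // Stand-in for PoolAlignmentContext: the client state id only moves forward.
  static class Pool {
    final AtomicLong clientStateId = new AtomicLong(Long.MIN_VALUE);
    void advanceClientStateId(long id) {
      clientStateId.accumulateAndGet(id, Math::max);
    }
  }

  final Map<String, Pool> pools = new HashMap<>();

  // Fixed shape: advance the state id on EVERY request, not only on the
  // branch where the pool is created for the first time.
  Pool getConnection(String connectionId, long clientStateId) {
    Pool pool = pools.computeIfAbsent(connectionId, k -> new Pool());
    pool.advanceClientStateId(clientStateId);
    return pool;
  }

  public static void main(String[] args) {
    ConnectionManagerSketch mgr = new ConnectionManagerSketch();
    mgr.getConnection("ns0-user", 5L);          // pool created, state id 5
    Pool p = mgr.getConnection("ns0-user", 9L); // pool reused, still advances
    System.out.println(p.clientStateId.get());  // prints 9
  }
}
```

In the buggy shape, the second request would find a non-null pool and skip the advance, leaving the pool's view of the client state id stale.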
   




> [RBF SBN] ConnectionManager should advance the client stateId for every 
> request
> ---
>
> Key: HDFS-16826
> URL: https://issues.apache.org/jira/browse/HDFS-16826
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>
> ConnectionManager should advance the client stateId for every request, 
> whether pool is null or not.
>  
> Buggy code as below:
> {code:java}
> // Create the pool if not created before
> if (pool == null) {
>   writeLock.lock();
>   try {
> pool = this.pools.get(connectionId);
> if (pool == null) {
>   pool = new ConnectionPool(
>   this.conf, nnAddress, ugi, this.minSize, this.maxSize,
>   this.minActiveRatio, protocol,
>   new PoolAlignmentContext(this.routerStateIdContext, nsId));
>   this.pools.put(connectionId, pool);
>   this.connectionPoolToNamespaceMap.put(connectionId, nsId);
> }
> // BUG Here
> long clientStateId = 
> RouterStateIdContext.getClientStateIdFromCurrentCall(nsId);
> pool.getPoolAlignmentContext().advanceClientStateId(clientStateId);
>   } finally {
> writeLock.unlock();
>   }
> } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16826) [RBF SBN] ConnectionManager should advance the client stateId for every request

2022-10-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16826:
--
Labels: pull-request-available  (was: )

> [RBF SBN] ConnectionManager should advance the client stateId for every 
> request
> ---
>
> Key: HDFS-16826
> URL: https://issues.apache.org/jira/browse/HDFS-16826
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> ConnectionManager should advance the client stateId for every request, 
> whether pool is null or not.
>  
> Buggy code as below:
> {code:java}
> // Create the pool if not created before
> if (pool == null) {
>   writeLock.lock();
>   try {
> pool = this.pools.get(connectionId);
> if (pool == null) {
>   pool = new ConnectionPool(
>   this.conf, nnAddress, ugi, this.minSize, this.maxSize,
>   this.minActiveRatio, protocol,
>   new PoolAlignmentContext(this.routerStateIdContext, nsId));
>   this.pools.put(connectionId, pool);
>   this.connectionPoolToNamespaceMap.put(connectionId, nsId);
> }
> // BUG Here
> long clientStateId = 
> RouterStateIdContext.getClientStateIdFromCurrentCall(nsId);
> pool.getPoolAlignmentContext().advanceClientStateId(clientStateId);
>   } finally {
> writeLock.unlock();
>   }
> } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16826) [RBF SBN] ConnectionManager should advance the client stateId for every request

2022-10-28 Thread ZanderXu (Jira)
ZanderXu created HDFS-16826:
---

 Summary: [RBF SBN] ConnectionManager should advance the client 
stateId for every request
 Key: HDFS-16826
 URL: https://issues.apache.org/jira/browse/HDFS-16826
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: ZanderXu
Assignee: ZanderXu


ConnectionManager should advance the client stateId for every request, whether 
pool is null or not.

 

Buggy code as below:
{code:java}
// Create the pool if not created before
if (pool == null) {
  writeLock.lock();
  try {
pool = this.pools.get(connectionId);
if (pool == null) {
  pool = new ConnectionPool(
  this.conf, nnAddress, ugi, this.minSize, this.maxSize,
  this.minActiveRatio, protocol,
  new PoolAlignmentContext(this.routerStateIdContext, nsId));
  this.pools.put(connectionId, pool);
  this.connectionPoolToNamespaceMap.put(connectionId, nsId);
}
// BUG Here
long clientStateId = 
RouterStateIdContext.getClientStateIdFromCurrentCall(nsId);
pool.getPoolAlignmentContext().advanceClientStateId(clientStateId);
  } finally {
writeLock.unlock();
  }
} {code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-16802) Print options when accessing ClientProtocol#rename2()

2022-10-28 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu resolved HDFS-16802.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Print options when accessing ClientProtocol#rename2()
> -
>
> Key: HDFS-16802
> URL: https://issues.apache.org/jira/browse/HDFS-16802
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.4
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When accessing ClientProtocol#rename2(), the carried options cannot be seen 
> in the log. Here is some log information:
> {code:java}
> 2022-10-13 10:21:10,727 [Listener at localhost/59732] DEBUG  hdfs.StateChange 
> (FSDirRenameOp.java:renameToInt(255)) - DIR* NameSystem.renameTo: with 
> options - /testNamenodeRetryCache/testRename2/src to 
> /testNamenodeRetryCache/testRename2/target
> {code}
> We should improve this; printing the options would be better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16802) Print options when accessing ClientProtocol#rename2()

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625519#comment-17625519
 ] 

ASF GitHub Bot commented on HDFS-16802:
---

ZanderXu commented on PR #5013:
URL: https://github.com/apache/hadoop/pull/5013#issuecomment-1294695232

   Merged. Thanks @jianghuazhu for your contribution and thanks @jojochuang 
@tomscut for your review.




> Print options when accessing ClientProtocol#rename2()
> -
>
> Key: HDFS-16802
> URL: https://issues.apache.org/jira/browse/HDFS-16802
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.4
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
>
> When accessing ClientProtocol#rename2(), the carried options cannot be seen 
> in the log. Here is some log information:
> {code:java}
> 2022-10-13 10:21:10,727 [Listener at localhost/59732] DEBUG  hdfs.StateChange 
> (FSDirRenameOp.java:renameToInt(255)) - DIR* NameSystem.renameTo: with 
> options - /testNamenodeRetryCache/testRename2/src to 
> /testNamenodeRetryCache/testRename2/target
> {code}
> We should improve this; printing the options would be better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16802) Print options when accessing ClientProtocol#rename2()

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625518#comment-17625518
 ] 

ASF GitHub Bot commented on HDFS-16802:
---

ZanderXu merged PR #5013:
URL: https://github.com/apache/hadoop/pull/5013




> Print options when accessing ClientProtocol#rename2()
> -
>
> Key: HDFS-16802
> URL: https://issues.apache.org/jira/browse/HDFS-16802
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.3.4
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
>
> When accessing ClientProtocol#rename2(), the carried options cannot be seen 
> in the log. Here is some log information:
> {code:java}
> 2022-10-13 10:21:10,727 [Listener at localhost/59732] DEBUG  hdfs.StateChange 
> (FSDirRenameOp.java:renameToInt(255)) - DIR* NameSystem.renameTo: with 
> options - /testNamenodeRetryCache/testRename2/src to 
> /testNamenodeRetryCache/testRename2/target
> {code}
> We should improve this; printing the options would be better.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16764) ObserverNamenode handles addBlock rpc and throws a FileNotFoundException

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625517#comment-17625517
 ] 

ASF GitHub Bot commented on HDFS-16764:
---

ZanderXu commented on PR #4872:
URL: https://github.com/apache/hadoop/pull/4872#issuecomment-1294687854

   @ayushtkn Sir, could you help with a final review?
   
   @tomscut @Hexiaoqiao Could you help double-review it when you are available?




> ObserverNamenode handles addBlock rpc and throws a FileNotFoundException 
> -
>
> Key: HDFS-16764
> URL: https://issues.apache.org/jira/browse/HDFS-16764
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Critical
>  Labels: pull-request-available
>
> ObserverNameNode currently can handle the addBlock RPC, but it may 
> throw a FileNotFoundException when its state contains a stale txid.
>  * AddBlock is not a coordinated method, so the Observer will not check the 
> stateId.
>  * AddBlock does its validation with checkOperation(OperationCategory.READ).
> So the observer can handle the addBlock RPC. If this observer has not yet 
> replayed the edit that created the file, it will throw a FileNotFoundException 
> during validation.
> The related code as follows:
> {code:java}
> checkOperation(OperationCategory.READ);
> final FSPermissionChecker pc = getPermissionChecker();
> FSPermissionChecker.setOperationType(operationName);
> readLock();
> try {
>   checkOperation(OperationCategory.READ);
>   r = FSDirWriteFileOp.validateAddBlock(this, pc, src, fileId, clientName,
> previous, onRetryBlock);
> } finally {
>   readUnlock(operationName);
> } {code}
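
The coordination gap can be shown with a toy model (this is an illustration of the described behavior, not HDFS code): coordinated reads wait until the observer has applied at least the transaction id the client last saw, while non-coordinated calls like addBlock skip that check and can observe state from before the file's creation.

```java
public class ObserverReadSketch {
  long lastAppliedTxId;

  ObserverReadSketch(long lastAppliedTxId) {
    this.lastAppliedTxId = lastAppliedTxId;
  }

  // Coordinated call: served only once the observer has applied at least
  // the transaction id the client last saw; otherwise the client retries.
  boolean canServeCoordinated(long clientSeenTxId) {
    return lastAppliedTxId >= clientSeenTxId;
  }

  // Non-coordinated call (the addBlock case): no state id check, so a file
  // created at a newer txid may not exist yet -> FileNotFoundException.
  boolean canServeUncoordinated(long clientSeenTxId) {
    return true;
  }

  public static void main(String[] args) {
    ObserverReadSketch observer = new ObserverReadSketch(10L);
    System.out.println(observer.canServeCoordinated(12L));   // false
    System.out.println(observer.canServeUncoordinated(12L)); // true
  }
}
```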



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625515#comment-17625515
 ] 

ASF GitHub Bot commented on HDFS-16689:
---

ZanderXu commented on PR #4744:
URL: https://github.com/apache/hadoop/pull/4744#issuecomment-1294684900

   @xkrogen Sir, could you help with a final review? 
   
   @ashutoshcipher @tomscut @ayushtkn @Hexiaoqiao Sirs, could you help 
double-review it when you are available?




> Standby NameNode crashes when transitioning to Active with in-progress tailer
> -
>
> Key: HDFS-16689
> URL: https://issues.apache.org/jira/browse/HDFS-16689
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standby NameNode crashes when transitioning to Active with an in-progress 
> tailer. The error message is like below:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X 
> when there is a stream available for read: ByteStringEditLog[X, Y], 
> ByteStringEditLog[X, 0]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
>   ... 36 more
> {code}
> After tracing, I found a critical bug in 
> *EditlogTailer#catchupDuringFailover()* when 
> *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true: *catchupDuringFailover()* tries 
> to replay all missed edits from the JournalNodes with 
> *onlyDurableTxns=true*, so it may be unable to replay any edits when some 
> JournalNodes are abnormal. 
> To reproduce, suppose:
> - There are two NameNodes, NN0 and NN1, in Active and Standby state 
> respectively, and three JournalNodes, namely JN0, JN1 and JN2. 
> - NN0 tries to sync 3 edits starting at txid 3 to the JNs, but only 
> successfully syncs them to JN1 and JN2. JN0 is abnormal, e.g. GC, bad 
> network or restarted.
> - NN1's lastAppliedTxId is 2, and at this moment we fail over from NN0 to 
> NN1. 
> - NN1 gets only two responses, from JN0 and JN1, when it tries to select 
> inputStreams with *fromTxnId=3* and *onlyDurableTxns=true*; the txid counts 
> in those responses are 0 and 3 respectively. JN2 is abnormal, e.g. GC, bad 
> network or restarted.
> - NN1 cannot replay any edits with *fromTxnId=3* from the JournalNodes 
> because *maxAllowedTxns* is 0.
> So I think the Standby NameNode should run *catchupDuringFailover()* with 
> *onlyDurableTxns=false*, so that it can replay all missed edits from the 
> JournalNodes.
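The arithmetic in the scenario above can be sketched in a few lines. This is a hypothetical model of the durable-txn quorum check, not the real QuorumJournalManager code: `maxAllowedTxns` mirrors the variable named in the report, but the method signature and everything else here are invented for illustration.

```java
import java.util.Arrays;

public class DurableTxnSketch {
    // With onlyDurableTxns=true, a txid is only considered replayable if a
    // majority of all JournalNodes are known to have it durably. The bound is
    // therefore the majority-th largest txn count among the responses.
    static long maxAllowedTxns(long[] responseCounts, int totalJns) {
        int majority = totalJns / 2 + 1; // 2 of 3 JNs
        if (responseCounts.length < majority) {
            // Too few responses to ever establish a majority.
            return 0;
        }
        long[] sorted = responseCounts.clone();
        Arrays.sort(sorted); // ascending
        return sorted[sorted.length - majority];
    }

    public static void main(String[] args) {
        // NN1 hears only from JN0 (0 txns) and JN1 (3 txns); JN2 is down.
        // The majority-durable count is 0, so no edits can be replayed.
        System.out.println(maxAllowedTxns(new long[]{0, 3}, 3)); // prints 0
        // If all three JNs answered with counts {2, 3, 3}, 3 txns would be durable.
        System.out.println(maxAllowedTxns(new long[]{2, 3, 3}, 3)); // prints 3
    }
}
```

With `onlyDurableTxns=false` the reader could instead take the highest response count (3 here), which is why the report proposes relaxing the flag during failover.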






[jira] [Commented] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfered to observer state

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625495#comment-17625495
 ] 

ASF GitHub Bot commented on HDFS-16547:
---

hadoop-yetus commented on PR #4201:
URL: https://github.com/apache/hadoop/pull/4201#issuecomment-1294638050

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 43s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  40m 52s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 44s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 18s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 38s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 15s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 45s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 39s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 32s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 29s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 29s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 21s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/4/artifact/out/blanks-eol.txt)
 |  The patch has 3 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   1m  1s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 53s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 29s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 29s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 262m 36s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 10s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 375m 35s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   |   | hadoop.hdfs.server.namenode.ha.TestObserverNode |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/4/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4201 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint 
markdownlint |
   | uname | Linux 75ed5f9a794c 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 83874075d3da5afad1cb3720f628d057e99bd44a |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 

[jira] [Commented] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfered to observer state

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625487#comment-17625487
 ] 

ASF GitHub Bot commented on HDFS-16547:
---

hadoop-yetus commented on PR #4201:
URL: https://github.com/apache/hadoop/pull/4201#issuecomment-1294601353

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  40m 46s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 32s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 12s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 48s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 23s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 52s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 19s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 24s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 17s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 17s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/3/artifact/out/blanks-eol.txt)
 |  The patch has 3 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   0m 59s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 54s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 38s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 44s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 248m 46s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  9s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 360m 24s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestRollingUpgrade |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4201 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint 
markdownlint |
   | uname | Linux 550aa4b186ab 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4c4f4b24374a3c96117c86e1fb0afd02ff2927b4 |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 

[jira] [Commented] (HDFS-16547) [SBN read] Namenode in safe mode should not be transfered to observer state

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16547?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625482#comment-17625482
 ] 

ASF GitHub Bot commented on HDFS-16547:
---

hadoop-yetus commented on PR #4201:
URL: https://github.com/apache/hadoop/pull/4201#issuecomment-1294598121

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 34s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +0 :ok: |  markdownlint  |   0m  1s |  |  markdownlint was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m 12s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 34s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  compile  |   1m 32s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 25s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 15s |  |  trunk passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 45s |  |  trunk passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 42s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 21s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 18s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 23s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javac  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 16s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 16s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | 
[/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/5/artifact/out/blanks-eol.txt)
 |  The patch has 3 line(s) that end in blanks. Use git apply --whitespace=fix 
<>. Refer https://git-scm.com/docs/git-apply  |
   | +1 :green_heart: |  checkstyle  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 30s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  0s |  |  the patch passed with JDK 
Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 19s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 46s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 240m 54s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  7s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 350m 57s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4201 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint 
markdownlint |
   | uname | Linux 1e4aeb3eb487 4.15.0-191-generic #202-Ubuntu SMP Thu Aug 4 
01:49:29 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 83874075d3da5afad1cb3720f628d057e99bd44a |
   | Default Java | Private Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.16+8-post-Ubuntu-0ubuntu120.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_342-8u342-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4201/5/testReport/ |
   | Max. process+thread count | 3486 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs 

[jira] [Commented] (HDFS-16578) Missing blocks appeared after snn has transitioned to active state

2022-10-28 Thread Hong Chen (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625476#comment-17625476
 ] 

Hong Chen commented on HDFS-16578:
--

Recently I found some files with missing blocks because the customer set the 
file replication to 1; when the disk became read-only, the file went missing. 
This case is not a problem.

> Missing blocks appeared after snn has transitioned to active state 
> ---
>
> Key: HDFS-16578
> URL: https://issues.apache.org/jira/browse/HDFS-16578
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 2.9.2
>Reporter: Hong Chen
>Priority: Critical
>
> There were no missing blocks on NN1, but after NN2 transitioned to active 
> state (by stopping the zkfc of NN1) we found some missing blocks in the NN2 
> web UI. We tested the corrupted file with "hadoop fs -get 
> /user/xxx/d=2020-01-03/000154_0.lzo ." and it is not readable.
> {panel:title=Exception}
> get: Could not obtain block: 
> BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color} 
> file=/user/xxx/d=2020-01-03/000154_0.lzo 
> {panel}
> When NN1 was the active NameNode, we ran fsck on 
> /user/xxx/d=2020-01-03/000154_0.lzo and it reported healthy.
> {panel:title=Fscklog}
> /user/xxx/d=2020-01-03/000154_0.lzo 152 bytes, 1 block(s):  OK
> 0. 
> BP-459146894-xxx-1581848181424:{color:#172b4d}blk_1081077638_7337053{color} 
> len=152 {color:#172b4d}Live_repl=2{color} 
> DatanodeInfoWithStorage[{color:#4c9aff}datanode1{color}:1004,DS-3236bdbc-8af9-4d3a-8bc8-c921b3a8862b,DISK]],
>  
> [DatanodeInfoWithStorage[{color:#4c9aff}datanode2{color}:1004,DS-84b0a3be-5aec-4850-ba71-ed348b94e7c0,DISK]
> Status: HEALTHY
>  Total size:    152 B
>  Total dirs:    0
>  Total files:    1
>  Total symlinks:        0
>  Total blocks (validated):    1 (avg. block size 152 B)
>  Minimally replicated blocks:    1 (100.0 %)
>  Over-replicated blocks:    0 (0.0 %)
>  Under-replicated blocks:    0 (0.0 %)
>  Mis-replicated blocks:        0 (0.0 %)
>  Default replication factor:    3
>  Average block replication:    2.0
>  Corrupt blocks:        0
>  Missing replicas:        0 (0.0 %)
>  Number of data-nodes:        2400
>  Number of racks:        90
> FSCK ended at Thu May 12 17:50:37 CST 2022 in 49 milliseconds
> {panel}
> Then we checked {color:#172b4d}blk_1081077638_7337053{color} in the 
> DataNode logs:
> {panel:title=datanode0}
> 2021-11-10 16:08:10,441 [1728658213] - INFO [BP-459146894-xxx-1581848181424 
> heartbeating to NN1/xxx:8021:DataNode@2255] - 
> DatanodeRegistration({color:#4c9aff}datanode0{color}:1004, 
> datanodeUuid=8a0d2e92-1c7c-4e32-8ce2-390b524d7ced, infoPort=1006, 
> infoSecurePort=0, ipcPort=8025, 
> storageInfo=lv=-57;cid=CID-01d25bd0-acba-47c4-9273-f2457a370f8b;nsid=1756420227;c=1581848181424)
>  Starting thread to transfer 
> BP-459146894-xxx-1581848181424:blk_1081077638_7337053 to 
> {color:#4c9aff}datanode1{color}:1004
> 2021-11-10 16:08:10,468 [1728658240] - INFO 
> [org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer@332bbd5d:DataNode$DataTransfer@2464]
>  - DataTransfer, at {color:#4c9aff}datanode0{color}:1004: Transmitted 
> BP-459146894-xxx-1581848181424:blk_1081077638_7337053 (numBytes=152) to 
> /{color:#4c9aff}datanode1{color}:1004
> 2021-11-10 16:09:46,445 [1728754217] - INFO [BP-459146894-xxx-1581848181424 
> heartbeating to NN1/xxx:8021:FsDatasetAsyncDiskService@217] - Scheduling 
> blk_1081077638_7337053 file 
> /mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
>  for deletion
> 2021-11-10 16:09:46,453 [1728754225] - INFO [Async disk worker #6294 for 
> volume 
> /mnt/dfs/2/data/current:FsDatasetAsyncDiskService$ReplicaFileDeleteTask@321] 
> - Deleted BP-459146894-xxx-1581848181424 blk_1081077638_7337053 file 
> /mnt/dfs/2/data/current/BP-459146894-xxx-1581848181424/current/finalized/subdir15/subdir15/blk_1081077638
> {panel}
> {panel:title=datanode1}
> 2021-11-10 16:08:10,453 [16578] - INFO  [DataXceiver for client  at 
> /datanode0:54958 [Receiving block 
> BP-459146894-xxx-1581848181424:blk_1081077638_7337053]:DataXceiver@717] - 
> Receiving BP-459146894-xxx-1581848181424:blk_1081077638_7337053 src: 
> /{color:#4c9aff}datanode0{color}:54958 dest: 
> /{color:#4c9aff}datanode1{color}:1004
> 2021-11-10 16:08:10,480 [165787804] - INFO  [DataXceiver for client  at 
> /{color:#4c9aff}datanode0{color}:54958 [Receiving block 
> BP-459146894-xxx-1581848181424:blk_1081077638_7337053]:DataXceiver@892] - 
> Received BP-459146894-xxx-1581848181424:blk_1081077638_7337053 src: 
> /{color:#4c9aff}datanode0{color}:54958 dest: 
> /{color:#4c9aff}datanode1{color}:1004 of size 152
> 2022-05-10 12:00:42,984 [12699841344] - INFO  [BP-459146894-xxx-1581848181424 
> heartbeating to 

[jira] (HDFS-16578) Missing blocks appeared after snn has transitioned to active state

2022-10-28 Thread Hong Chen (Jira)


[ https://issues.apache.org/jira/browse/HDFS-16578 ]


Hong Chen deleted comment on HDFS-16578:
--

was (Author: chenhong):
[~linyiqun] can you help me take a look at this case? Thanks a lot.


[jira] [Commented] (HDFS-16804) AddVolume contains a race condition with shutdown block pool

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625460#comment-17625460
 ] 

ASF GitHub Bot commented on HDFS-16804:
---

ZanderXu commented on PR #5033:
URL: https://github.com/apache/hadoop/pull/5033#issuecomment-1294518371

   @Hexiaoqiao @tomscut @MingXiangLi Sir, can you help me review this PR?
   The ReplicaMap may contain some blocks belonging to a closed block pool.




> AddVolume contains a race condition with shutdown block pool
> 
>
> Key: HDFS-16804
> URL: https://issues.apache.org/jira/browse/HDFS-16804
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> AddVolume contains a race condition with shutdown block pool, causing the 
> ReplicaMap to still contain some blocks belonging to the removed block pool.
> The new volume also still contains one unused BlockPoolSlice belonging to 
> the removed block pool, which causes problems such as an incorrect dfsUsed 
> and an incorrect numBlocks for the volume.
> Let's review the logic of addVolume and shutdownBlockPool respectively.
>  
> AddVolume Logic:
>  * Step1: Get all namespaceInfo from blockPoolManager
>  * Step2: Create one temporary FsVolumeImpl object
>  * Step3: Create some blockPoolSlices according to the namespaceInfo and add 
> them to the temporary FsVolumeImpl object
>  * Step4: Scan all blocks of the namespaceInfo from the volume and store 
> them in one temporary ReplicaMap
>  * Step5: Activate the temporary FsVolumeImpl created before (with the 
> FsDatasetImpl synchronized lock)
>  ** Step5.1: Merge all blocks of the temporary ReplicaMap into the global 
> ReplicaMap
>  ** Step5.2: Add the FsVolumeImpl to the volumes
> ShutdownBlockPool Logic:(with blockPool write lock)
>  * Step1: Cleanup the blockPool from the global ReplicaMap
>  * Step2: Shutdown the block pool from all the volumes
>  ** Step2.1: do some cleanup operations for the block pool, such as 
> saveReplica, saveDfsUsed, etc.
>  ** Step2.2: remove the blockPool from bpSlices
> The race condition can be reproduced by the following steps:
>  * AddVolume Step1: Get all namespaceInfo from blockPoolManager
>  * ShutdownBlockPool Step1: Cleanup the blockPool from the global ReplicaMap
>  * ShutdownBlockPool Step2: Shutdown the block pool from all the volumes
>  * AddVolume Step 2~5
> And the result:
>  * The global ReplicaMap contains some blocks belonging to the removed 
> blockPool
>  * The bpSlices of the FsVolumeImpl contain one blockPoolSlice belonging to 
> the removed blockPool
> Expected result:
>  * The global ReplicaMap shouldn't contain any blocks belonging to the 
> removed blockPool
>  * The bpSlices of any FsVolumeImpl shouldn't contain any blockPoolSlice 
> belonging to the removed blockPool
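The problematic interleaving can be modeled with a toy single-threaded sequence. All names here (`replicaMap`, `scanVolume`, `activateVolume`, `shutdownBlockPool`) are invented stand-ins for the sketch and do not match the real FsDatasetImpl internals; the point is only to show how a merge that runs after the shutdown resurrects the removed pool's blocks.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AddVolumeRaceSketch {
    // Global map: block pool id -> blocks (stand-in for the real ReplicaMap).
    static Map<String, List<String>> replicaMap = new HashMap<>();

    // AddVolume Steps 1-4: slow disk scan into a temporary collection.
    static List<String> scanVolume(String bpid) {
        return new ArrayList<>(List.of("blk_1", "blk_2"));
    }

    // AddVolume Step 5.1: merge the temporary map into the global one.
    static void activateVolume(String bpid, List<String> scanned) {
        replicaMap.computeIfAbsent(bpid, k -> new ArrayList<>()).addAll(scanned);
    }

    // ShutdownBlockPool Step 1: cleanup the pool from the global map.
    static void shutdownBlockPool(String bpid) {
        replicaMap.remove(bpid);
    }

    public static void main(String[] args) {
        String bpid = "BP-1";
        replicaMap.put(bpid, new ArrayList<>(List.of("blk_0")));

        // Racy order: the scan starts before shutdown, the merge lands after it.
        List<String> scanned = scanVolume(bpid);
        shutdownBlockPool(bpid);
        activateVolume(bpid, scanned);

        // The removed pool's blocks are back in the global map: the bug.
        System.out.println(replicaMap.containsKey(bpid)); // prints true
    }
}
```

Guarding both the scan-merge and the shutdown with a common lock (or re-checking pool liveness before the merge) would make the final `containsKey` false, matching the expected result above.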






[jira] [Commented] (HDFS-16785) DataNode hold BP write lock to scan disk

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625456#comment-17625456
 ] 

ASF GitHub Bot commented on HDFS-16785:
---

ZanderXu commented on PR #4945:
URL: https://github.com/apache/hadoop/pull/4945#issuecomment-1294516173

   @Hexiaoqiao @tomscut @haiyang1987 @MingXiangLi Sir, can you help me review 
this PR? 
   It will block the DataNode for a long time when we dynamically add one disk 
with many blocks.




> DataNode hold BP write lock to scan disk
> 
>
> Key: HDFS-16785
> URL: https://issues.apache.org/jira/browse/HDFS-16785
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> When patching the fine-grained locking of the DataNode, I found that 
> `addVolume` holds the write lock of the BP lock while scanning the new 
> volume to get its blocks. If we try to add one full volume that was fixed 
> offline before, it will hold the write lock for a long time.
> The related code as bellows:
> {code:java}
> for (final NamespaceInfo nsInfo : nsInfos) {
>   String bpid = nsInfo.getBlockPoolID();
>   try (AutoCloseDataSetLock l = lockManager.writeLock(LockLevel.BLOCK_POOl, 
> bpid)) {
> fsVolume.addBlockPool(bpid, this.conf, this.timer);
> fsVolume.getVolumeMap(bpid, tempVolumeMap, ramDiskReplicaTracker);
>   } catch (IOException e) {
> LOG.warn("Caught exception when adding " + fsVolume +
> ". Will throw later.", e);
> exceptions.add(e);
>   }
> } {code}
> And I noticed that this lock was added by HDFS-15382, which means this 
> logic was not under the lock before. 
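One way to restructure this, sketched below under stated assumptions: do the slow disk scan into a temporary map outside the block-pool write lock, and hold the lock only for the cheap merge. The lock and map names here are simplified stand-ins, not the real lockManager/FsVolumeImpl API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class AddVolumeLockSketch {
    static final ReentrantReadWriteLock bpLock = new ReentrantReadWriteLock();
    // Global block map: block id -> block pool id (toy stand-in).
    static final Map<String, String> globalMap = new HashMap<>();

    // Potentially minutes for a full volume; must not run under the lock.
    static Map<String, String> slowDiskScan(String bpid) {
        Map<String, String> tmp = new HashMap<>();
        tmp.put("blk_1", bpid);
        return tmp;
    }

    static void addVolume(String bpid) {
        Map<String, String> tmp = slowDiskScan(bpid); // no lock held here
        bpLock.writeLock().lock();                    // lock only for the merge
        try {
            globalMap.putAll(tmp);
        } finally {
            bpLock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        addVolume("BP-1");
        System.out.println(globalMap.size()); // prints 1
    }
}
```

Note that narrowing the lock this way re-opens the addVolume/shutdownBlockPool race discussed in HDFS-16804, so the merge would also need to re-check that the pool is still live.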






[jira] [Commented] (HDFS-16819) Remove the redundant write lock in FsDatasetImpl#createTemporary

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625454#comment-17625454
 ] 

ASF GitHub Bot commented on HDFS-16819:
---

ZanderXu commented on PR #5074:
URL: https://github.com/apache/hadoop/pull/5074#issuecomment-1294512682

   I discussed this with @haiyang1987 yesterday; this PR doesn't make sense 
as-is, because there are some bugs in the `createTemporary` method.
   
   @haiyang1987 Will you modify this PR to describe and fix them?




>  Remove the redundant write lock in FsDatasetImpl#createTemporary 
> --
>
> Key: HDFS-16819
> URL: https://issues.apache.org/jira/browse/HDFS-16819
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
>  In FsDatasetImpl#createTemporary (around line 1840), the write lock seems 
> unnecessary: volumeMap.get() already takes the read lock internally. From the 
> code logic point of view, the write lock here can probably be removed.
> {code:java}
> public ReplicaHandler createTemporary(StorageType storageType,
>     String storageId, ExtendedBlock b, boolean isTransfer)
>     throws IOException {
>   long startTimeMs = Time.monotonicNow();
>   long writerStopTimeoutMs = datanode.getDnConf().getXceiverStopTimeout();
>   ReplicaInfo lastFoundReplicaInfo = null;
>   boolean isInPipeline = false;
>   do {
>     try (AutoCloseableLock lock = lockManager.writeLock(LockLevel.BLOCK_POOl,
>         b.getBlockPoolId())) { // the write lock here can probably be removed
>       ReplicaInfo currentReplicaInfo =
>           volumeMap.get(b.getBlockPoolId(), b.getBlockId());
>       if (currentReplicaInfo == lastFoundReplicaInfo) {
>         break;
>       } else {
>         isInPipeline = currentReplicaInfo.getState() == ReplicaState.TEMPORARY
>             || currentReplicaInfo.getState() == ReplicaState.RBW;
>         /*
>          * If the current block is not PROVIDED and old, reject.
>          * else If transfer request, then accept it.
>          * else if state is not RBW/Temporary, then reject
>          * If current block is PROVIDED, ignore the replica.
>          */
>         if (((currentReplicaInfo.getGenerationStamp() >= b
>             .getGenerationStamp()) || (!isTransfer && !isInPipeline))
>             && !isReplicaProvided(currentReplicaInfo)) {
>           throw new ReplicaAlreadyExistsException("Block " + b
>               + " already exists in state " + currentReplicaInfo.getState()
>               + " and thus cannot be created.");
>         }
>         lastFoundReplicaInfo = currentReplicaInfo;
>       }
>     }
> {code}
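The idea behind the PR, that a read lock is enough for a section that only reads the replica map, can be sketched as below. Note that later comments on the PR question whether the change is actually safe, so treat this purely as an illustration of the read-vs-write-lock pattern; `CreateTemporarySketch`, `ReplicaMap`, and the string-based states are hypothetical stand-ins, not the real FsDatasetImpl types.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: take the block-pool READ lock (not the write lock)
// around a section that only reads the replica map. Multiple readers can
// then proceed concurrently; writers still exclude everyone.
public class CreateTemporarySketch {
  static class ReplicaMap {
    private final ConcurrentHashMap<Long, String> replicas = new ConcurrentHashMap<>();
    void put(long blockId, String state) { replicas.put(blockId, state); }
    String get(long blockId) { return replicas.get(blockId); }
  }

  private final ReentrantReadWriteLock bpLock = new ReentrantReadWriteLock();
  private final ReplicaMap volumeMap = new ReplicaMap();

  String checkExistingReplica(long blockId) {
    // Read lock suffices: nothing in this section mutates volumeMap.
    bpLock.readLock().lock();
    try {
      String state = volumeMap.get(blockId);
      if (state == null) {
        return "absent";
      }
      // Mirrors the RBW/TEMPORARY "in pipeline" check from the quoted code.
      if (state.equals("TEMPORARY") || state.equals("RBW")) {
        return "in-pipeline";
      }
      return "reject";
    } finally {
      bpLock.readLock().unlock();
    }
  }

  public static void main(String[] args) {
    CreateTemporarySketch ds = new CreateTemporarySketch();
    ds.volumeMap.put(1L, "RBW");
    System.out.println(ds.checkExistingReplica(1L)); // prints in-pipeline
    System.out.println(ds.checkExistingReplica(2L)); // prints absent
  }
}
```

The subtlety the reviewers raise is that a check-then-act sequence under a read lock can race with a concurrent writer, which is why the safety of dropping the write lock depends on what happens after the check.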






[jira] [Commented] (HDFS-16819) Remove the redundant write lock in FsDatasetImpl#createTemporary

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625451#comment-17625451
 ] 

ASF GitHub Bot commented on HDFS-16819:
---

tomscut commented on PR #5074:
URL: https://github.com/apache/hadoop/pull/5074#issuecomment-1294505590

   Hi @Hexiaoqiao , could you please also take a look? Thanks.










[jira] [Commented] (HDFS-16819) Remove the redundant write lock in FsDatasetImpl#createTemporary

2022-10-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17625449#comment-17625449
 ] 

ASF GitHub Bot commented on HDFS-16819:
---

tomscut commented on code in PR #5074:
URL: https://github.com/apache/hadoop/pull/5074#discussion_r1007672138


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsDatasetImpl.java:
##
@@ -1912,9 +1907,8 @@ public ReplicaHandler createTemporary(StorageType storageType,
   return new ReplicaHandler(newReplicaInfo, ref);
 } finally {
   if (dataNodeMetrics != null) {
-// Create temporary operation hold write lock twice.
-long createTemporaryOpMs = Time.monotonicNow() - startHoldLockTimeMs
-+ holdLockTimeMs;
+// Create temporary operation hold write lock once.

Review Comment:
   This comment can also be removed. The other changes look good to me.
   







