date:20220619

[jira] [Resolved] (HDFS-16635) Fix javadoc error in Java 11

2022-06-19 Thread Prabhu Joseph (Jira)



[https://issues.apache.org/jira/browse/HDFS-16635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Prabhu Joseph resolved HDFS-16635.
--
Fix Version/s: 3.4.0
   Resolution: Fixed

Thanks [~aajisaka] for reporting the issue and [~groot] for the patch.

> Fix javadoc error in Java 11
> 
>
> Key: HDFS-16635
> URL: https://issues.apache.org/jira/browse/HDFS-16635
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, documentation
>Reporter: Akira Ajisaka
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> The Javadoc build fails with Java 11:
> {noformat}
> [ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-4410/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/package-info.java:20: error: reference not found
> [ERROR]  * This package provides a mechanism for tracking {@link NameNode} startup
> {noformat}
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt
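The error arises because `NameNode` lives in the parent package `org.apache.hadoop.hdfs.server.namenode`, so a bare `{@link NameNode}` inside `startupprogress/package-info.java` has nothing to resolve against. One plausible fix, sketched here under the assumption that the rest of the comment stays as it is (this is not necessarily the change merged in PR #4451, and the license header and any package annotations are omitted), is to fully qualify the reference:

```java
/**
 * This package provides a mechanism for tracking
 * {@link org.apache.hadoop.hdfs.server.namenode.NameNode} startup
 * ... (remainder of the existing comment unchanged) ...
 */
package org.apache.hadoop.hdfs.server.namenode.startupprogress;
```

Replacing the link with `{@code NameNode}` would also silence the error, at the cost of losing the hyperlink in the generated docs.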



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Work logged] (HDFS-16635) Fix javadoc error in Java 11

2022-06-19 Thread ASF GitHub Bot (Jira)



[https://issues.apache.org/jira/browse/HDFS-16635?focusedWorklogId=782813=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782813]

ASF GitHub Bot logged work on HDFS-16635:
-

Author: ASF GitHub Bot
Created on: 20/Jun/22 05:44
Start Date: 20/Jun/22 05:44
Worklog Time Spent: 10m 
  Work Description: PrabhuJoseph merged PR #4451:
URL: https://github.com/apache/hadoop/pull/4451




Issue Time Tracking
---

Worklog Id: (was: 782813)
Time Spent: 1h  (was: 50m)

> Fix javadoc error in Java 11
> 
>
> Key: HDFS-16635
> URL: https://issues.apache.org/jira/browse/HDFS-16635
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, documentation
>Reporter: Akira Ajisaka
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: newbie, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HDFS-16635) Fix javadoc error in Java 11

2022-06-19 Thread ASF GitHub Bot (Jira)



[https://issues.apache.org/jira/browse/HDFS-16635?focusedWorklogId=782812=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782812]

ASF GitHub Bot logged work on HDFS-16635:
-

Author: ASF GitHub Bot
Created on: 20/Jun/22 05:44
Start Date: 20/Jun/22 05:44
Worklog Time Spent: 10m 
  Work Description: PrabhuJoseph commented on PR #4451:
URL: https://github.com/apache/hadoop/pull/4451#issuecomment-1159998248

   Thanks @ashutoshcipher for the patch. Looks good to me.




Issue Time Tracking
---

Worklog Id: (was: 782812)
Time Spent: 50m  (was: 40m)

> Fix javadoc error in Java 11
> 
>
> Key: HDFS-16635
> URL: https://issues.apache.org/jira/browse/HDFS-16635
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: build, documentation
>Reporter: Akira Ajisaka
>Assignee: Ashutosh Gupta
>Priority: Major
>  Labels: newbie, pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HDFS-16637) TestHDFSCLI#testAll consistently failing

2022-06-19 Thread ASF GitHub Bot (Jira)



[https://issues.apache.org/jira/browse/HDFS-16637?focusedWorklogId=782803=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782803]

ASF GitHub Bot logged work on HDFS-16637:
-

Author: ASF GitHub Bot
Created on: 20/Jun/22 05:26
Start Date: 20/Jun/22 05:26
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4466:
URL: https://github.com/apache/hadoop/pull/4466#issuecomment-115999

   Thanks @virajjasani for your work.
   LGTM.
   @jianghuazhu @tomscut This bug was introduced by HDFS-16581; please help 
review this PR, thanks.




Issue Time Tracking
---

Worklog Id: (was: 782803)
Time Spent: 0.5h  (was: 20m)

> TestHDFSCLI#testAll consistently failing
> 
>
> Key: HDFS-16637
> URL: https://issues.apache.org/jira/browse/HDFS-16637
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The failure seems to have been caused by the output change introduced by 
> HDFS-16581.
> {code:java}
> 2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(146)) - Detailed results:
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(147)) - --
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(156)) - ---
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(157)) -                     Test ID: [629]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(158)) -            Test Description: [printTopology: verifying that the topology map is what we expect]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(159)) - 
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(163)) -               Test Commands: [-fs hdfs://localhost:51486 -printTopology]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(167)) - 
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(174)) - 
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(178)) -                  Comparator: [RegexpAcrossOutputComparator]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(180)) -          Comparision result:   [fail]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(182)) -             Expected output:   [^Rack: \/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)]
> 2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(184)) -               Actual output:   [Rack: /rack1
>    127.0.0.1:51487 (localhost) In Service
>    127.0.0.1:51491 (localhost) In ServiceRack: /rack2
>    127.0.0.1:51500 (localhost) In Service
>    127.0.0.1:51496 (localhost) In Service
>    127.0.0.1:51504 (localhost) In ServiceRack: /rack3
>    127.0.0.1:51508 (localhost) In ServiceRack: /rack4
>    127.0.0.1:51512 (localhost) In Service
>    127.0.0.1:51516 (localhost) In Service]
>  {code}
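The mismatch can be reproduced outside the test harness. The old pattern expects the next `127.0.0.1:` address right after `(localhost)`, but the output after HDFS-16581 appends the node state (`In Service`). The sketch below assumes that `RegexpAcrossOutputComparator` searches the whole command output with a `find()`-style match; `TopologyRegexCheck` and `NEW_REGEX` are hypothetical names, and the relaxed pattern is only an illustration, not necessarily the change made in PR #4466.

```java
import java.util.regex.Pattern;

public final class TopologyRegexCheck {

    /** Expected-output regex of Test ID 629, copied from the log above. */
    static final String OLD_REGEX =
        "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)"
        + "\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)";

    /** Hypothetical relaxed regex that also accepts the " In Service" suffix. */
    static final String NEW_REGEX =
        "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)\\sIn Service"
        + "\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)\\sIn Service";

    /** First rack of the actual -printTopology output from the log above. */
    static final String ACTUAL =
        "Rack: /rack1\n"
        + "   127.0.0.1:51487 (localhost) In Service\n"
        + "   127.0.0.1:51491 (localhost) In Service\n";

    private TopologyRegexCheck() {
    }

    /** Assumption: the comparator searches the output (find), not a full-string match. */
    static boolean matches(String regex, String output) {
        return Pattern.compile(regex).matcher(output).find();
    }

    public static void main(String[] args) {
        System.out.println("old regex matches: " + matches(OLD_REGEX, ACTUAL));
        System.out.println("new regex matches: " + matches(NEW_REGEX, ACTUAL));
    }
}
```

Hard-coding `In Service` keeps the expectation strict; a looser alternative would accept any admin state word there, which trades precision for robustness against future state names.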




[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2022-06-19 Thread ASF GitHub Bot (Jira)



[https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782783=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782783]

ASF GitHub Bot logged work on HDFS-16634:
-

Author: ASF GitHub Bot
Created on: 20/Jun/22 04:51
Start Date: 20/Jun/22 04:51
Worklog Time Spent: 10m 
  Work Description: virajjasani opened a new pull request, #4467:
URL: https://github.com/apache/hadoop/pull/4467

   branch-3.3 backport PR of #4448 




Issue Time Tracking
---

Worklog Id: (was: 782783)
Time Spent: 2h  (was: 1h 50m)

> Dynamically adjust slow peer report size on JMX metrics
> ---
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> On a busy cluster, it sometimes takes a while for the "slow node report" of a 
> node that has been removed from the cluster to disappear from the slow peer 
> JSON report in the NameNode JMX metrics. In the meantime, users should be able 
> to browse more entries in the report by reconfiguring 
> "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted 
> without having to restart the active NameNode just for this purpose.
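The idea behind the change can be illustrated with a small sketch, assuming the report size is held in a field that a reconfiguration hook updates at runtime. `SlowPeerReportSketch`, `record`, `setMaxNodesToReport` and the scoring scheme are all hypothetical and do not mirror the NameNode's actual classes; the point is only that reading the limit on every report call (here through a `volatile` field) lets a new value of "dfs.datanode.max.nodes.to.report" take effect without a restart.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public final class SlowPeerReportSketch {

    /** Hypothetical stand-in for the current value of dfs.datanode.max.nodes.to.report. */
    private volatile int maxNodesToReport;

    /** Hypothetical per-node slowness scores (higher means slower). */
    private final Map<String, Double> scores = new ConcurrentHashMap<>();

    SlowPeerReportSketch(int initialMaxNodesToReport) {
        this.maxNodesToReport = initialMaxNodesToReport;
    }

    void record(String node, double score) {
        scores.put(node, score);
    }

    /** Called by a (hypothetical) reconfiguration hook; later reports see the new value. */
    void setMaxNodesToReport(int newValue) {
        if (newValue <= 0) {
            throw new IllegalArgumentException("maxNodesToReport must be positive: " + newValue);
        }
        this.maxNodesToReport = newValue;
    }

    /** Returns the slowest nodes first, capped at the current maxNodesToReport. */
    List<String> report() {
        int limit = maxNodesToReport; // read the volatile once per report
        List<Map.Entry<String, Double>> entries = new ArrayList<>(scores.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        List<String> top = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        SlowPeerReportSketch report = new SlowPeerReportSketch(2);
        report.record("dn1:9866", 5.0);
        report.record("dn2:9866", 9.0);
        report.record("dn3:9866", 7.0);
        System.out.println(report.report()); // prints [dn2:9866, dn3:9866]
        report.setMaxNodesToReport(3);
        System.out.println(report.report()); // prints [dn2:9866, dn3:9866, dn1:9866]
    }
}
```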




[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2022-06-19 Thread ASF GitHub Bot (Jira)



[https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782784=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782784]

ASF GitHub Bot logged work on HDFS-16634:
-

Author: ASF GitHub Bot
Created on: 20/Jun/22 04:51
Start Date: 20/Jun/22 04:51
Worklog Time Spent: 10m 
  Work Description: virajjasani commented on PR #4467:
URL: https://github.com/apache/hadoop/pull/4467#issuecomment-1159969878

   FYI @tomscut 




Issue Time Tracking
---

Worklog Id: (was: 782784)
Time Spent: 2h 10m  (was: 2h)

> Dynamically adjust slow peer report size on JMX metrics
> ---
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>




[jira] [Commented] (HDFS-16637) TestHDFSCLI#testAll consistently failing

2022-06-19 Thread Viraj Jasani (Jira)



[https://issues.apache.org/jira/browse/HDFS-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556187#comment-17556187]

Viraj Jasani commented on HDFS-16637:
-

No worries at all, [~jianghuazhu]. This is not carelessness; it happens to 
everyone :)

> TestHDFSCLI#testAll consistently failing
> 
>
> Key: HDFS-16637
> URL: https://issues.apache.org/jira/browse/HDFS-16637
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




[jira] [Commented] (HDFS-16637) TestHDFSCLI#testAll consistently failing

2022-06-19 Thread JiangHua Zhu (Jira)



[https://issues.apache.org/jira/browse/HDFS-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17556183#comment-17556183]

JiangHua Zhu commented on HDFS-16637:
-

Thanks to [~vjasani] for finding this issue.
I think it was due to my carelessness.
I'm very sorry.

> TestHDFSCLI#testAll consistently failing
> 
>
> Key: HDFS-16637
> URL: https://issues.apache.org/jira/browse/HDFS-16637
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




[jira] [Updated] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2022-06-19 Thread Akira Ajisaka (Jira)



[https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Akira Ajisaka updated HDFS-16064:
-
Fix Version/s: 3.2.4

Cherry-picked to branch-3.2.

> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.2.4, 3.3.4
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> It seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that, if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, another datanode should be 
> available to replicate the block to.
> While testing datanode decommissioning using "dfs.hosts.exclude", I 
> encountered a scenario where the decommissioning gets stuck indefinitely.
> Below is the progression of events:
>  * there are initially 4 datanodes: DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to the "dfs.hosts.exclude" file
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails; this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Remaining in the decommissioning state forever is not the intended behavior of 
> DataNode decommissioning.
> A few potential solutions:
>  * Address the root cause of the problem, an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to a lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  
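The counts in the `BlockStateChange` log lines above show why the decommission never finishes. The sketch below models them under two loudly stated assumptions: decommissioning can complete only once live replicas reach the expected count, and "usable" replicas (the term in the new issue summary) means live plus decommissioning plus live-entering-maintenance replicas. `ReplicaAccounting` and its methods are hypothetical and are not the `BlockManager` code that was merged. If the stale replica on DN3 is treated as corrupt (an assumption consistent with the new summary), a live-only rule refuses to invalidate it with the logged counts (1 live, 2 decommissioning), while a usable-replica rule allows it, which would unblock re-replication to DN3.

```java
/**
 * Hypothetical model of the per-block replica counts printed by the
 * BlockStateChange log lines. Not the actual BlockManager logic.
 */
public final class ReplicaAccounting {

    /** Hypothetical container for the counts shown in the log. */
    static final class ReplicaCounts {
        final int live;
        final int decommissioning;
        final int liveEnteringMaintenance;

        ReplicaCounts(int live, int decommissioning, int liveEnteringMaintenance) {
            this.live = live;
            this.decommissioning = decommissioning;
            this.liveEnteringMaintenance = liveEnteringMaintenance;
        }

        /** Assumption: replicas that can still be read count as "usable". */
        int usable() {
            return live + decommissioning + liveEnteringMaintenance;
        }
    }

    private ReplicaAccounting() {
    }

    /** Assumption: decommissioning finishes only once live replicas reach the expected count. */
    static boolean decommissionCanComplete(ReplicaCounts c, int expectedReplicas) {
        return c.live >= expectedReplicas;
    }

    /** Live-only rule: invalidate a corrupt replica only if live replicas satisfy min replication. */
    static boolean mayInvalidateCorruptByLive(ReplicaCounts c, int minReplication) {
        return c.live >= minReplication;
    }

    /** Usable-replica rule suggested by the new issue summary. */
    static boolean mayInvalidateCorruptByUsable(ReplicaCounts c, int minReplication) {
        return c.usable() >= minReplication;
    }

    public static void main(String[] args) {
        // Counts from the log above: live 1, decommissioning 2, entering maintenance 0.
        ReplicaCounts logged = new ReplicaCounts(1, 2, 0);
        int replicationFactor = 2; // "minimum replication factor of 2" from the report
        System.out.println("decommission can complete: "
            + decommissionCanComplete(logged, replicationFactor));
        System.out.println("invalidate corrupt (live only): "
            + mayInvalidateCorruptByLive(logged, replicationFactor));
        System.out.println("invalidate corrupt (usable): "
            + mayInvalidateCorruptByUsable(logged, replicationFactor));
    }
}
```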




[jira] [Updated] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

2022-06-19 Thread Akira Ajisaka (Jira)



[https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Akira Ajisaka updated HDFS-16064:
-
Summary: Determine when to invalidate corrupt replicas based on number of 
usable replicas  (was: HDFS-721 causes DataNode decommissioning to get stuck 
indefinitely)

> Determine when to invalidate corrupt replicas based on number of usable 
> replicas
> 
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




[jira] [Resolved] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

2022-06-19 Thread Akira Ajisaka (Jira)



[https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Akira Ajisaka resolved HDFS-16064.
--
Fix Version/s: 3.4.0
   3.3.4
   Resolution: Fixed

Merged the PR into trunk and branch-3.3.

> HDFS-721 causes DataNode decommissioning to get stuck indefinitely
> --
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
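The sequence above can be reproduced in a toy Java model. Everything in it is hypothetical and heavily simplified: `StuckDecommissionModel`, `DataNode.acceptReplica` and `canFinishDecommission` are illustrative names, not Hadoop classes, and "live replicas" is approximated as the in-service holders the NameNode knows about, which reproduces the counts in the log (live: 1, decommissioning: 2). The retry loop is bounded only so the example terminates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the HDFS-16064 scenario; names are hypothetical, not Hadoop classes.
class StuckDecommissionModel {
    enum AdminState { IN_SERVICE, DECOMMISSIONING }

    static final String BLOCK = "blk_XXX";
    static final int EXPECTED_REPLICAS = 2;

    static class DataNode {
        final String name;
        final AdminState adminState;
        // Replicas this DataNode itself believes it holds, keyed by block ID.
        final Map<String, String> localReplicas = new HashMap<>();

        DataNode(String name, AdminState adminState) {
            this.name = name;
            this.adminState = adminState;
        }

        // Like the WRITE_BLOCK failure in the log: an existing replica cannot be re-created.
        boolean acceptReplica(String blockId) {
            if (localReplicas.containsKey(blockId)) {
                return false;
            }
            localReplicas.put(blockId, "FINALIZED");
            return true;
        }
    }

    // NameNode-side view: only in-service holders it knows about count as live.
    static int liveReplicas(List<DataNode> knownHolders) {
        int live = 0;
        for (DataNode dn : knownHolders) {
            if (dn.adminState == AdminState.IN_SERVICE) {
                live++;
            }
        }
        return live;
    }

    // Retries replication to DN3 and reports whether decommissioning could finish.
    static boolean canFinishDecommission(boolean dn3HasStaleReplica, int maxAttempts) {
        DataNode dn1 = new DataNode("DN1", AdminState.DECOMMISSIONING);
        DataNode dn2 = new DataNode("DN2", AdminState.DECOMMISSIONING);
        DataNode dn3 = new DataNode("DN3", AdminState.IN_SERVICE);
        DataNode dn4 = new DataNode("DN4", AdminState.IN_SERVICE);
        List<DataNode> knownHolders = new ArrayList<>(Arrays.asList(dn1, dn2, dn4));
        if (dn3HasStaleReplica) {
            // DN3 finalized the block, but the NameNode never recorded it (HDFS-721).
            dn3.localReplicas.put(BLOCK, "FINALIZED");
        }
        for (int attempt = 0; attempt < maxAttempts
                && liveReplicas(knownHolders) < EXPECTED_REPLICAS; attempt++) {
            if (dn3.acceptReplica(BLOCK)) {
                knownHolders.add(dn3);
            }
        }
        return liveReplicas(knownHolders) >= EXPECTED_REPLICAS;
    }

    public static void main(String[] args) {
        System.out.println("stale replica on DN3: " + canFinishDecommission(true, 3));  // false
        System.out.println("clean DN3:            " + canFinishDecommission(false, 3)); // true
    }
}
```

With a stale FINALIZED replica on DN3, every retry is rejected and the live count stays at 1; with a clean DN3 the first retry succeeds and the expected count of 2 is reached.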
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning.
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  
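The second proposed remedy needs some definition of "stuck". One possible heuristic, sketched below under the assumption that the monitor can report how many insufficiently replicated blocks remain on each decommissioning node, is to flag a node once that count has not decreased for a configurable timeout. `StuckDecommissionDetector` and `observe` are hypothetical names, and the recovery step (re-enabling the node) is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical heuristic for the second proposed remedy; not a Hadoop API.
class StuckDecommissionDetector {
    private final long timeoutMillis;
    // Last observed count of insufficiently replicated blocks per DataNode.
    private final Map<String, Integer> lastPending = new HashMap<>();
    // When that count last decreased (or was first seen).
    private final Map<String, Long> lastProgressAt = new HashMap<>();

    StuckDecommissionDetector(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records one monitor pass and returns true if the node looks stuck. */
    boolean observe(String datanode, int pendingBlocks, long nowMillis) {
        Integer previous = lastPending.get(datanode);
        if (previous == null || pendingBlocks < previous) {
            // First observation, or some blocks were replicated: restart the clock.
            lastProgressAt.put(datanode, nowMillis);
        }
        lastPending.put(datanode, pendingBlocks);
        long stalledFor = nowMillis - lastProgressAt.get(datanode);
        return pendingBlocks > 0 && stalledFor >= timeoutMillis;
    }

    public static void main(String[] args) {
        long minute = 60_000L;
        StuckDecommissionDetector detector = new StuckDecommissionDetector(10 * minute);
        // Passes roughly matching the log timestamps above (10:39 and 10:57).
        System.out.println(detector.observe("DN1", 1, 0));           // false: first pass
        System.out.println(detector.observe("DN1", 1, 5 * minute));  // false: under the timeout
        System.out.println(detector.observe("DN1", 1, 18 * minute)); // true: no progress for 18 minutes
    }
}
```

Keying on "no decrease" rather than "not zero" avoids flagging a node that is slowly but steadily draining; the timeout value here is arbitrary.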




[jira] [Assigned] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

2022-06-19 Thread Akira Ajisaka (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akira Ajisaka reassigned HDFS-16064:


Assignee: Kevin Wikant

> HDFS-721 causes DataNode decommissioning to get stuck indefinitely
> --
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Assignee: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to.
> While testing datanode decommissioning using "dfs.hosts.exclude", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely.
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.hosts.exclude"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails; this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning.
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  




[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

2022-06-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=782766=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782766
 ]

ASF GitHub Bot logged work on HDFS-16064:
-

Author: ASF GitHub Bot
Created on: 20/Jun/22 02:20
Start Date: 20/Jun/22 02:20
Worklog Time Spent: 10m 
  Work Description: aajisaka commented on PR #4410:
URL: https://github.com/apache/hadoop/pull/4410#issuecomment-1159895754

   Merged. Thank you @KevinWikant for your contribution and thank you 
@ashutoshcipher @ZanderXu for your review!




Issue Time Tracking
---

Worklog Id: (was: 782766)
Time Spent: 2h  (was: 1h 50m)

> HDFS-721 causes DataNode decommissioning to get stuck indefinitely
> --
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

2022-06-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=782765=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782765
 ]

ASF GitHub Bot logged work on HDFS-16064:
-

Author: ASF GitHub Bot
Created on: 20/Jun/22 02:20
Start Date: 20/Jun/22 02:20
Worklog Time Spent: 10m 
  Work Description: aajisaka merged PR #4410:
URL: https://github.com/apache/hadoop/pull/4410




Issue Time Tracking
---

Worklog Id: (was: 782765)
Time Spent: 1h 50m  (was: 1h 40m)

> HDFS-721 causes DataNode decommissioning to get stuck indefinitely
> --
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




[jira] [Resolved] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2022-06-19 Thread Tao Li (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li resolved HDFS-16634.
---
Fix Version/s: 3.4.0
   Resolution: Resolved

> Dynamically adjust slow peer report size on JMX metrics
> ---
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> On a busy cluster, it sometimes takes a while for the "slow node report" of a 
> node deleted from the cluster to be removed from the slow peer JSON report in 
> the NameNode JMX metrics. In the meantime, users should be able to browse 
> more entries in the report by reconfiguring 
> "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted 
> without having to bounce the active NameNode just for this purpose.
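One way to make the report size adjustable at runtime is to read the limit from a `volatile` field each time the report is built and let a reconfiguration hook update that field. The sketch below assumes this design; `SlowPeerReportSketch`, `reconfigure` and `topSlowNodes` are hypothetical names, and only the property key `dfs.datanode.max.nodes.to.report` comes from the issue. On a real cluster, such a change would typically be applied with `hdfs dfsadmin -reconfig namenode <host:ipc_port> start` after editing the configuration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; only the property key comes from HDFS-16634.
class SlowPeerReportSketch {
    static final String MAX_NODES_KEY = "dfs.datanode.max.nodes.to.report";

    // volatile so an update from a reconfiguration thread is seen by report readers.
    private volatile int maxNodesToReport;
    private final Map<String, Double> latencyByNode = new TreeMap<>();

    SlowPeerReportSketch(int initialMax) {
        this.maxNodesToReport = initialMax;
    }

    // Invoked by a (hypothetical) reconfiguration handler; no restart needed.
    void reconfigure(String key, String newValue) {
        if (!MAX_NODES_KEY.equals(key)) {
            throw new IllegalArgumentException("Unsupported key: " + key);
        }
        int value = Integer.parseInt(newValue.trim());
        if (value <= 0) {
            throw new IllegalArgumentException(key + " must be positive, got " + value);
        }
        maxNodesToReport = value;
    }

    synchronized void record(String node, double latencyMs) {
        latencyByNode.put(node, latencyMs);
    }

    // Slowest nodes first, truncated to the limit in effect right now.
    synchronized List<String> topSlowNodes() {
        int limit = maxNodesToReport; // read once per report
        List<Map.Entry<String, Double>> entries = new ArrayList<>(latencyByNode.entrySet());
        entries.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < limit; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        SlowPeerReportSketch report = new SlowPeerReportSketch(2);
        report.record("dn1", 50.0);
        report.record("dn2", 300.0);
        report.record("dn3", 120.0);
        System.out.println(report.topSlowNodes()); // [dn2, dn3]
        report.reconfigure(MAX_NODES_KEY, "3");
        System.out.println(report.topSlowNodes()); // [dn2, dn3, dn1]
    }
}
```

Reading the limit once per call keeps each report internally consistent even if a reconfiguration lands while it is being built.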




[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2022-06-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782752=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782752
 ]

ASF GitHub Bot logged work on HDFS-16634:
-

Author: ASF GitHub Bot
Created on: 20/Jun/22 01:25
Start Date: 20/Jun/22 01:25
Worklog Time Spent: 10m 
  Work Description: tomscut commented on PR #4448:
URL: https://github.com/apache/hadoop/pull/4448#issuecomment-1159867414

   Hi @virajjasani, could you please submit another PR for branch-3.3, since 
there are some conflicts when cherry-picking? Thanks.




Issue Time Tracking
---

Worklog Id: (was: 782752)
Time Spent: 1h 50m  (was: 1h 40m)

> Dynamically adjust slow peer report size on JMX metrics
> ---
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2022-06-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782751=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782751
 ]

ASF GitHub Bot logged work on HDFS-16634:
-

Author: ASF GitHub Bot
Created on: 20/Jun/22 01:21
Start Date: 20/Jun/22 01:21
Worklog Time Spent: 10m 
  Work Description: tomscut merged PR #4448:
URL: https://github.com/apache/hadoop/pull/4448




Issue Time Tracking
---

Worklog Id: (was: 782751)
Time Spent: 1h 40m  (was: 1.5h)

> Dynamically adjust slow peer report size on JMX metrics
> ---
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2022-06-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782750=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782750
 ]

ASF GitHub Bot logged work on HDFS-16634:
-

Author: ASF GitHub Bot
Created on: 20/Jun/22 01:20
Start Date: 20/Jun/22 01:20
Worklog Time Spent: 10m 
  Work Description: tomscut commented on PR #4448:
URL: https://github.com/apache/hadoop/pull/4448#issuecomment-1159864599

   Thanks @virajjasani for your contribution! 




Issue Time Tracking
---

Worklog Id: (was: 782750)
Time Spent: 1.5h  (was: 1h 20m)

> Dynamically adjust slow peer report size on JMX metrics
> ---
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>




[jira] [Work logged] (HDFS-16637) TestHDFSCLI#testAll consistently failing

2022-06-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16637?focusedWorklogId=782747=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782747
 ]

ASF GitHub Bot logged work on HDFS-16637:
-

Author: ASF GitHub Bot
Created on: 20/Jun/22 00:15
Start Date: 20/Jun/22 00:15
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4466:
URL: https://github.com/apache/hadoop/pull/4466#issuecomment-1159840541

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  37m 37s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  56m 28s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 13s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  shadedclient  |  18m 33s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   1m 16s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 40s |  |  The patch does not generate ASF License warnings.  |
   |  |   |  81m 30s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4466/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4466 |
   | Optional Tests | dupname asflicense unit codespell detsecrets xmllint |
   | uname | Linux 8665b0c30282 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / faf0292f0ca935dceb4a4598909d0c4de919b3f0 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4466/1/testReport/ |
   | Max. process+thread count | 706 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4466/1/console |
   | versions | git=2.25.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




Issue Time Tracking
---

Worklog Id: (was: 782747)
Time Spent: 20m  (was: 10m)

> TestHDFSCLI#testAll consistently failing
> 
>
> Key: HDFS-16637
> URL: https://issues.apache.org/jira/browse/HDFS-16637
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The failure seems to have been caused by an output change introduced by 
> HDFS-16581.
> {code:java}
> 2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(146)) - Detailed results:
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(147)) - 
> --2022-06-19 15:41:16,184 [Listener at 
> localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(156)) - 
> ---
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(157)) -                     Test ID: [629]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(158)) -            Test Description: 
> [printTopology: verifying that the topology map is what we expect]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper 
> (CLITestHelper.java:displayResults(159)) - 
> 2022-06-19 15:41:16,184 [Listener at localhost/51519]

[jira] [Work logged] (HDFS-16637) TestHDFSCLI#testAll consistently failing

2022-06-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16637?focusedWorklogId=782742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782742
 ]

ASF GitHub Bot logged work on HDFS-16637:
-

Author: ASF GitHub Bot
Created on: 19/Jun/22 22:52
Start Date: 19/Jun/22 22:52
Worklog Time Spent: 10m 
  Work Description: virajjasani opened a new pull request, #4466:
URL: https://github.com/apache/hadoop/pull/4466

   ### Description of PR
   ```
   2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(146)) - Detailed results:
   2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(147)) - 
--
   
   2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(156)) - 
---
   2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(157)) -
 Test ID: [629]
   2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(158)) -Test 
Description: [printTopology: verifying that the topology map is what we expect]
   2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(159)) - 
   2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(163)) -   Test 
Commands: [-fs hdfs://localhost:51486 -printTopology]
   2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(167)) - 
   2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(174)) - 
   2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(178)) -  
Comparator: [RegexpAcrossOutputComparator]
   2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(180)) -  
Comparision result:   [fail]
   2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(182)) - 
Expected output:   [^Rack: 
\/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)]
   2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO  
cli.CLITestHelper (CLITestHelper.java:displayResults(184)) -   
Actual output:   [Rack: /rack1
  127.0.0.1:51487 (localhost) In Service
  127.0.0.1:51491 (localhost) In Service
   
   Rack: /rack2
  127.0.0.1:51500 (localhost) In Service
  127.0.0.1:51496 (localhost) In Service
  127.0.0.1:51504 (localhost) In Service
   
   Rack: /rack3
  127.0.0.1:51508 (localhost) In Service
   
   Rack: /rack4
  127.0.0.1:51512 (localhost) In Service
  127.0.0.1:51516 (localhost) In Service
   
   ]
   ```
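The comparison failure can be reproduced with plain `java.util.regex`. The expected regex below is copied from the log; the actual output now carries an admin-state suffix such as `In Service` after each `(hostname)`, so `\s*` can no longer bridge to the next `127.0.0.1` entry. `RELAXED_REGEX` only illustrates one way to tolerate the suffix and is not necessarily the change made in PR #4466; `matches` approximates a whole-output search with `Matcher.find()`, which is an assumption about how `RegexpAcrossOutputComparator` behaves.

```java
import java.util.regex.Pattern;

class TopologyRegexCheck {
    // Expected-output regex copied from the failing test log.
    static final String OLD_REGEX =
        "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)"
            + "\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)";

    // Illustrative only: tolerate an optional " <admin state>" suffix per node.
    static final String RELAXED_REGEX =
        "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)( [A-Za-z ]+)?"
            + "\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)( [A-Za-z ]+)?";

    // First rack of the actual output from the log.
    static final String ACTUAL =
        "Rack: /rack1\n"
            + "  127.0.0.1:51487 (localhost) In Service\n"
            + "  127.0.0.1:51491 (localhost) In Service\n";

    // The same rack without the admin-state suffix, i.e. the shape the old regex expects.
    static final String WITHOUT_SUFFIX =
        "Rack: /rack1\n"
            + "  127.0.0.1:51487 (localhost)\n"
            + "  127.0.0.1:51491 (localhost)\n";

    // Searches the whole output; '^' anchors at the start of the text.
    static boolean matches(String regex, String output) {
        return Pattern.compile(regex).matcher(output).find();
    }

    public static void main(String[] args) {
        System.out.println("old regex, actual output:     " + matches(OLD_REGEX, ACTUAL));             // false
        System.out.println("old regex, no suffix:         " + matches(OLD_REGEX, WITHOUT_SUFFIX));     // true
        System.out.println("relaxed regex, actual output: " + matches(RELAXED_REGEX, ACTUAL));         // true
        System.out.println("relaxed regex, no suffix:     " + matches(RELAXED_REGEX, WITHOUT_SUFFIX)); // true
    }
}
```

Making the suffix group optional keeps the expectation valid for both output shapes.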
   
   ### How was this patch tested?
   UT
   
   ### For code changes:
   
   - [X] Does the title or this PR starts with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   




Issue Time Tracking
---

Worklog Id: (was: 782742)
Remaining Estimate: 0h
Time Spent: 10m

> TestHDFSCLI#testAll consistently failing
> 
>
> Key: HDFS-16637
> URL: https://issues.apache.org/jira/browse/HDFS-16637
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>

[jira] [Updated] (HDFS-16637) TestHDFSCLI#testAll consistently failing

2022-06-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16637:
--
Labels: pull-request-available  (was: )

> TestHDFSCLI#testAll consistently failing
> 
>
> Key: HDFS-16637
> URL: https://issues.apache.org/jira/browse/HDFS-16637
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The failure seems to have been caused by the output change introduced by 
> HDFS-16581.
> {code:java}
> 2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(146)) - Detailed results:
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(147)) - --
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(156)) - ---
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(157)) -                     Test ID: [629]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(158)) -            Test Description: [printTopology: verifying that the topology map is what we expect]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(159)) -
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(163)) -               Test Commands: [-fs hdfs://localhost:51486 -printTopology]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(167)) -
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(174)) -
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(178)) -                  Comparator: [RegexpAcrossOutputComparator]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(180)) -          Comparision result:   [fail]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(182)) -             Expected output:   [^Rack: \/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)]
> 2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(184)) -               Actual output:   [Rack: /rack1
>    127.0.0.1:51487 (localhost) In Service
>    127.0.0.1:51491 (localhost) In ServiceRack: /rack2
>    127.0.0.1:51500 (localhost) In Service
>    127.0.0.1:51496 (localhost) In Service
>    127.0.0.1:51504 (localhost) In ServiceRack: /rack3
>    127.0.0.1:51508 (localhost) In ServiceRack: /rack4
>    127.0.0.1:51512 (localhost) In Service
>    127.0.0.1:51516 (localhost) In Service]
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org

[jira] [Created] (HDFS-16637) TestHDFSCLI#testAll consistently failing

2022-06-19 Thread Viraj Jasani (Jira)

Viraj Jasani created HDFS-16637:
---

 Summary: TestHDFSCLI#testAll consistently failing
 Key: HDFS-16637
 URL: https://issues.apache.org/jira/browse/HDFS-16637
 Project: Hadoop HDFS
  Issue Type: Test
Reporter: Viraj Jasani
Assignee: Viraj Jasani


The failure seems to have been caused by the output change introduced by HDFS-16581.
{code:java}
2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(146)) - Detailed results:
2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(147)) - --
2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(156)) - ---
2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(157)) -                     Test ID: [629]
2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(158)) -            Test Description: [printTopology: verifying that the topology map is what we expect]
2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(159)) -
2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(163)) -               Test Commands: [-fs hdfs://localhost:51486 -printTopology]
2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(167)) -
2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(174)) -
2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(178)) -                  Comparator: [RegexpAcrossOutputComparator]
2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(180)) -          Comparision result:   [fail]
2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(182)) -             Expected output:   [^Rack: \/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)]
2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO  cli.CLITestHelper (CLITestHelper.java:displayResults(184)) -               Actual output:   [Rack: /rack1
   127.0.0.1:51487 (localhost) In Service
   127.0.0.1:51491 (localhost) In ServiceRack: /rack2
   127.0.0.1:51500 (localhost) In Service
   127.0.0.1:51496 (localhost) In Service
   127.0.0.1:51504 (localhost) In ServiceRack: /rack3
   127.0.0.1:51508 (localhost) In ServiceRack: /rack4
   127.0.0.1:51512 (localhost) In Service
   127.0.0.1:51516 (localhost) In Service]
 {code}
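
To see why the comparison fails, the sketch below applies the expected regex from the log to two output shapes: the shape before HDFS-16581, assumed (from the regex itself) to be `ip:port (hostname)` per datanode, and the new shape with the `In Service` suffix copied from the actual output. It assumes `RegexpAcrossOutputComparator` searches the whole command output with `Pattern.find()`. The class name `TopologyRegexDemo` and the `RELAXED` pattern are hypothetical illustrations, not the fix that was committed for this issue.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical demo: the printTopology expectation from TestHDFSCLI stops
 * matching once each datanode line gains an admin-state suffix.
 * Assumption: the comparator runs find() over the whole output.
 */
public class TopologyRegexDemo {

    // Expected-output regex copied from the failing test log (Java-escaped).
    static final String EXPECTED =
        "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)"
        + "\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)";

    // Hypothetical relaxed regex that also accepts an optional " In Service" suffix.
    static final String RELAXED =
        "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)(?:\\sIn Service)?"
        + "\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)(?:\\sIn Service)?";

    // Assumed output shape before the change: "ip:port (hostname)" per datanode.
    static final String OLD_OUTPUT =
        "Rack: /rack1\n"
        + "   127.0.0.1:51487 (localhost)\n"
        + "   127.0.0.1:51491 (localhost)\n";

    // Output shape after the change, taken from the "Actual output" log line.
    static final String NEW_OUTPUT =
        "Rack: /rack1\n"
        + "   127.0.0.1:51487 (localhost) In Service\n"
        + "   127.0.0.1:51491 (localhost) In Service\n";

    /** True if the pattern is found; without MULTILINE, ^ anchors at the start of the output. */
    static boolean matches(String regex, String output) {
        return Pattern.compile(regex).matcher(output).find();
    }

    public static void main(String[] args) {
        System.out.println("expected vs old: " + matches(EXPECTED, OLD_OUTPUT)); // true
        System.out.println("expected vs new: " + matches(EXPECTED, NEW_OUTPUT)); // false
        System.out.println("relaxed  vs new: " + matches(RELAXED, NEW_OUTPUT));  // true
    }
}
```

After the first `(localhost)`, the original pattern requires optional whitespace followed by `127`, but the new output has ` In Service` there, and no amount of backtracking gets past it, so `find()` returns false.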




[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

2022-06-19 Thread ASF GitHub Bot (Jira)



 [ 
https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=782721=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782721
 ]

ASF GitHub Bot logged work on HDFS-16064:
-

Author: ASF GitHub Bot
Created on: 19/Jun/22 16:07
Start Date: 19/Jun/22 16:07
Worklog Time Spent: 10m 
  Work Description: ashutoshcipher commented on PR #4410:
URL: https://github.com/apache/hadoop/pull/4410#issuecomment-1159766297

   > Filed [HDFS-16635](https://issues.apache.org/jira/browse/HDFS-16635) to fix javadoc error.
   
   @aajisaka  Raised PR for the same.




Issue Time Tracking
---

Worklog Id: (was: 782721)
Time Spent: 1h 40m  (was: 1.5h)

> HDFS-721 causes DataNode decommissioning to get stuck indefinitely
> --
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> It seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to.
> While testing datanode decommissioning using "dfs.hosts.exclude", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely.
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.hosts.exclude"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails; this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning.
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the
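
The replica accounting in the HDFS-16064 report can be sketched as a toy model. It is a hypothetical illustration, not NameNode code: the class and method names are invented, and it assumes only what the log lines state, namely that replicas on decommissioning datanodes are not counted as live and that decommissioning cannot finish while live replicas are below the expected count of 2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

/** Toy model of the HDFS-16064 replica accounting (hypothetical, not NameNode code). */
public class DecommissionSketch {

    /** Simplified replica states: only the two that appear in the report. */
    enum State { LIVE, DECOMMISSIONING }

    /** One replica of a block, identified by the datanode holding it. */
    static final class Replica {
        final String datanode;
        final State state;

        Replica(String datanode, State state) {
            this.datanode = datanode;
            this.state = state;
        }
    }

    /** Counts replicas per state, with every state present (possibly 0). */
    static Map<State, Integer> count(List<Replica> replicas) {
        Map<State, Integer> counts = new EnumMap<>(State.class);
        for (State s : State.values()) {
            counts.put(s, 0);
        }
        for (Replica r : replicas) {
            counts.merge(r.state, 1, Integer::sum);
        }
        return counts;
    }

    /** Decommissioning can only finish once live replicas reach the expected count. */
    static boolean canFinishDecommission(List<Replica> replicas, int expectedReplicas) {
        return count(replicas).get(State.LIVE) >= expectedReplicas;
    }

    /** Block locations as the NameNode reports them: DN1, DN2 decommissioning, DN4 live. */
    static List<Replica> reportedLocations() {
        return Arrays.asList(
            new Replica("DN1", State.DECOMMISSIONING),
            new Replica("DN2", State.DECOMMISSIONING),
            new Replica("DN4", State.LIVE));
    }

    /** Returns a copy of the list with one more live replica on the given datanode. */
    static List<Replica> withLiveReplica(List<Replica> replicas, String datanode) {
        List<Replica> copy = new ArrayList<>(replicas);
        copy.add(new Replica(datanode, State.LIVE));
        return copy;
    }

    public static void main(String[] args) {
        List<Replica> stuck = reportedLocations();
        System.out.println(count(stuck));                    // {LIVE=1, DECOMMISSIONING=2}
        System.out.println(canFinishDecommission(stuck, 2)); // false: DN3's copy is unknown to the NameNode
        // If the NameNode could count DN3's FINALIZED copy, the check would pass.
        System.out.println(canFinishDecommission(withLiveReplica(stuck, "DN3"), 2)); // true
    }
}
```

In the reported scenario, HDFS-721 keeps DN3 out of the live count, so the check stays false no matter how often replication to DN3 is retried.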

[jira] (HDFS-2546) The C HDFS API should work with secure HDFS

2022-06-19 Thread Ambar Hegde (Jira)



[ https://issues.apache.org/jira/browse/HDFS-2546 ]


Ambar Hegde deleted comment on HDFS-2546:
---

was (Author: ambar):
I am working on this.

> The C HDFS API should work with secure HDFS
> ---
>
> Key: HDFS-2546
> URL: https://issues.apache.org/jira/browse/HDFS-2546
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: libhdfs
>Affects Versions: 2.0.0-alpha
>Reporter: Harsh J
>Priority: Major
>
> Right now, libhdfs does not work with Kerberos-secured Hadoop. If libhdfs is 
> still being supported, it must fully work with Kerberized instances of HDFS.




[jira] [Resolved] (HDFS-16635) Fix javadoc error in Java 11

[jira] [Work logged] (HDFS-16635) Fix javadoc error in Java 11

[jira] [Work logged] (HDFS-16635) Fix javadoc error in Java 11

[jira] [Work logged] (HDFS-16637) TestHDFSCLI#testAll consistently failing

[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

[jira] [Commented] (HDFS-16637) TestHDFSCLI#testAll consistently failing

[jira] [Commented] (HDFS-16637) TestHDFSCLI#testAll consistently failing

[jira] [Updated] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

[jira] [Updated] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas

[jira] [Resolved] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

[jira] [Assigned] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

[jira] [Resolved] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

[jira] [Work logged] (HDFS-16637) TestHDFSCLI#testAll consistently failing

[jira] [Work logged] (HDFS-16637) TestHDFSCLI#testAll consistently failing

[jira] [Updated] (HDFS-16637) TestHDFSCLI#testAll consistently failing

[jira] [Created] (HDFS-16637) TestHDFSCLI#testAll consistently failing

[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

[jira] (HDFS-2546) The C HDFS API should work with secure HDFS
