[jira] [Resolved] (HDFS-16635) Fix javadoc error in Java 11
[ https://issues.apache.org/jira/browse/HDFS-16635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prabhu Joseph resolved HDFS-16635.
----------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Thanks [~aajisaka] for reporting the issue and [~groot] for the patch.

> Fix javadoc error in Java 11
> ----------------------------
>
>                 Key: HDFS-16635
>                 URL: https://issues.apache.org/jira/browse/HDFS-16635
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: build, documentation
>            Reporter: Akira Ajisaka
>            Assignee: Ashutosh Gupta
>            Priority: Major
>              Labels: newbie, pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 1h
>  Remaining Estimate: 0h
>
> Javadoc build in Java 11 fails.
> {noformat}
> [ERROR] /home/jenkins/jenkins-home/workspace/hadoop-multibranch_PR-4410/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/startupprogress/package-info.java:20: error: reference not found
> [ERROR] * This package provides a mechanism for tracking {@link NameNode} startup
> {noformat}
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt
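For context, the error above occurs because package-info.java carries no import for NameNode, so Java 11's stricter javadoc tool cannot resolve the {@link NameNode} reference. A minimal sketch of the kind of change that clears the error follows; the actual patch in PR #4451 is not shown in this digest and may differ (the fully qualified reference here is an assumption):

{code:java}
// Hypothetical fix sketch: fully qualify the reference so javadoc can
// resolve it without an import. Adding an import statement to
// package-info.java, or dropping the {@link} entirely, would also work.
/**
 * This package provides a mechanism for tracking
 * {@link org.apache.hadoop.hdfs.server.namenode.NameNode} startup
 * progress.
 */
package org.apache.hadoop.hdfs.server.namenode.startupprogress;
{code}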
[jira] [Work logged] (HDFS-16635) Fix javadoc error in Java 11
[ https://issues.apache.org/jira/browse/HDFS-16635?focusedWorklogId=782813&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782813 ]

ASF GitHub Bot logged work on HDFS-16635:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jun/22 05:44
            Start Date: 20/Jun/22 05:44
    Worklog Time Spent: 10m
      Work Description: PrabhuJoseph merged PR #4451:
URL: https://github.com/apache/hadoop/pull/4451

Issue Time Tracking
-------------------
    Worklog Id: (was: 782813)
    Time Spent: 1h  (was: 50m)
[jira] [Work logged] (HDFS-16635) Fix javadoc error in Java 11
[ https://issues.apache.org/jira/browse/HDFS-16635?focusedWorklogId=782812&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782812 ]

ASF GitHub Bot logged work on HDFS-16635:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jun/22 05:44
            Start Date: 20/Jun/22 05:44
    Worklog Time Spent: 10m
      Work Description: PrabhuJoseph commented on PR #4451:
URL: https://github.com/apache/hadoop/pull/4451#issuecomment-1159998248

   Thanks @ashutoshcipher for the patch. Looks good to me.

Issue Time Tracking
-------------------
    Worklog Id: (was: 782812)
    Time Spent: 50m  (was: 40m)
[jira] [Work logged] (HDFS-16637) TestHDFSCLI#testAll consistently failing
[ https://issues.apache.org/jira/browse/HDFS-16637?focusedWorklogId=782803&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782803 ]

ASF GitHub Bot logged work on HDFS-16637:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jun/22 05:26
            Start Date: 20/Jun/22 05:26
    Worklog Time Spent: 10m
      Work Description: ZanderXu commented on PR #4466:
URL: https://github.com/apache/hadoop/pull/4466#issuecomment-115999

   Thanks @virajjasani for your work. LGTM. @jianghuazhu @tomscut This bug is introduced by HDFS-16581, please help to review this PR, thanks.

Issue Time Tracking
-------------------
    Worklog Id: (was: 782803)
    Time Spent: 0.5h  (was: 20m)

> TestHDFSCLI#testAll consistently failing
> ----------------------------------------
>
>                 Key: HDFS-16637
>                 URL: https://issues.apache.org/jira/browse/HDFS-16637
>             Project: Hadoop HDFS
>          Issue Type: Test
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The failure seems to have been caused by the output change introduced by HDFS-16581.
> {code:java}
> 2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(146)) - Detailed results:
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(147)) - ----------------------------------
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(156)) - ---------------------------------------------
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(157)) -                     Test ID: [629]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(158)) -            Test Description: [printTopology: verifying that the topology map is what we expect]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(159)) -
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(163)) -               Test Commands: [-fs hdfs://localhost:51486 -printTopology]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(167)) -
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(174)) -
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(178)) -                  Comparator: [RegexpAcrossOutputComparator]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(180)) -          Comparision result: [fail]
> 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(182)) -             Expected output: [^Rack: \/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)]
> 2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(184)) -               Actual output: [Rack: /rack1
>    127.0.0.1:51487 (localhost) In Service
>    127.0.0.1:51491 (localhost) In Service
> Rack: /rack2
>    127.0.0.1:51500 (localhost) In Service
>    127.0.0.1:51496 (localhost) In Service
>    127.0.0.1:51504 (localhost) In Service
> Rack: /rack3
>    127.0.0.1:51508 (localhost) In Service
> Rack: /rack4
>    127.0.0.1:51512 (localhost) In Service
>    127.0.0.1:51516 (localhost) In Service
> ]
> {code}
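The mismatch above is mechanical: the expected pattern in the test stops right after the "(hostname)" group, while HDFS-16581 appended an operational state column ("In Service") to each -printTopology line. The self-contained check below illustrates the failure and one way the expected regex could be extended to tolerate the new column; the class name and the exact pattern are assumptions for illustration, not the actual change in PR #4466.

{code:java}
// Standalone illustration (hypothetical, not the actual PR #4466 patch):
// the pre-HDFS-16581 pattern fails against the new -printTopology output,
// while a pattern that permits a trailing state column matches it.
import java.util.regex.Pattern;

public class PrintTopologyRegexCheck {
  public static void main(String[] args) {
    String actual = "Rack: /rack1\n"
        + "   127.0.0.1:51487 (localhost) In Service\n"
        + "   127.0.0.1:51491 (localhost) In Service\n";

    // Old expectation from the test: nothing follows "(hostname)".
    Pattern old = Pattern.compile(
        "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)"
            + "\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)");

    // Extended expectation: optionally accept the state column that
    // HDFS-16581 added ("In Service", "Decommissioned", ...).
    Pattern updated = Pattern.compile(
        "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)( [A-Za-z ]+)?"
            + "\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)( [A-Za-z ]+)?");

    System.out.println("old pattern matches:     " + old.matcher(actual).find());     // false
    System.out.println("updated pattern matches: " + updated.matcher(actual).find()); // true
  }
}
{code}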
[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
[ https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782783&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782783 ]

ASF GitHub Bot logged work on HDFS-16634:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jun/22 04:51
            Start Date: 20/Jun/22 04:51
    Worklog Time Spent: 10m
      Work Description: virajjasani opened a new pull request, #4467:
URL: https://github.com/apache/hadoop/pull/4467

   branch-3.3 backport PR of #4448

Issue Time Tracking
-------------------
    Worklog Id: (was: 782783)
    Time Spent: 2h  (was: 1h 50m)

> Dynamically adjust slow peer report size on JMX metrics
> --------------------------------------------------------
>
>                 Key: HDFS-16634
>                 URL: https://issues.apache.org/jira/browse/HDFS-16634
>             Project: Hadoop HDFS
>          Issue Type: Task
>            Reporter: Viraj Jasani
>            Assignee: Viraj Jasani
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> On a busy cluster, it sometimes takes a while for a deleted node's "slow node report" to be removed from the slow peer JSON report in the Namenode JMX metrics. In the meantime, the user should be able to browse through more entries in the report by reconfiguring "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted without having to bounce the active Namenode just for this purpose.
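The point of the change is that the slow peer report size becomes a runtime-reconfigurable setting rather than a restart-only one. A simplified, self-contained sketch of that idea follows; it is not the actual Hadoop implementation, and the class name, default value, and method shapes are assumptions (only the property name comes from the issue):

{code:java}
// Sketch of runtime reconfiguration, assuming a hypothetical holder class.
// The value lives in an AtomicInteger that a reconfiguration handler can
// update while the Namenode keeps serving JMX, so no restart is needed.
import java.util.concurrent.atomic.AtomicInteger;

public class SlowPeerReportConfig {
  public static final String MAX_NODES_KEY = "dfs.datanode.max.nodes.to.report";
  public static final int MAX_NODES_DEFAULT = 5; // assumed default

  private final AtomicInteger maxNodesToReport = new AtomicInteger(MAX_NODES_DEFAULT);

  /** Applies a new value at runtime; returns the effective value. */
  public int reconfigure(String newValue) {
    int parsed = (newValue == null) ? MAX_NODES_DEFAULT : Integer.parseInt(newValue);
    if (parsed <= 0) {
      throw new IllegalArgumentException(MAX_NODES_KEY + " must be positive: " + parsed);
    }
    maxNodesToReport.set(parsed);
    return parsed;
  }

  /** Read on every JMX report, so an updated size takes effect immediately. */
  public int getMaxNodesToReport() {
    return maxNodesToReport.get();
  }
}
{code}

Reading the value on every JMX fetch, rather than caching it at startup, is what lets the operator widen the report while waiting for a stale slow-node entry to age out.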
[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
[ https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782784&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782784 ]

ASF GitHub Bot logged work on HDFS-16634:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jun/22 04:51
            Start Date: 20/Jun/22 04:51
    Worklog Time Spent: 10m
      Work Description: virajjasani commented on PR #4467:
URL: https://github.com/apache/hadoop/pull/4467#issuecomment-1159969878

   FYI @tomscut

Issue Time Tracking
-------------------
    Worklog Id: (was: 782784)
    Time Spent: 2h 10m  (was: 2h)
[jira] [Commented] (HDFS-16637) TestHDFSCLI#testAll consistently failing
[ https://issues.apache.org/jira/browse/HDFS-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556187#comment-17556187 ]

Viraj Jasani commented on HDFS-16637:
--------------------------------------

No worries at all [~jianghuazhu], this is not carelessness at all, it happens with everyone :)
[jira] [Commented] (HDFS-16637) TestHDFSCLI#testAll consistently failing
[ https://issues.apache.org/jira/browse/HDFS-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556183#comment-17556183 ]

JiangHua Zhu commented on HDFS-16637:
--------------------------------------

Thanks to [~vjasani] for finding this question. I think it was due to my carelessness. I'm very sorry.
[jira] [Updated] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas
[ https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-16064:
---------------------------------
    Fix Version/s: 3.2.4

Cherry-picked to branch-3.2.

> Determine when to invalidate corrupt replicas based on number of usable replicas
> ---------------------------------------------------------------------------------
>
>                 Key: HDFS-16064
>                 URL: https://issues.apache.org/jira/browse/HDFS-16064
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 3.2.1
>            Reporter: Kevin Wikant
>            Assignee: Kevin Wikant
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 3.4.0, 3.2.4, 3.3.4
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a non-issue under the assumption that if the namenode & a datanode get into an inconsistent state for a given block pipeline, there should be another datanode available to replicate the block to.
> While testing datanode decommissioning using "dfs.exclude.hosts", I have encountered a scenario where the decommissioning gets stuck indefinitely.
> Below is the progression of events:
> * there are initially 4 datanodes: DN1, DN2, DN3, DN4
> * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
> * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in order to satisfy their minimum replication factor of 2
> * during this replication process https://issues.apache.org/jira/browse/HDFS-721 is encountered, which causes the following inconsistent state:
> ** DN3 thinks it has the block pipeline in FINALIZED state
> ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode (DataXceiver for client at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): DN3:9866:DataXceiver error processing WRITE_BLOCK operation src: /DN2:45654 dst: /DN3:9866; org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
> * the replication is attempted again, but:
> ** DN4 has the block
> ** DN1 and/or DN2 have the block, but don't count towards the minimum replication factor because they are being decommissioned
> ** DN3 does not have the block & cannot have the block replicated to it because of HDFS-721
> * the namenode repeatedly tries to replicate the block to DN3 & repeatedly fails; this continues indefinitely
> * therefore DN4 is the only live datanode with the block & the minimum replication factor of 2 cannot be satisfied
> * because the minimum replication factor cannot be satisfied for the block(s) being moved off DN1 & DN2, the datanode decommissioning can never be completed
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 2, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas: 2, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of DataNode decommissioning.
> A few potential solutions:
> * Address the root cause of the problem, which is an inconsistent state between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
> * Detect when datanode decommissioning is stuck due to a lack of available datanodes for satisfying the minimum replication factor, then recover by re-enabling the datanodes being decommissioned
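The resolution suggested by the issue's new summary is to gate invalidation of a corrupt replica on the number of *usable* replicas rather than live replicas alone, so that decommissioning replicas, which remain readable, keep the block recoverable. A simplified sketch of that decision is below; it is not the actual BlockManager code, and every name in it is hypothetical:

{code:java}
// Simplified sketch of the "usable replicas" decision described in this
// issue (hypothetical; the real logic lives in the NameNode's block
// management code and is more involved).
public final class CorruptReplicaPolicy {
  private CorruptReplicaPolicy() {}

  /**
   * Decide whether a corrupt replica can be invalidated now.
   * Decommissioning replicas are still readable, so they count as usable
   * sources even though they do not count toward the live-replica target.
   */
  public static boolean canInvalidateCorruptReplica(
      int liveReplicas, int decommissioningReplicas, int expectedReplicas) {
    int usable = liveReplicas + decommissioningReplicas;
    // Postpone the deletion while usable replicas are below the expected
    // count; invalidating earlier could leave the block unrecoverable and
    // wedge decommissioning, as in the DN1/DN2/DN3 scenario above.
    return usable >= expectedReplicas;
  }
}
{code}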
[jira] [Updated] (HDFS-16064) Determine when to invalidate corrupt replicas based on number of usable replicas
[ https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka updated HDFS-16064:
---------------------------------
    Summary: Determine when to invalidate corrupt replicas based on number of usable replicas  (was: HDFS-721 causes DataNode decommissioning to get stuck indefinitely)
[jira] [Resolved] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely
[ https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka resolved HDFS-16064.
----------------------------------
    Fix Version/s: 3.4.0
                   3.3.4
       Resolution: Fixed

Merged the PR into trunk and branch-3.3.
[jira] [Assigned] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely
[ https://issues.apache.org/jira/browse/HDFS-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Akira Ajisaka reassigned HDFS-16064:
------------------------------------
    Assignee: Kevin Wikant
[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely
[ https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=782766&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782766 ]

ASF GitHub Bot logged work on HDFS-16064:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jun/22 02:20
            Start Date: 20/Jun/22 02:20
    Worklog Time Spent: 10m
      Work Description: aajisaka commented on PR #4410:
URL: https://github.com/apache/hadoop/pull/4410#issuecomment-1159895754

   Merged. Thank you @KevinWikant for your contribution and thank you @ashutoshcipher @ZanderXu for your review!

Issue Time Tracking
-------------------
    Worklog Id: (was: 782766)
    Time Spent: 2h  (was: 1h 50m)
[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely
[ https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=782765&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782765 ]

ASF GitHub Bot logged work on HDFS-16064:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jun/22 02:20
            Start Date: 20/Jun/22 02:20
    Worklog Time Spent: 10m
      Work Description: aajisaka merged PR #4410:
URL: https://github.com/apache/hadoop/pull/4410

Issue Time Tracking
-------------------
    Worklog Id: (was: 782765)
    Time Spent: 1h 50m  (was: 1h 40m)
[jira] [Resolved] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
[ https://issues.apache.org/jira/browse/HDFS-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tao Li resolved HDFS-16634.
---------------------------
    Fix Version/s: 3.4.0
       Resolution: Resolved
[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
[ https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782752&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782752 ]

ASF GitHub Bot logged work on HDFS-16634:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jun/22 01:25
            Start Date: 20/Jun/22 01:25
    Worklog Time Spent: 10m
      Work Description: tomscut commented on PR #4448:
URL: https://github.com/apache/hadoop/pull/4448#issuecomment-1159867414

   Hi @virajjasani , could you please submit another PR for branch-3.3 since there are some conflicts when cherry-pick. Thanks.

Issue Time Tracking
-------------------
    Worklog Id: (was: 782752)
    Time Spent: 1h 50m  (was: 1h 40m)
[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
[ https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782751&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782751 ]

ASF GitHub Bot logged work on HDFS-16634:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jun/22 01:21
            Start Date: 20/Jun/22 01:21
    Worklog Time Spent: 10m
      Work Description: tomscut merged PR #4448:
URL: https://github.com/apache/hadoop/pull/4448

Issue Time Tracking
-------------------
    Worklog Id: (was: 782751)
    Time Spent: 1h 40m  (was: 1.5h)
[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
[ https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782750&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782750 ]

ASF GitHub Bot logged work on HDFS-16634:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jun/22 01:20
            Start Date: 20/Jun/22 01:20
    Worklog Time Spent: 10m
      Work Description: tomscut commented on PR #4448:
URL: https://github.com/apache/hadoop/pull/4448#issuecomment-1159864599

   Thanks @virajjasani for your contribution!

Issue Time Tracking
-------------------
    Worklog Id: (was: 782750)
    Time Spent: 1.5h  (was: 1h 20m)
[jira] [Work logged] (HDFS-16637) TestHDFSCLI#testAll consistently failing
[ https://issues.apache.org/jira/browse/HDFS-16637?focusedWorklogId=782747&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782747 ]

ASF GitHub Bot logged work on HDFS-16637:
-----------------------------------------
                Author: ASF GitHub Bot
            Created on: 20/Jun/22 00:15
            Start Date: 20/Jun/22 00:15
    Worklog Time Spent: 10m
      Work Description: hadoop-yetus commented on PR #4466:
URL: https://github.com/apache/hadoop/pull/4466#issuecomment-1159840541

   :confetti_ball: **+1 overall**

   | Vote | Subsystem | Runtime | Logfile | Comment |
   |:----:|----------:|--------:|:-------:|:-------:|
   | +0 :ok: | reexec | 0m 38s | | Docker mode activated. |
   |||| _ Prechecks _ |
   | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
   | +0 :ok: | codespell | 0m 0s | | codespell was not available. |
   | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
   | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
   | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
   | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: | mvninstall | 37m 37s | | trunk passed |
   | +1 :green_heart: | shadedclient | 56m 28s | | branch has no errors when building and testing our client artifacts. |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: | mvninstall | 1m 13s | | the patch passed |
   | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
   | +1 :green_heart: | shadedclient | 18m 33s | | patch has no errors when building and testing our client artifacts. |
   |||| _ Other Tests _ |
   | +1 :green_heart: | unit | 1m 16s | | hadoop-hdfs in the patch passed. |
   | +1 :green_heart: | asflicense | 0m 40s | | The patch does not generate ASF License warnings. |
   | | | 81m 30s | | |

   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4466/1/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4466 |
   | Optional Tests | dupname asflicense unit codespell detsecrets xmllint |
   | uname | Linux 8665b0c30282 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / faf0292f0ca935dceb4a4598909d0c4de919b3f0 |
   | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4466/1/testReport/ |
   | Max. process+thread count | 706 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4466/1/console |
   | versions | git=2.25.1 maven=3.6.3 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

   This message was automatically generated.

Issue Time Tracking
-------------------
    Worklog Id: (was: 782747)
    Time Spent: 20m  (was: 10m)
[jira] [Work logged] (HDFS-16637) TestHDFSCLI#testAll consistently failing
[ https://issues.apache.org/jira/browse/HDFS-16637?focusedWorklogId=782742=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782742 ] ASF GitHub Bot logged work on HDFS-16637: - Author: ASF GitHub Bot Created on: 19/Jun/22 22:52 Start Date: 19/Jun/22 22:52 Worklog Time Spent: 10m Work Description: virajjasani opened a new pull request, #4466: URL: https://github.com/apache/hadoop/pull/4466 ### Description of PR ``` 2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(146)) - Detailed results: 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(147)) - -- 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(156)) - --- 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(157)) - Test ID: [629] 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(158)) -Test Description: [printTopology: verifying that the topology map is what we expect] 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(159)) - 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(163)) - Test Commands: [-fs hdfs://localhost:51486 -printTopology] 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(167)) - 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(174)) - 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(178)) - Comparator: [RegexpAcrossOutputComparator] 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(180)) - Comparision result: [fail] 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(182)) - Expected output: [^Rack: \/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)] 2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(184)) - Actual output: [Rack: /rack1 127.0.0.1:51487 (localhost) In Service 127.0.0.1:51491 (localhost) In Service Rack: /rack2 127.0.0.1:51500 (localhost) In Service 127.0.0.1:51496 (localhost) In Service 127.0.0.1:51504 (localhost) In Service Rack: /rack3 127.0.0.1:51508 (localhost) In Service Rack: /rack4 127.0.0.1:51512 (localhost) In Service 127.0.0.1:51516 (localhost) In Service ] ``` ### How was this patch tested? UT ### For code changes: - [X] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? Issue Time Tracking --- Worklog Id: (was: 782742) Remaining Estimate: 0h Time Spent: 10m > TestHDFSCLI#testAll consistently failing > > > Key: HDFS-16637 > URL: https://issues.apache.org/jira/browse/HDFS-16637 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > The failure seems to have been caused by output change introduced by > HDFS-16581. 
[jira] [Updated] (HDFS-16637) TestHDFSCLI#testAll consistently failing
[ https://issues.apache.org/jira/browse/HDFS-16637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16637: -- Labels: pull-request-available (was: ) > TestHDFSCLI#testAll consistently failing > > > Key: HDFS-16637 > URL: https://issues.apache.org/jira/browse/HDFS-16637 > Project: Hadoop HDFS > Issue Type: Test >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The failure seems to have been caused by output change introduced by > HDFS-16581. > {code:java} > 2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(146)) - Detailed results: > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(147)) - > --2022-06-19 15:41:16,184 [Listener at > localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(156)) - > --- > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(157)) - Test ID: [629] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(158)) - Test Description: > [printTopology: verifying that the topology map is what we expect] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(159)) - > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(163)) - Test Commands: [-fs > hdfs://localhost:51486 -printTopology] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(167)) - > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(174)) - > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(178)) - Comparator: > [RegexpAcrossOutputComparator] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(180)) - Comparision result: > [fail] > 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(182)) - Expected output: > [^Rack: > \/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)] > 2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO cli.CLITestHelper > (CLITestHelper.java:displayResults(184)) - Actual output: > [Rack: /rack1 > 127.0.0.1:51487 (localhost) In Service > 127.0.0.1:51491 (localhost) In ServiceRack: /rack2 > 127.0.0.1:51500 (localhost) In Service > 127.0.0.1:51496 (localhost) In Service > 127.0.0.1:51504 (localhost) In ServiceRack: /rack3 > 127.0.0.1:51508 (localhost) In ServiceRack: /rack4 > 127.0.0.1:51512 (localhost) In Service > 127.0.0.1:51516 (localhost) In Service] > {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16637) TestHDFSCLI#testAll consistently failing
Viraj Jasani created HDFS-16637: --- Summary: TestHDFSCLI#testAll consistently failing Key: HDFS-16637 URL: https://issues.apache.org/jira/browse/HDFS-16637 Project: Hadoop HDFS Issue Type: Test Reporter: Viraj Jasani Assignee: Viraj Jasani The failure seems to have been caused by output change introduced by HDFS-16581. {code:java} 2022-06-19 15:41:16,183 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(146)) - Detailed results: 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(147)) - --2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(156)) - --- 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(157)) - Test ID: [629] 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(158)) - Test Description: [printTopology: verifying that the topology map is what we expect] 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(159)) - 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(163)) - Test Commands: [-fs hdfs://localhost:51486 -printTopology] 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(167)) - 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(174)) - 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(178)) - Comparator: [RegexpAcrossOutputComparator] 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(180)) - Comparision result: [fail] 2022-06-19 15:41:16,184 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(182)) - Expected output: [^Rack: \/rack1\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)\s*127\.0\.0\.1:\d+\s\([-.a-zA-Z0-9]+\)] 2022-06-19 15:41:16,185 [Listener at localhost/51519] INFO cli.CLITestHelper (CLITestHelper.java:displayResults(184)) - Actual output: [Rack: /rack1 127.0.0.1:51487 (localhost) In Service 127.0.0.1:51491 (localhost) In ServiceRack: /rack2 127.0.0.1:51500 (localhost) In Service 127.0.0.1:51496 (localhost) In Service 127.0.0.1:51504 (localhost) In ServiceRack: /rack3 127.0.0.1:51508 (localhost) In ServiceRack: /rack4 127.0.0.1:51512 (localhost) In Service 127.0.0.1:51516 (localhost) In Service] {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
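The mismatch itself is mechanical: HDFS-16581 added the node state ("In Service") after each datanode entry in the -printTopology output, and the expected pattern has no token that consumes it. Below is a minimal standalone sketch (hypothetical class name, and only an approximation of what RegexpAcrossOutputComparator effectively checks) that reproduces the failure using the expected regex and actual output quoted in the log above:
{code:java}
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the Hadoop test suite.
public class PrintTopologyRegexCheck {
  public static void main(String[] args) {
    // Expected pattern, copied from the "Expected output" line in the log.
    String expected = "^Rack: \\/rack1\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)"
        + "\\s*127\\.0\\.0\\.1:\\d+\\s\\([-.a-zA-Z0-9]+\\)";
    // Actual output shape after HDFS-16581: "In Service" follows each entry.
    String actual = "Rack: /rack1\n"
        + "   127.0.0.1:51487 (localhost) In Service\n"
        + "   127.0.0.1:51491 (localhost) In Service\n";
    // The \s* between the two datanode entries cannot consume the
    // non-whitespace "In Service" token, so the match fails.
    System.out.println("matches: " + Pattern.compile(expected).matcher(actual).find());
  }
}
{code}
This prints "matches: false"; a pattern extended to allow a trailing state token per entry (for example \s\([-.a-zA-Z0-9]+\)\s[a-zA-Z ]+) would match again. Presumably the fix in PR #4466 updates the expected patterns in the CLI test config along these lines, though the exact change may differ.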
[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely
[ https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=782721=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782721 ] ASF GitHub Bot logged work on HDFS-16064: - Author: ASF GitHub Bot Created on: 19/Jun/22 16:07 Start Date: 19/Jun/22 16:07 Worklog Time Spent: 10m Work Description: ashutoshcipher commented on PR #4410: URL: https://github.com/apache/hadoop/pull/4410#issuecomment-1159766297 > Filed [HDFS-16635](https://issues.apache.org/jira/browse/HDFS-16635) to fix javadoc error. @aajisaka Raised PR for the same. Issue Time Tracking --- Worklog Id: (was: 782721) Time Spent: 1h 40m (was: 1.5h) > HDFS-721 causes DataNode decommissioning to get stuck indefinitely > -- > > Key: HDFS-16064 > URL: https://issues.apache.org/jira/browse/HDFS-16064 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.2.1 >Reporter: Kevin Wikant >Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a > non-issue under the assumption that if the namenode & a datanode get into an > inconsistent state for a given block pipeline, there should be another > datanode available to replicate the block to > While testing datanode decommissioning using "dfs.exclude.hosts", I have > encountered a scenario where the decommissioning gets stuck indefinitely > Below is the progression of events: > * there are initially 4 datanodes DN1, DN2, DN3, DN4 > * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts" > * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in > order to satisfy their minimum replication factor of 2 > * during this replication process > https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes > the following inconsistent state: > ** DN3 thinks it has the block pipeline in FINALIZED state > ** the namenode does not think DN3 has the block pipeline > {code:java} > 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode > (DataXceiver for client at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): > DN3:9866:DataXceiver error processing WRITE_BLOCK operation src: /DN2:45654 > dst: /DN3:9866; > org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block > BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created. 
> {code} > * the replication is attempted again, but: > ** DN4 has the block > ** DN1 and/or DN2 have the block, but don't count towards the minimum > replication factor because they are being decommissioned > ** DN3 does not have the block & cannot have the block replicated to it > because of HDFS-721 > * the namenode repeatedly tries to replicate the block to DN3 & repeatedly > fails, this continues indefinitely > * therefore DN4 is the only live datanode with the block & the minimum > replication factor of 2 cannot be satisfied > * because the minimum replication factor cannot be satisfied for the > block(s) being moved off DN1 & DN2, the datanode decommissioning can never be > completed > {code:java} > 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): > Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, > decommissioned replicas: 0, decommissioning replicas: 2, maintenance > replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is > Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , > Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is > current datanode entering maintenance: false > ... > 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): > Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, > decommissioned replicas: 0, decommissioning replicas: 2, maintenance > replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is > Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , > Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is > current datanode entering maintenance: false > {code} > Being stuck in decommissioning state forever is not an intended behavior of > DataNode decommissioning > A few potential solutions: > * Address the root cause of the problem which is an inconsistent state > between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721 > * Detect when datanode decommissioning is stuck due to lack of available > datanodes for satisfying the minimum replication factor, then recover by > re-enabling the
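To make the stall condition concrete: only replicas on non-decommissioning datanodes count toward the expected replication, so as long as DN3 keeps rejecting the transfer (per HDFS-721) the check can never pass. A minimal sketch of that invariant (hypothetical types; not the actual DatanodeAdminMonitor code):
{code:java}
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the replica-counting rule described above.
public class DecommissionStallSketch {
  static class Replica {
    final String node;
    final boolean decommissioning;
    Replica(String node, boolean decommissioning) {
      this.node = node;
      this.decommissioning = decommissioning;
    }
  }

  // Replicas on decommissioning nodes are excluded from the live count.
  static boolean hasSufficientLiveReplicas(List<Replica> replicas, int expected) {
    long live = replicas.stream().filter(r -> !r.decommissioning).count();
    return live >= expected;
  }

  public static void main(String[] args) {
    // State from the BlockStateChange log: DN1/DN2 decommissioning, DN4 live.
    // DN3 rejects the block, so no live replica can ever be added.
    List<Replica> blk = Arrays.asList(
        new Replica("DN1:9866", true),
        new Replica("DN2:9866", true),
        new Replica("DN4:9866", false));
    System.out.println(hasSufficientLiveReplicas(blk, 2));  // false, indefinitely
  }
}
{code}
Since nothing in the retry loop can add a live replica, the monitor re-queues the same block forever, which is exactly what the repeated 10:39/10:57 BlockStateChange log lines above show.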
[jira] (HDFS-2546) The C HDFS API should work with secure HDFS
[ https://issues.apache.org/jira/browse/HDFS-2546 ] Ambar Hegde deleted comment on HDFS-2546: --- was (Author: ambar): I am working on this. > The C HDFS API should work with secure HDFS > --- > > Key: HDFS-2546 > URL: https://issues.apache.org/jira/browse/HDFS-2546 > Project: Hadoop HDFS > Issue Type: New Feature > Components: libhdfs >Affects Versions: 2.0.0-alpha >Reporter: Harsh J >Priority: Major > > Right now, libhdfs does not work with Kerberos-enabled Hadoop. If libhdfs is > still supported, it must fully work with Kerberized instances of HDFS. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
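For context, the Java side of "working with secure HDFS" boils down to a Kerberos login before any filesystem call; libhdfs is a JNI wrapper over the same Java client, so supporting secure HDFS means exposing an equivalent login path. A hedged sketch with placeholder principal and keytab values (it needs a real Kerberized cluster to run):
{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

// Hypothetical demo class; principal and keytab path are placeholders.
public class SecureHdfsLoginSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Switch client authentication from "simple" to Kerberos.
    conf.set("hadoop.security.authentication", "kerberos");
    UserGroupInformation.setConfiguration(conf);
    // Obtain Kerberos credentials from a keytab before touching HDFS.
    UserGroupInformation.loginUserFromKeytab(
        "hdfsuser@EXAMPLE.COM", "/etc/security/keytabs/hdfsuser.keytab");
    FileSystem fs = FileSystem.get(conf);
    System.out.println(fs.exists(new Path("/")));  // succeeds only if login worked
    fs.close();
  }
}
{code}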