[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782238 ]

ASF GitHub Bot logged work on HDFS-13522:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Jun/22 03:26
Start Date: 17/Jun/22 03:26
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #4127:
URL: https://github.com/apache/hadoop/pull/4127#issuecomment-1158450996

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 58s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 12 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 40m 48s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 24m 57s | | trunk passed |
| +1 :green_heart: | compile | 23m 4s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 20m 33s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 4m 24s | | trunk passed |
| +1 :green_heart: | mvnsite | 7m 50s | | trunk passed |
| -1 :x: | javadoc | 1m 45s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/15/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 6m 45s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 12m 26s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 33s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 30s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 4m 10s | | the patch passed |
| +1 :green_heart: | compile | 22m 13s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 22m 13s | | the patch passed |
| +1 :green_heart: | compile | 20m 29s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 20m 29s | | the patch passed |
| -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/15/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 :warning: | checkstyle | 4m 20s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/15/artifact/out/results-checkstyle-root.txt) | root: The patch generated 3 new + 339 unchanged - 1 fixed = 342 total (was 340) |
| +1 :green_heart: | mvnsite | 7m 40s | | the patch passed |
| -1 :x: | javadoc | 1m 45s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/15/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 6m 44s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 12m 56s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 49s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 18m 33s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 3m 16s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | unit | 413m 33s | | hadoop-hdfs in the patch passed. |
| -1 :x: | unit | 2m 11s |
[jira] [Work logged] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks
[ https://issues.apache.org/jira/browse/HDFS-16613?focusedWorklogId=782228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782228 ]

ASF GitHub Bot logged work on HDFS-16613:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Jun/22 02:46
Start Date: 17/Jun/22 02:46
Worklog Time Spent: 10m
Work Description: lfxy closed pull request #4391: HDFS-16613. EC: Improve performance of decommissioning dn with many ec blocks
URL: https://github.com/apache/hadoop/pull/4391

Issue Time Tracking
-------------------

Worklog Id: (was: 782228)
Time Spent: 2.5h (was: 2h 20m)

> EC: Improve performance of decommissioning dn with many ec blocks
> -----------------------------------------------------------------
>
> Key: HDFS-16613
> URL: https://issues.apache.org/jira/browse/HDFS-16613
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ec, erasure-coding, namenode
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2022-06-07-11-46-42-389.png,
> image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png,
> image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png,
> image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png
>
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> In an HDFS cluster with a lot of EC blocks, decommissioning a dn is very slow.
> The reason is that, unlike replicated blocks, which can be replicated from any dn
> that holds a replica of the same block, EC blocks have to be replicated from the
> decommissioning dn.
> The configurations dfs.namenode.replication.max-streams and
> dfs.namenode.replication.max-streams-hard-limit limit the replication
> speed, but increasing them puts the whole cluster's network at risk.
> So a new configuration should be added to limit the decommissioning dn,
> distinct from the cluster-wide max-streams limit.
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
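The trade-off described in HDFS-16613 can be sketched roughly as follows. This is a hypothetical illustration only, not the code from the pull request: the class `StreamLimitSketch`, its methods, and the idea of passing a second limit for decommissioning nodes are assumptions made here, and the issue text does not name the new configuration key.

```java
// Hypothetical sketch, not Hadoop code: a per-DataNode cap on outgoing
// replication streams, with a separate (assumed) cap for a decommissioning node.
final class StreamLimitSketch {
  // Stands in for a cluster-wide limit such as dfs.namenode.replication.max-streams.
  private final int clusterWideLimit;
  // Assumed separate limit for decommissioning nodes; the real key is not given in the issue.
  private final int decommissionLimit;

  StreamLimitSketch(int clusterWideLimit, int decommissionLimit) {
    this.clusterWideLimit = clusterWideLimit;
    this.decommissionLimit = decommissionLimit;
  }

  /** Returns the stream cap that applies to a node in the given state. */
  int maxStreamsFor(boolean decommissioning) {
    return decommissioning ? decommissionLimit : clusterWideLimit;
  }

  /** Returns how many more transfers could be scheduled on the node right now. */
  int availableSlots(boolean decommissioning, int activeStreams) {
    return Math.max(0, maxStreamsFor(decommissioning) - activeStreams);
  }
}
```

With separate values, a decommissioning node that must source every EC block itself can be given more headroom without raising the limit for every other node in the cluster.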
[jira] [Work logged] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks
[ https://issues.apache.org/jira/browse/HDFS-16613?focusedWorklogId=782227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782227 ]

ASF GitHub Bot logged work on HDFS-16613:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Jun/22 02:46
Start Date: 17/Jun/22 02:46
Worklog Time Spent: 10m
Work Description: lfxy commented on PR #4398:
URL: https://github.com/apache/hadoop/pull/4398#issuecomment-1158429085

@hi-adachi OK, I see, thank you very much!

Issue Time Tracking
-------------------

Worklog Id: (was: 782227)
Time Spent: 2h 20m (was: 2h 10m)

> EC: Improve performance of decommissioning dn with many ec blocks
> -----------------------------------------------------------------
>
> Key: HDFS-16613
> URL: https://issues.apache.org/jira/browse/HDFS-16613
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ec, erasure-coding, namenode
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2022-06-07-11-46-42-389.png,
> image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png,
> image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png,
> image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png
>
> Time Spent: 2h 20m
> Remaining Estimate: 0h
>
> In an HDFS cluster with a lot of EC blocks, decommissioning a dn is very slow.
> The reason is that, unlike replicated blocks, which can be replicated from any dn
> that holds a replica of the same block, EC blocks have to be replicated from the
> decommissioning dn.
> The configurations dfs.namenode.replication.max-streams and
> dfs.namenode.replication.max-streams-hard-limit limit the replication
> speed, but increasing them puts the whole cluster's network at risk.
> So a new configuration should be added to limit the decommissioning dn,
> distinct from the cluster-wide max-streams limit.
[jira] [Work logged] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks
[ https://issues.apache.org/jira/browse/HDFS-16613?focusedWorklogId=782197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782197 ]

ASF GitHub Bot logged work on HDFS-16613:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Jun/22 01:02
Start Date: 17/Jun/22 01:02
Worklog Time Spent: 10m
Work Description: hi-adachi commented on PR #4398:
URL: https://github.com/apache/hadoop/pull/4398#issuecomment-1158359483

@lfxy The PR was merged. This is just FYI: the contribution guide says the following. Thank you for your contribution.

> https://cwiki.apache.org/confluence/display/hadoop/how+to+contribute
> Once a "+1" comment is received from the automated patch testing system and a code reviewer has set the Reviewed flag on the issue's Jira, a committer should then evaluate it within a few days and either: commit it; or reject it with an explanation.
>
> Please be patient. Committers are busy people too. If no one responds to your patch after a few days, please make friendly reminders. Please incorporate other's suggestions into your patch if you think they're reasonable. Finally, remember that even a patch that is not committed is useful to the community.

Issue Time Tracking
-------------------

Worklog Id: (was: 782197)
Time Spent: 2h 10m (was: 2h)

> EC: Improve performance of decommissioning dn with many ec blocks
> -----------------------------------------------------------------
>
> Key: HDFS-16613
> URL: https://issues.apache.org/jira/browse/HDFS-16613
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ec, erasure-coding, namenode
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2022-06-07-11-46-42-389.png,
> image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png,
> image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png,
> image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png
>
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> In an HDFS cluster with a lot of EC blocks, decommissioning a dn is very slow.
> The reason is that, unlike replicated blocks, which can be replicated from any dn
> that holds a replica of the same block, EC blocks have to be replicated from the
> decommissioning dn.
> The configurations dfs.namenode.replication.max-streams and
> dfs.namenode.replication.max-streams-hard-limit limit the replication
> speed, but increasing them puts the whole cluster's network at risk.
> So a new configuration should be added to limit the decommissioning dn,
> distinct from the cluster-wide max-streams limit.
[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
[ https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782203 ]

ASF GitHub Bot logged work on HDFS-16634:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Jun/22 01:13
Start Date: 17/Jun/22 01:13
Worklog Time Spent: 10m
Work Description: virajjasani opened a new pull request, #4448:
URL: https://github.com/apache/hadoop/pull/4448

### Description of PR
On a busy cluster, it sometimes takes a while for the "slow node report" of a node deleted from the cluster to be removed from the slow peer JSON report in the Namenode JMX metrics. In the meantime, the user should be able to browse through more entries in the report by reconfiguring "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted without having to bounce the active Namenode just for this purpose.

### How was this patch tested?
Dev cluster and using UT.

### For code changes:
- [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?

Issue Time Tracking
-------------------

Worklog Id: (was: 782203)
Remaining Estimate: 0h
Time Spent: 10m

> Dynamically adjust slow peer report size on JMX metrics
> -------------------------------------------------------
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
> Issue Type: Task
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
>
> On a busy cluster, it sometimes takes a while for the "slow node report" of a
> node deleted from the cluster to be removed from the slow peer JSON report in the
> Namenode JMX metrics. In the meantime, the user should be able to browse through
> more entries in the report by reconfiguring
> "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted
> without having to bounce the active Namenode just for this purpose.
[jira] [Updated] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
[ https://issues.apache.org/jira/browse/HDFS-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-16634:
----------------------------------
Labels: pull-request-available (was: )

> Dynamically adjust slow peer report size on JMX metrics
> -------------------------------------------------------
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
> Issue Type: Task
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> On a busy cluster, it sometimes takes a while for the "slow node report" of a
> node deleted from the cluster to be removed from the slow peer JSON report in the
> Namenode JMX metrics. In the meantime, the user should be able to browse through
> more entries in the report by reconfiguring
> "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted
> without having to bounce the active Namenode just for this purpose.
[jira] [Created] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics
Viraj Jasani created HDFS-16634:
-----------------------------------

Summary: Dynamically adjust slow peer report size on JMX metrics
Key: HDFS-16634
URL: https://issues.apache.org/jira/browse/HDFS-16634
Project: Hadoop HDFS
Issue Type: Task
Reporter: Viraj Jasani
Assignee: Viraj Jasani

On a busy cluster, it sometimes takes a while for the "slow node report" of a node deleted from the cluster to be removed from the slow peer JSON report in the Namenode JMX metrics. In the meantime, the user should be able to browse through more entries in the report by reconfiguring "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted without having to bounce the active Namenode just for this purpose.
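The idea in HDFS-16634, changing the report size at runtime instead of restarting the process, can be sketched as below. This is a minimal, self-contained illustration under stated assumptions: the class `SlowPeerReportSketch` and its methods are hypothetical and do not use Hadoop's reconfiguration classes; only the property name "dfs.datanode.max.nodes.to.report" comes from the issue text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not Hadoop code: a report whose maximum size can be
// changed while the process keeps running.
final class SlowPeerReportSketch {
  static final String MAX_NODES_KEY = "dfs.datanode.max.nodes.to.report";

  // Read on every report build, so a new value takes effect immediately.
  private final AtomicInteger maxNodesToReport;

  SlowPeerReportSketch(int initialMax) {
    this.maxNodesToReport = new AtomicInteger(initialMax);
  }

  /** Validates and applies a new value for the one supported property. */
  void reconfigure(String property, String newValue) {
    if (!MAX_NODES_KEY.equals(property)) {
      throw new IllegalArgumentException("Not reconfigurable: " + property);
    }
    int parsed = Integer.parseInt(newValue);
    if (parsed <= 0) {
      throw new IllegalArgumentException("Value must be positive: " + newValue);
    }
    maxNodesToReport.set(parsed);
  }

  /** Returns at most the currently configured number of entries. */
  List<String> report(List<String> slowNodes) {
    int n = Math.min(slowNodes.size(), maxNodesToReport.get());
    return new ArrayList<>(slowNodes.subList(0, n));
  }
}
```

Because the limit is read each time the report is built, stale entries from removed nodes can be pushed out of the way by temporarily raising the limit, without bouncing the process.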
[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782198 ]

ASF GitHub Bot logged work on HDFS-13522:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 17/Jun/22 01:06
Start Date: 17/Jun/22 01:06
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#issuecomment-1158361679

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 37s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 1s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 44m 39s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 24m 44s | | trunk passed |
| +1 :green_heart: | compile | 22m 49s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 20m 27s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 4m 24s | | trunk passed |
| +1 :green_heart: | mvnsite | 7m 44s | | trunk passed |
| -1 :x: | javadoc | 1m 45s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/8/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 6m 50s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 12m 22s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 27s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 34s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 4m 10s | | the patch passed |
| +1 :green_heart: | compile | 22m 3s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 22m 3s | | the patch passed |
| +1 :green_heart: | compile | 20m 26s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 20m 26s | | the patch passed |
| -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/8/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 :warning: | checkstyle | 4m 13s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/8/artifact/out/results-checkstyle-root.txt) | root: The patch generated 3 new + 198 unchanged - 1 fixed = 201 total (was 199) |
| +1 :green_heart: | mvnsite | 7m 42s | | the patch passed |
| -1 :x: | javadoc | 1m 45s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/8/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 6m 49s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 12m 54s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 52s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 18m 42s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 3m 14s | | hadoop-hdfs-client in the patch passed. |
| -1 :x: | unit | 256m 19s |
[jira] [Work logged] (HDFS-16566) Erasure Coding: Recovery may causes excess replicas when busy DN exsits
[ https://issues.apache.org/jira/browse/HDFS-16566?focusedWorklogId=782183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782183 ]

ASF GitHub Bot logged work on HDFS-16566:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 16/Jun/22 22:58
Start Date: 16/Jun/22 22:58
Worklog Time Spent: 10m
Work Description: jojochuang commented on code in PR #4252:
URL: https://github.com/apache/hadoop/pull/4252#discussion_r899627843

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java: ##
@@ -1040,11 +1040,16 @@ public static BlockECReconstructionInfo convertBlockECReconstructionInfo(
     byte[] liveBlkIndices = blockEcReconstructionInfoProto.getLiveBlockIndices()
         .toByteArray();
+    byte[] excludeReconstructedIndices =

Review Comment:
Please check and make sure ExcludeReconstructedIndices is filled.

## hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/erasurecoding.proto: ##
@@ -108,6 +108,7 @@ message BlockECReconstructionInfoProto {
   required StorageTypesProto targetStorageTypes = 5;
   required bytes liveBlockIndices = 6;
   required ErasureCodingPolicyProto ecPolicy = 7;
+  required bytes excludeReconstructedIndices = 8;

Review Comment:
```suggestion
  optional bytes excludeReconstructedIndices = 8;
```
Make it optional to ensure backward compatibility.

Issue Time Tracking
-------------------

Worklog Id: (was: 782183)
Time Spent: 3.5h (was: 3h 20m)

> Erasure Coding: Recovery may causes excess replicas when busy DN exsits
> -----------------------------------------------------------------------
>
> Key: HDFS-16566
> URL: https://issues.apache.org/jira/browse/HDFS-16566
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.3.2
> Reporter: Ruinan Gu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 3.5h
> Remaining Estimate: 0h
>
> Simple case:
> RS3-2, [0(busy),2,3,4] (1 missing), 0 is busy.
> We can get liveblockIndice=[2,3,4], additionalRepl=1. So the DN will get the
> LiveBitSet=[2,3,4] and targets.length=1.
> According to StripedWriter.initTargetIndices(), 0 will get recovered instead
> of 1. So the internal blocks will become [0(busy),2,3,4,0'(excess)]. Although
> the NN will detect and delete the excess replica and recover the missing block (1)
> correctly after the wrong recovery of 0', I don't think this process is
> expected, and the recovery of 0' is obviously wrong and unnecessary.
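The "simple case" in the HDFS-16566 description can be modelled with a short sketch. This is a simplified model written for illustration, not the actual `StripedWriter.initTargetIndices()` code: the class `TargetIndexSketch` and its methods are hypothetical, and the `exclude` parameter only mirrors the intent of the `excludeReconstructedIndices` field discussed in the review above.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical model of target-index selection for EC reconstruction.
final class TargetIndexSketch {

  /**
   * Scans block indices in order and picks the first ones that are neither
   * reported live nor explicitly excluded, until targetCount is reached.
   */
  static List<Integer> chooseTargets(int totalBlocks, BitSet live,
                                     BitSet exclude, int targetCount) {
    List<Integer> targets = new ArrayList<>();
    for (int i = 0; i < totalBlocks && targets.size() < targetCount; i++) {
      if (!live.get(i) && !exclude.get(i)) {
        targets.add(i);
      }
    }
    return targets;
  }

  /** Convenience helper to build a BitSet from a list of indices. */
  static BitSet bits(int... indices) {
    BitSet b = new BitSet();
    for (int i : indices) {
      b.set(i);
    }
    return b;
  }
}
```

For RS-3-2 there are five internal blocks (indices 0 to 4). With the live set {2,3,4} and one target, an empty exclude set selects index 0, the busy block that still exists; excluding index 0 selects index 1, the block that is actually missing.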
[jira] [Work logged] (HDFS-16591) StateStoreZooKeeper fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-16591?focusedWorklogId=782176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782176 ]

ASF GitHub Bot logged work on HDFS-16591:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 16/Jun/22 21:55
Start Date: 16/Jun/22 21:55
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #4447:
URL: https://github.com/apache/hadoop/pull/4447#issuecomment-1158170997

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 42s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 41s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 26m 49s | | trunk passed |
| +1 :green_heart: | compile | 22m 56s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 20m 37s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 48s | | trunk passed |
| +1 :green_heart: | mvnsite | 4m 50s | | trunk passed |
| +1 :green_heart: | javadoc | 4m 21s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 3m 48s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 6m 9s | | trunk passed |
| +1 :green_heart: | shadedclient | 21m 49s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 32s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 7s | | the patch passed |
| +1 :green_heart: | compile | 21m 58s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 21m 58s | | the patch passed |
| +1 :green_heart: | compile | 20m 33s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 20m 33s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 49s | [/results-checkstyle-hadoop-common-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4447/1/artifact/out/results-checkstyle-hadoop-common-project.txt) | hadoop-common-project: The patch generated 5 new + 203 unchanged - 2 fixed = 208 total (was 205) |
| +1 :green_heart: | mvnsite | 4m 31s | | the patch passed |
| +1 :green_heart: | javadoc | 4m 8s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javadoc | 3m 43s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| -1 :x: | spotbugs | 1m 39s | [/new-spotbugs-hadoop-common-project_hadoop-auth.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4447/1/artifact/out/new-spotbugs-hadoop-common-project_hadoop-auth.html) | hadoop-common-project/hadoop-auth generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) |
| +1 :green_heart: | shadedclient | 21m 56s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 4m 0s | | hadoop-auth in the patch passed. |
| +1 :green_heart: | unit | 18m 28s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 1m 57s | | hadoop-registry in the patch passed. |
| +1 :green_heart: | asflicense | 1m 34s | | The patch does not generate ASF License warnings. |
| | | 247m 46s | | |

| Reason | Tests |
|-------:|:------|
| SpotBugs | module:hadoop-common-project/hadoop-auth |
| | Write to static field org.apache.hadoop.security.authentication.util.JaasConfiguration.entry from instance method new org.apache.hadoop.security.authentication.util.JaasConfiguration(String, String, String) At JaasConfiguration.java:from instance method new org.apache.hadoop.security.authentication.util.JaasConfiguration(String, String, String) At
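The SpotBugs finding in the report above ("Write to static field ... from instance method") refers to a general Java pattern that can be shown in isolation. The sketch below is a generic illustration with hypothetical class names; it is not the `JaasConfiguration` source and does not claim how that class was or should be changed.

```java
// Hypothetical illustration of the pattern behind the SpotBugs warning.
final class StaticWriteSketch {

  /** Flagged shape: every new instance overwrites state shared by all instances. */
  static final class Flagged {
    private static String entry;

    Flagged(String name) {
      entry = name; // write to a static field from a constructor
    }

    static String current() {
      return entry;
    }
  }

  /** Alternative shape: each instance keeps its own value. */
  static final class PerInstance {
    private final String entry;

    PerInstance(String name) {
      this.entry = name;
    }

    String current() {
      return entry;
    }
  }
}
```

In the flagged shape, constructing a second object silently changes what the first one observes, which is why the tool reports it; whether that matters for a given class depends on how many instances are created and from which threads.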
[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely
[ https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=782169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782169 ]

ASF GitHub Bot logged work on HDFS-16064:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 16/Jun/22 21:37
Start Date: 16/Jun/22 21:37
Worklog Time Spent: 10m
Work Description: hadoop-yetus commented on PR #4410:
URL: https://github.com/apache/hadoop/pull/4410#issuecomment-1158158938

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 56s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 39m 25s | | trunk passed |
| +1 :green_heart: | compile | 1m 39s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 21s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 40s | | trunk passed |
| -1 :x: | javadoc | 1m 20s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 1m 44s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 43s | | trunk passed |
| +1 :green_heart: | shadedclient | 25m 58s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 24s | | the patch passed |
| +1 :green_heart: | compile | 1m 30s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 1m 30s | | the patch passed |
| +1 :green_heart: | compile | 1m 19s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 1m 19s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 :warning: | checkstyle | 1m 2s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 100 unchanged - 0 fixed = 101 total (was 100) |
| +1 :green_heart: | mvnsite | 1m 28s | | the patch passed |
| -1 :x: | javadoc | 1m 0s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 3m 35s | | the patch passed |
| +1 :green_heart: | shadedclient | 26m 0s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 381m 44s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 1m 1s | | The patch does not generate ASF License warnings. |
| | | 498m 26s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4410 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux efcbee072994 4.15.0-166-generic
[jira] [Work logged] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks
[ https://issues.apache.org/jira/browse/HDFS-16613?focusedWorklogId=782142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782142 ]

ASF GitHub Bot logged work on HDFS-16613:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 16/Jun/22 18:11
Start Date: 16/Jun/22 18:11
Worklog Time Spent: 10m
Work Description: jojochuang merged PR #4398:
URL: https://github.com/apache/hadoop/pull/4398

Issue Time Tracking
-------------------
Worklog Id: (was: 782142)
Time Spent: 2h (was: 1h 50m)

> EC: Improve performance of decommissioning dn with many ec blocks
> -----------------------------------------------------------------
>
> Key: HDFS-16613
> URL: https://issues.apache.org/jira/browse/HDFS-16613
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: ec, erasure-coding, namenode
> Affects Versions: 3.4.0
> Reporter: caozhiqiang
> Assignee: caozhiqiang
> Priority: Major
> Labels: pull-request-available
> Attachments: image-2022-06-07-11-46-42-389.png, image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png, image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png, image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png
>
> Time Spent: 2h
> Remaining Estimate: 0h
>
> In an HDFS cluster with a lot of EC blocks, decommissioning a dn is very slow.
> The reason is that, unlike replicated blocks, which can be copied from any dn
> that holds a replica of the same block, an EC block has to be replicated from
> the decommissioning dn itself.
> The configurations dfs.namenode.replication.max-streams and
> dfs.namenode.replication.max-streams-hard-limit limit the replication speed,
> but increasing them puts the whole cluster's network at risk. A new
> configuration should therefore be added to limit the decommissioning dn,
> distinct from the cluster-wide max-streams limit.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
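The idea in the issue description above can be sketched roughly as follows. This is only an illustration and is not taken from the merged patch: the class `ReplicationStreamLimiter` and the parameter `decommissionMaxStreams` are hypothetical names, and the only thing borrowed from the issue is the notion of a cluster-wide max-streams limit versus a separate limit for a decommissioning dn.

```java
/**
 * Hypothetical sketch: choose a replication stream limit per source DataNode.
 * None of these names exist in HDFS; they only illustrate the proposal.
 */
final class ReplicationStreamLimiter {
  /** Cluster-wide limit, in the spirit of dfs.namenode.replication.max-streams. */
  private final int clusterMaxStreams;
  /** Assumed separate limit that applies only to a decommissioning DataNode. */
  private final int decommissionMaxStreams;

  ReplicationStreamLimiter(int clusterMaxStreams, int decommissionMaxStreams) {
    this.clusterMaxStreams = clusterMaxStreams;
    this.decommissionMaxStreams = decommissionMaxStreams;
  }

  /** Returns how many more replication tasks may be scheduled on the source node. */
  int remainingStreams(boolean decommissioning, int activeStreams) {
    int limit = decommissioning ? decommissionMaxStreams : clusterMaxStreams;
    return Math.max(0, limit - activeStreams);
  }

  public static void main(String[] args) {
    ReplicationStreamLimiter limiter = new ReplicationStreamLimiter(2, 8);
    System.out.println("normal dn: " + limiter.remainingStreams(false, 1));
    System.out.println("decommissioning dn: " + limiter.remainingStreams(true, 1));
  }
}
```

With a split like this, raising the decommission-only value would speed up draining one node without loosening the limit for every other DataNode in the cluster.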
[jira] [Updated] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks
[ https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wei-Chiu Chuang updated HDFS-16613: --- Fix Version/s: 3.4.0 Resolution: Fixed Status: Resolved (was: Patch Available) > EC: Improve performance of decommissioning dn with many ec blocks > - > > Key: HDFS-16613 > URL: https://issues.apache.org/jira/browse/HDFS-16613 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ec, erasure-coding, namenode >Affects Versions: 3.4.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Attachments: image-2022-06-07-11-46-42-389.png, > image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png, > image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png, > image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png > > Time Spent: 2h > Remaining Estimate: 0h > > In a hdfs cluster with a lot of EC blocks, decommission a dn is very slow. > The reason is unlike replication blocks can be replicated from any dn which > has the same block replication, the ec block have to be replicated from the > decommissioning dn. > The configurations dfs.namenode.replication.max-streams and > dfs.namenode.replication.max-streams-hard-limit will limit the replication > speed, but increase these configurations will create risk to the whole > cluster's network. So it should add a new configuration to limit the > decommissioning dn, distinguished from the cluster wide max-streams limit. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16591) StateStoreZooKeeper fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-16591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated HDFS-16591:
----------------------------------
Labels: pull-request-available (was: )

> StateStoreZooKeeper fails to initialize
> ---------------------------------------
>
> Key: HDFS-16591
> URL: https://issues.apache.org/jira/browse/HDFS-16591
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: rbf
> Reporter: Hector Sandoval Chaverri
> Priority: Major
> Labels: pull-request-available
> Time Spent: 10m
> Remaining Estimate: 0h
>
> MembershipStore and MountTableStore are failing to initialize, logging the
> following errors on the Router logs:
> {noformat}
> 2022-05-23 16:43:01,156 ERROR org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService: Cannot get version for class org.apache.hadoop.hdfs.server.federation.store.MembershipStore
> org.apache.hadoop.hdfs.server.federation.store.StateStoreUnavailableException: Cached State Store not initialized, MembershipState records not valid
>     at org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.checkCacheAvailable(CachedRecordStore.java:106)
>     at org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.getCachedRecords(CachedRecordStore.java:227)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.getStateStoreVersion(RouterHeartbeatService.java:131)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.updateStateStore(RouterHeartbeatService.java:92)
>     at org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.periodicInvoke(RouterHeartbeatService.java:159)
>     at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> {noformat}
> After investigating, we noticed that ZKDelegationTokenSecretManager normally
> initializes properties for ZooKeeper clients to connect using SASL/Kerberos.
> If ZKDelegationTokenSecretManager is replaced with a new SecretManager, the
> SASL properties don't get configured and any StateStores that connect to
> ZooKeeper fail with the above error.
> A potential way to fix this is by setting the JaasConfiguration (currently
> done in ZKDelegationTokenSecretManager) as part of the
> StateStoreZooKeeperImpl initialization method.
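For readers unfamiliar with what "setting the JaasConfiguration" involves, here is a minimal, self-contained sketch using only JDK classes. It is not Hadoop's `JaasConfiguration` class and not the code from the patch: the class name `KeytabJaasConfiguration`, the entry name, the principal and the keytab path are all placeholders chosen for illustration. It assumes the ZooKeeper client looks up its JAAS section via the `zookeeper.sasl.clientconfig` system property.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Illustrative JAAS configuration exposing one keytab-based Kerberos entry. */
final class KeytabJaasConfiguration extends Configuration {
  private final String entryName;
  private final AppConfigurationEntry[] entries;

  KeytabJaasConfiguration(String entryName, String principal, String keytabPath) {
    this.entryName = entryName;
    Map<String, String> options = new HashMap<>();
    options.put("useKeyTab", "true");
    options.put("storeKey", "true");
    options.put("useTicketCache", "false");
    options.put("keyTab", keytabPath);
    options.put("principal", principal);
    this.entries = new AppConfigurationEntry[] {
        new AppConfigurationEntry(
            "com.sun.security.auth.module.Krb5LoginModule",
            LoginModuleControlFlag.REQUIRED,
            options)
    };
  }

  /** Returns the Kerberos entry for the configured name, or null for any other name. */
  @Override
  public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
    return entryName.equals(name) ? entries : null;
  }

  /** Installs the configuration process-wide; intended to run before a ZooKeeper client is created. */
  static void install(String entryName, String principal, String keytabPath) {
    Configuration.setConfiguration(
        new KeytabJaasConfiguration(entryName, principal, keytabPath));
    // Assumption: the ZooKeeper client reads its JAAS section name from this property.
    System.setProperty("zookeeper.sasl.clientconfig", entryName);
  }

  public static void main(String[] args) {
    // Placeholder values; a real deployment would take these from its own settings.
    install("ZKClient", "router/host.example.com@EXAMPLE.COM", "/etc/security/router.keytab");
    System.out.println(
        Configuration.getConfiguration().getAppConfigurationEntry("ZKClient").length);
  }
}
```

The point of the sketch is ordering: if nothing installs such a configuration before the State Store opens its ZooKeeper connection, the client has no SASL settings to use, which matches the failure mode described in the issue.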
[jira] [Work logged] (HDFS-16591) StateStoreZooKeeper fails to initialize
[ https://issues.apache.org/jira/browse/HDFS-16591?focusedWorklogId=782128=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782128 ] ASF GitHub Bot logged work on HDFS-16591: - Author: ASF GitHub Bot Created on: 16/Jun/22 17:46 Start Date: 16/Jun/22 17:46 Worklog Time Spent: 10m Work Description: hchaverri opened a new pull request, #4447: URL: https://github.com/apache/hadoop/pull/4447 …enabled ### Description of PR Setting up the JaasConfiguration when creating a new ZKCuratorManager, to allow ZK connections via SASL. Also removing duplicated classes of JaasConfiguration. ### How was this patch tested? Ran the following unit tests: TestJaasConfiguration TestZKCuratorManager TestZKSignerSecretProvider TestZKDelegationTokenSecretManager TestMicroZookeeperService Created a TestDelegationTokenSecretManager to replace the default ZKDelegationTokenSecretManagerImpl and deployed to an RBF router. Without these changes, the router initialization will fail with the error described on HDFS-16591. Initialization succeeds with this patch. ### For code changes: - [ ] Does the title or this PR starts with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')? - [ ] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation? - [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files? 
Issue Time Tracking --- Worklog Id: (was: 782128) Remaining Estimate: 0h Time Spent: 10m > StateStoreZooKeeper fails to initialize > --- > > Key: HDFS-16591 > URL: https://issues.apache.org/jira/browse/HDFS-16591 > Project: Hadoop HDFS > Issue Type: Bug > Components: rbf >Reporter: Hector Sandoval Chaverri >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > MembershipStore and MountTableStore are failing to initialize, logging the > following errors on the Router logs: > {noformat} > 2022-05-23 16:43:01,156 ERROR > org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService: > Cannot get version for class > org.apache.hadoop.hdfs.server.federation.store.MembershipStore > org.apache.hadoop.hdfs.server.federation.store.StateStoreUnavailableException: > Cached State Store not initialized, MembershipState records not valid > at > org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.checkCacheAvailable(CachedRecordStore.java:106) > at > org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.getCachedRecords(CachedRecordStore.java:227) > at > org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.getStateStoreVersion(RouterHeartbeatService.java:131) > at > org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.updateStateStore(RouterHeartbeatService.java:92) > at > org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.periodicInvoke(RouterHeartbeatService.java:159) > at > org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748){noformat} > After investigating, we noticed that ZKDelegationTokenSecretManager normally > initializes properties for ZooKeeper clients to connect using SASL/Kerberos. > If ZKDelegationTokenSecretManager is replaced with a new SecretManager, the > SASL properties don't get configured and any StateStores that connect to > ZooKeeper fail with the above error. > A potential way to fix this is by setting the JaasConfiguration (currently > done in ZKDelegationTokenSecretManager) as part of the > StateStoreZooKeeperImpl initialization method. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To
[jira] [Work logged] (HDFS-16605) Improve Code With Lambda in hadoop-hdfs-rbf moudle
[ https://issues.apache.org/jira/browse/HDFS-16605?focusedWorklogId=782101=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782101 ] ASF GitHub Bot logged work on HDFS-16605: - Author: ASF GitHub Bot Created on: 16/Jun/22 16:15 Start Date: 16/Jun/22 16:15 Worklog Time Spent: 10m Work Description: slfan1989 commented on PR #4375: URL: https://github.com/apache/hadoop/pull/4375#issuecomment-1157862822 @goiri Can you help me merge this pr to trunk branch? Thanks for helping me review the code! Issue Time Tracking --- Worklog Id: (was: 782101) Time Spent: 2h 20m (was: 2h 10m) > Improve Code With Lambda in hadoop-hdfs-rbf moudle > -- > > Key: HDFS-16605 > URL: https://issues.apache.org/jira/browse/HDFS-16605 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.4.0 >Reporter: fanshilun >Assignee: fanshilun >Priority: Minor > Labels: pull-request-available > Time Spent: 2h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782080 ]

ASF GitHub Bot logged work on HDFS-13522:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 16/Jun/22 15:25
Start Date: 16/Jun/22 15:25
Worklog Time Spent: 10m
Work Description: simbadzina commented on PR #4441:
URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157792454

@ZanderXu in https://github.com/apache/hadoop/pull/4127 I have configurations on both the router and the client side. Consistency is also guaranteed because the router always does an msync. The client-side configuration is there for latency-sensitive clients that want just one call between the router and the namenodes.

Issue Time Tracking
-------------------
Worklog Id: (was: 782080)
Time Spent: 14h (was: 13h 50m)

> RBF: Support observer node from Router-Based Federation
> -------------------------------------------------------
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: federation, namenode
> Reporter: Erik Krogen
> Assignee: Simbarashe Dzinamarira
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC clogging.png, ShortTerm-Routers+Observer.png
>
> Time Spent: 14h
> Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state,
> e.g. {{FederationNamenodeServiceState}}.
[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782073=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782073 ] ASF GitHub Bot logged work on HDFS-13522: - Author: ASF GitHub Bot Created on: 16/Jun/22 15:18 Start Date: 16/Jun/22 15:18 Worklog Time Spent: 10m Work Description: simbadzina commented on PR #4441: URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157784867 > Thanks @zhengchenyu and @simbadzina . > > > I think config in client side may be more flexible. > > This is a very meaningful topic. If only the client controls whether or not to enable ObserverRead will be more difficult for Admin to control, because it is very difficult to upgrade the HDFS client in full. In other words: If RBF controls whether the ObserverRead is enabled, the Admin will be very convenient to control the ObserverRead of the entire cluster, and even dynamically control whether the ObserverRead of a single NS or the entire cluster is enabled. But there may be some special Client that do not want to enable ObserverRead, so RBF should identify those requests and proxy them to the Active Namenode. > > @simbadzina This is why dynamic updates are required, so that when Admin finds that there are some abnormal Observer NameNodes, he/she can quickly disable the ObserverRead of one NS or even all NSs. > > > In our draft design, after apply [HDFS-13522](https://issues.apache.org/jira/browse/HDFS-13522).002.patch, I wanna proxy client's state id. > > Proxying client's state id to the NameNode by RBF will be very complicated. > > * A DFSClient may read or write some paths of different NameServices, and the stateID of different NS may be different. > * The client does not know the Nameservice to which the reading or writing path belong, so it cannot pass the state id to RBF. @ZanderXu in my full PR, https://github.com/apache/hadoop/pull/4127, I do also allow routers to enable and disable observer reads. 
The difference being that it requires a router restart. Since routers are stateless this is a quick operation. At most one minute. Issue Time Tracking --- Worklog Id: (was: 782073) Time Spent: 13h 50m (was: 13h 40m) > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png > > Time Spent: 13h 50m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782059=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782059 ]

ASF GitHub Bot logged work on HDFS-13522:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 16/Jun/22 14:01
Start Date: 16/Jun/22 14:01
Worklog Time Spent: 10m
Work Description: ZanderXu commented on PR #4441:
URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157696846

> We know observers cannot guarantee strong consistency. Some users may have strict requirements and want to disable observer reads, though few users do.

Only a very small number of users have such strict requirements, and in most cases the client enables ObserverRead by default. In other words, in most cases the client does not need to pass an ObserverRead enable flag to RBF. Only a very small number of requests need to carry a specific flag to RBF, so that RBF can force an msync to ensure consistency before proxying the request.

There are several ways for the client side to carry the force-consistency flag to RBF:
1. Carry a special StateID to RBF, such as -100 (client process level)
2. Carry a special field attribute to RBF through CallerContext (single RPC level)
3. etc.

Issue Time Tracking
-------------------
Worklog Id: (was: 782059)
Time Spent: 13h 40m (was: 13.5h)

> RBF: Support observer node from Router-Based Federation
> -------------------------------------------------------
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: federation, namenode
> Reporter: Erik Krogen
> Assignee: Simbarashe Dzinamarira
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC clogging.png, ShortTerm-Routers+Observer.png
>
> Time Spent: 13h 40m
> Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state,
> e.g. {{FederationNamenodeServiceState}}.
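The first option in the comment above (a sentinel StateID) could look roughly like the sketch below. Everything here is hypothetical: `ObserverReadPolicy`, the `Action` enum and the method `decide` are illustrative names, not router code, and -100 is simply the example value mentioned in the comment.

```java
/** Hypothetical router-side decision for one incoming call; not actual RBF code. */
final class ObserverReadPolicy {
  /** Sentinel a client could send to demand consistency; -100 is the example from the thread. */
  static final long FORCE_CONSISTENCY_STATE_ID = -100L;

  enum Action { PROXY_TO_ACTIVE, PROXY_TO_OBSERVER, MSYNC_THEN_PROXY_TO_OBSERVER }

  static Action decide(boolean routerObserverReadEnabled, boolean readCall, long clientStateId) {
    // Writes, and all calls when the admin has disabled observer reads, go to the active namenode.
    if (!readCall || !routerObserverReadEnabled) {
      return Action.PROXY_TO_ACTIVE;
    }
    // A flagged request asks the router to msync first so the observer has caught up.
    if (clientStateId == FORCE_CONSISTENCY_STATE_ID) {
      return Action.MSYNC_THEN_PROXY_TO_OBSERVER;
    }
    return Action.PROXY_TO_OBSERVER;
  }

  public static void main(String[] args) {
    System.out.println(decide(true, true, 0L));
    System.out.println(decide(true, true, FORCE_CONSISTENCY_STATE_ID));
    System.out.println(decide(false, true, 0L));
  }
}
```

The second option (a CallerContext field) would change only where the flag is read from; the decision itself would stay the same, with the difference that the flag could vary per RPC rather than per client process.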
[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782049 ] ASF GitHub Bot logged work on HDFS-13522: - Author: ASF GitHub Bot Created on: 16/Jun/22 13:37 Start Date: 16/Jun/22 13:37 Worklog Time Spent: 10m Work Description: zhengchenyu commented on PR #4441: URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157672472 > Thanks @zhengchenyu and @simbadzina . > > > I think config in client side may be more flexible. > > This is a very meaningful topic. If only the client controls whether or not to enable ObserverRead will be more difficult for Admin to control, because it is very difficult to upgrade the HDFS client in full. In other words: If RBF controls whether the ObserverRead is enabled, the Admin will be very convenient to control the ObserverRead of the entire cluster, and even dynamically control whether the ObserverRead of a single NS or the entire cluster is enabled. But there may be some special Client that do not want to enable ObserverRead, so RBF should identify those requests and proxy them to the Active Namenode. > > @simbadzina This is why dynamic updates are required, so that when Admin finds that there are some abnormal Observer NameNodes, he/she can quickly disable the ObserverRead of one NS or even all NSs. > > > In our draft design, after apply [HDFS-13522](https://issues.apache.org/jira/browse/HDFS-13522).002.patch, I wanna proxy client's state id. > > Proxying client's state id to the NameNode by RBF will be very complicated. > > * A DFSClient may read or write some paths of different NameServices, and the stateID of different NS may be different. > * The client does not know the Nameservice to which the reading or writing path belong, so it cannot pass the state id to RBF. Yes, you are right in some condition. If all client are common user, for hive and mr application, it is right. 
We know observers cannot guarantee strong consistency. Some users may have strict requirements and want to disable observer reads, though few users do. Maybe we can keep configuration on both the router side and the client side. Yes, proxying the client's state id is complicated. I don't know whether it is necessary, so let's defer it.

Issue Time Tracking
-------------------
Worklog Id: (was: 782049)
Time Spent: 13.5h (was: 13h 20m)

> RBF: Support observer node from Router-Based Federation
> -------------------------------------------------------
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: federation, namenode
> Reporter: Erik Krogen
> Assignee: Simbarashe Dzinamarira
> Priority: Major
> Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC clogging.png, ShortTerm-Routers+Observer.png
>
> Time Spent: 13.5h
> Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state,
> e.g. {{FederationNamenodeServiceState}}.
[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely
[ https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=782040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782040 ] ASF GitHub Bot logged work on HDFS-16064: - Author: ASF GitHub Bot Created on: 16/Jun/22 13:20 Start Date: 16/Jun/22 13:20 Worklog Time Spent: 10m Work Description: KevinWikant commented on PR #4410: URL: https://github.com/apache/hadoop/pull/4410#issuecomment-1157654335 @ashutoshcipher @aajisaka @ZanderXu really appreciate the reviews on this PR, thank you! @aajisaka I have removed the unused imports, please let me know if you have any other comments/concerns Issue Time Tracking --- Worklog Id: (was: 782040) Time Spent: 1h 10m (was: 1h) > HDFS-721 causes DataNode decommissioning to get stuck indefinitely > -- > > Key: HDFS-16064 > URL: https://issues.apache.org/jira/browse/HDFS-16064 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.2.1 >Reporter: Kevin Wikant >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a > non-issue under the assumption that if the namenode & a datanode get into an > inconsistent state for a given block pipeline, there should be another > datanode available to replicate the block to > While testing datanode decommissioning using "dfs.exclude.hosts", I have > encountered a scenario where the decommissioning gets stuck indefinitely > Below is the progression of events: > * there are initially 4 datanodes DN1, DN2, DN3, DN4 > * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts" > * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in > order to satisfy their minimum replication factor of 2 > * during this replication process > https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes > the following inconsistent state: > ** DN3 thinks it has the block pipeline in FINALIZED 
state > ** the namenode does not think DN3 has the block pipeline > {code:java} > 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode > (DataXceiver for client at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): > DN3:9866:DataXceiver error processing WRITE_BLOCK operation src: /DN2:45654 > dst: /DN3:9866; > org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block > BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created. > {code} > * the replication is attempted again, but: > ** DN4 has the block > ** DN1 and/or DN2 have the block, but don't count towards the minimum > replication factor because they are being decommissioned > ** DN3 does not have the block & cannot have the block replicated to it > because of HDFS-721 > * the namenode repeatedly tries to replicate the block to DN3 & repeatedly > fails, this continues indefinitely > * therefore DN4 is the only live datanode with the block & the minimum > replication factor of 2 cannot be satisfied > * because the minimum replication factor cannot be satisfied for the > block(s) being moved off DN1 & DN2, the datanode decommissioning can never be > completed > {code:java} > 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): > Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, > decommissioned replicas: 0, decommissioning replicas: 2, maintenance > replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is > Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , > Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is > current datanode entering maintenance: false > ... 
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): > Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, > decommissioned replicas: 0, decommissioning replicas: 2, maintenance > replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is > Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , > Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is > current datanode entering maintenance: false > {code} > Being stuck in decommissioning state forever is not an intended behavior of > DataNode decommissioning > A few potential solutions: > * Address the root cause of the problem which is an inconsistent state > between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721 > * Detect when datanode decommissioning is stuck due to lack of available > datanodes for satisfying the minimum
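The stuck state reported in HDFS-16064 can be restated as a small piece of set arithmetic. The sketch below is purely illustrative: `DecommissionStuckSketch` and its methods are made-up names, not namenode code, and the node names come from the example in the report (DN1 and DN2 decommissioning, DN3 rejecting the write because it already holds a FINALIZED replica, DN4 live).

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative model of the scenario in the report; not namenode code. */
final class DecommissionStuckSketch {
  /** Replicas on decommissioning nodes do not count towards the live replica total. */
  static int liveReplicas(Set<String> holders, Set<String> decommissioning) {
    Set<String> live = new LinkedHashSet<>(holders);
    live.removeAll(decommissioning);
    return live.size();
  }

  /** Nodes that could still accept a new replica of the block. */
  static Set<String> viableTargets(Set<String> allNodes, Set<String> holders,
      Set<String> decommissioning, Set<String> rejecting) {
    Set<String> targets = new LinkedHashSet<>(allNodes);
    targets.removeAll(holders);
    targets.removeAll(decommissioning);
    targets.removeAll(rejecting);
    return targets;
  }

  /** Under-replicated with nowhere left to copy to: decommissioning cannot complete. */
  static boolean isStuck(int expectedReplicas, int liveReplicas, Set<String> targets) {
    return liveReplicas < expectedReplicas && targets.isEmpty();
  }

  public static void main(String[] args) {
    Set<String> all = new LinkedHashSet<>(Arrays.asList("DN1", "DN2", "DN3", "DN4"));
    Set<String> holders = new LinkedHashSet<>(Arrays.asList("DN1", "DN2", "DN4"));
    Set<String> decommissioning = new LinkedHashSet<>(Arrays.asList("DN1", "DN2"));
    Set<String> rejecting = new LinkedHashSet<>(Arrays.asList("DN3"));
    int live = liveReplicas(holders, decommissioning);
    Set<String> targets = viableTargets(all, holders, decommissioning, rejecting);
    System.out.println("live=" + live + " targets=" + targets
        + " stuck=" + isStuck(2, live, targets));
  }
}
```

This mirrors the log lines above: two expected replicas, one live replica, two decommissioning replicas, and no datanode that can take another copy.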
[jira] [Updated] (HDFS-16633) Reserved Space For Replicas is not released on some cases
[ https://issues.apache.org/jira/browse/HDFS-16633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated HDFS-16633: - Description: Have found the Reserved Space For Replicas is not released on some cases in a Cx Prod cluster. There are few fixes like HDFS-9530 and HDFS-8072 but still the issue is not completely fixed. Have tried to debug the root cause but this will take lot of time as it is Cx Prod Cluster. But we have an easier way to fix the issue completely by releasing any remaining reserved space from BlockReceiver#close which is initiated by DataXceiver#writeBlock finally. was: Have found the Reserved Space For Replicas is not released on a Cx Prod cluster. There are few fixes like HDFS-9530 and HDFS-8072 but still the issue is not completely fixed. Have tried to debug the root cause but this will take lot of time as it is Cx Prod Cluster. But we have an easier way to fix the issue completely by releasing any remaining reserved space from BlockReceiver#close which is initiated by DataXceiver#writeBlock finally. > Reserved Space For Replicas is not released on some cases > - > > Key: HDFS-16633 > URL: https://issues.apache.org/jira/browse/HDFS-16633 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > Have found the Reserved Space For Replicas is not released on some cases in a > Cx Prod cluster. There are few fixes like HDFS-9530 and HDFS-8072 but still > the issue is not completely fixed. Have tried to debug the root cause but > this will take lot of time as it is Cx Prod Cluster. > But we have an easier way to fix the issue completely by releasing any > remaining reserved space from BlockReceiver#close which is initiated by > DataXceiver#writeBlock finally. 
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16633) Reserved Space For Replicas is not released on some cases
[ https://issues.apache.org/jira/browse/HDFS-16633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prabhu Joseph updated HDFS-16633: - Summary: Reserved Space For Replicas is not released on some cases (was: Reserved Space For Replicas is not released ) > Reserved Space For Replicas is not released on some cases > - > > Key: HDFS-16633 > URL: https://issues.apache.org/jira/browse/HDFS-16633 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.1.2 >Reporter: Prabhu Joseph >Assignee: Prabhu Joseph >Priority: Major > > Have found the Reserved Space For Replicas is not released on a Cx Prod > cluster. There are few fixes like HDFS-9530 and HDFS-8072 but still the issue > is not completely fixed. Have tried to debug the root cause but this will > take lot of time as it is Cx Prod Cluster. > But we have an easier way to fix the issue completely by releasing any > remaining reserved space from BlockReceiver#close which is initiated by > DataXceiver#writeBlock finally. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16633) Reserved Space For Replicas is not released
Prabhu Joseph created HDFS-16633:
------------------------------------

Summary: Reserved Space For Replicas is not released
Key: HDFS-16633
URL: https://issues.apache.org/jira/browse/HDFS-16633
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs
Affects Versions: 3.1.2
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph

We have found that the Reserved Space For Replicas is not released on a Cx Prod cluster. There are a few fixes, such as HDFS-9530 and HDFS-8072, but the issue is still not completely fixed. We have tried to debug the root cause, but this will take a lot of time because it is a Cx Prod cluster.

There is an easier way to fix the issue completely: release any remaining reserved space from BlockReceiver#close, which DataXceiver#writeBlock invokes in its finally block.
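The proposed fix amounts to a "release whatever is left on close" safety net. A minimal sketch of that pattern follows. It is not the HDFS implementation: `ReservedSpace` and `SketchReceiver` are hypothetical stand-ins for the volume's reserved-space counter and for a block receiver, and the byte counts are arbitrary.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a volume's "reserved for replicas" counter. */
final class ReservedSpace {
  private final AtomicLong reserved = new AtomicLong();
  void reserve(long bytes) { reserved.addAndGet(bytes); }
  void release(long bytes) { reserved.addAndGet(-bytes); }
  long get() { return reserved.get(); }
}

/** Hypothetical receiver that reserves space up front and returns the rest on close. */
final class SketchReceiver implements AutoCloseable {
  private final ReservedSpace volume;
  private long remaining; // bytes this receiver reserved and has not yet released

  SketchReceiver(ReservedSpace volume, long blockSize) {
    this.volume = volume;
    this.remaining = blockSize;
    volume.reserve(blockSize);
  }

  /** Normal path: the reservation shrinks as data reaches disk. */
  void onBytesWritten(long bytes) {
    long released = Math.min(bytes, remaining);
    remaining -= released;
    volume.release(released);
  }

  /** Safety net: release anything still reserved; calling it twice is harmless. */
  @Override
  public void close() {
    if (remaining > 0) {
      volume.release(remaining);
      remaining = 0;
    }
  }

  public static void main(String[] args) {
    ReservedSpace volume = new ReservedSpace();
    try (SketchReceiver receiver = new SketchReceiver(volume, 128)) {
      receiver.onBytesWritten(28);
      throw new IllegalStateException("simulated write failure");
    } catch (IllegalStateException e) {
      System.out.println("reserved after failure: " + volume.get());
    }
  }
}
```

Because the release happens in `close`, it covers every exit path of the write, including ones where the exact cause of the leak has not been identified, which is the appeal of the approach described in the issue.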
[jira] [Resolved] (HDFS-16581) Print node status when executing printTopology
[ https://issues.apache.org/jira/browse/HDFS-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tao Li resolved HDFS-16581. --- Fix Version/s: 3.4.0 3.3.4 Resolution: Resolved > Print node status when executing printTopology > -- > > Key: HDFS-16581 > URL: https://issues.apache.org/jira/browse/HDFS-16581 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsadmin, namenode >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0, 3.3.4 > > Time Spent: 3h > Remaining Estimate: 0h > > We can use the dfsadmin tool to see which DataNodes the cluster has. Some of these nodes are alive, while others are DECOMMISSIONED or DECOMMISSION_INPROGRESS. It would be helpful to get this status information in a timely manner, for example when troubleshooting cluster failures or tracking node status.
[jira] [Work logged] (HDFS-16581) Print node status when executing printTopology
[ https://issues.apache.org/jira/browse/HDFS-16581?focusedWorklogId=782024=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782024 ] ASF GitHub Bot logged work on HDFS-16581: - Author: ASF GitHub Bot Created on: 16/Jun/22 11:43 Start Date: 16/Jun/22 11:43 Worklog Time Spent: 10m Work Description: tomscut commented on PR #4321: URL: https://github.com/apache/hadoop/pull/4321#issuecomment-1157563468 Thanks @jianghuazhu for your contribution. Thanks @virajjasani for your review. Issue Time Tracking --- Worklog Id: (was: 782024) Time Spent: 3h (was: 2h 50m) > Print node status when executing printTopology > -- > > Key: HDFS-16581 > URL: https://issues.apache.org/jira/browse/HDFS-16581 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsadmin, namenode >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 3h > Remaining Estimate: 0h > > We can use the dfsadmin tool to see which DataNodes the cluster has. Some of these nodes are alive, while others are DECOMMISSIONED or DECOMMISSION_INPROGRESS. It would be helpful to get this status information in a timely manner, for example when troubleshooting cluster failures or tracking node status.
[jira] [Work logged] (HDFS-16581) Print node status when executing printTopology
[ https://issues.apache.org/jira/browse/HDFS-16581?focusedWorklogId=782018=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782018 ] ASF GitHub Bot logged work on HDFS-16581: - Author: ASF GitHub Bot Created on: 16/Jun/22 11:19 Start Date: 16/Jun/22 11:19 Worklog Time Spent: 10m Work Description: tomscut merged PR #4321: URL: https://github.com/apache/hadoop/pull/4321 Issue Time Tracking --- Worklog Id: (was: 782018) Time Spent: 2h 50m (was: 2h 40m) > Print node status when executing printTopology > -- > > Key: HDFS-16581 > URL: https://issues.apache.org/jira/browse/HDFS-16581 > Project: Hadoop HDFS > Issue Type: Improvement > Components: dfsadmin, namenode >Affects Versions: 3.3.0 >Reporter: JiangHua Zhu >Assignee: JiangHua Zhu >Priority: Major > Labels: pull-request-available > Time Spent: 2h 50m > Remaining Estimate: 0h > > We can use the dfsadmin tool to see which DataNodes the cluster has. Some of these nodes are alive, while others are DECOMMISSIONED or DECOMMISSION_INPROGRESS. It would be helpful to get this status information in a timely manner, for example when troubleshooting cluster failures or tracking node status.
[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782013 ] ASF GitHub Bot logged work on HDFS-13522: - Author: ASF GitHub Bot Created on: 16/Jun/22 11:04 Start Date: 16/Jun/22 11:04 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4441: URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157530100 As in my draft PR above, RBF always updates lastSeenTxid from the Active NameNode and stores it. When an NS enables ObserverRead, RBF sets the stored lastSeenTxid of that NS in the RPC header and carries it to the Observer NameNode; if the NS disables ObserverRead, RBF does not set the state id in the RPC header, so even if the request reaches the Observer, the Observer returns a StandbyException. Issue Time Tracking --- Worklog Id: (was: 782013) Time Spent: 13h 20m (was: 13h 10m) > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC clogging.png, ShortTerm-Routers+Observer.png > > Time Spent: 13h 20m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, e.g. {{FederationNamenodeServiceState}}.
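The behaviour described in the comment can be sketched roughly as follows. This is an illustration only, not the code from PR #4441: `RouterStateIdSketch` and its method names are hypothetical, and a plain `OptionalLong` stands in for the state id field of the RPC header.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical router-side bookkeeping; not the actual RBF implementation. */
class RouterStateIdSketch {
  // Highest transaction id seen from the Active NameNode, per nameservice.
  private final Map<String, Long> lastSeenTxid = new ConcurrentHashMap<>();
  // Nameservices for which observer reads are currently enabled.
  private final Set<String> observerReadEnabled = ConcurrentHashMap.newKeySet();

  /** Records the txid carried by a response from the Active NameNode of ns. */
  void onActiveResponse(String ns, long txid) {
    lastSeenTxid.merge(ns, txid, Long::max);
  }

  void setObserverRead(String ns, boolean enabled) {
    if (enabled) {
      observerReadEnabled.add(ns);
    } else {
      observerReadEnabled.remove(ns);
    }
  }

  /**
   * State id to place in the RPC header for ns. Empty when observer reads are
   * disabled for ns or nothing has been seen from its Active NameNode yet.
   */
  OptionalLong stateIdForHeader(String ns) {
    Long txid = lastSeenTxid.get(ns);
    if (txid == null || !observerReadEnabled.contains(ns)) {
      return OptionalLong.empty();
    }
    return OptionalLong.of(txid);
  }

  public static void main(String[] args) {
    RouterStateIdSketch router = new RouterStateIdSketch();
    router.onActiveResponse("ns1", 100L);
    System.out.println(router.stateIdForHeader("ns1")); // observer read still disabled
    router.setObserverRead("ns1", true);
    System.out.println(router.stateIdForHeader("ns1"));
  }
}
```

Keeping one entry per nameservice reflects the point that the router, rather than the client, knows which nameservice a path resolves to.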
[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782009=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782009 ] ASF GitHub Bot logged work on HDFS-13522: - Author: ASF GitHub Bot Created on: 16/Jun/22 10:58 Start Date: 16/Jun/22 10:58 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4441: URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157525096 Thanks @zhengchenyu and @simbadzina. > I think config in client side may be more flexible. This is a very meaningful topic. If only the client controls whether ObserverRead is enabled, it is harder for the Admin to control, because upgrading every HDFS client is very difficult. In other words, if RBF controls whether ObserverRead is enabled, the Admin can conveniently control ObserverRead for the entire cluster, and can even dynamically enable or disable it for a single NS or for the whole cluster. There may still be some special clients that do not want ObserverRead, so RBF should identify their requests and proxy them to the Active NameNode. @simbadzina This is why dynamic updates are required: when the Admin finds abnormal Observer NameNodes, he/she can quickly disable ObserverRead for one NS or even for all NSs. > In our draft design, after apply [HDFS-13522](https://issues.apache.org/jira/browse/HDFS-13522).002.patch, I wanna proxy client's state id. Having RBF proxy the client's state id to the NameNode would be very complicated. - A DFSClient may read or write paths in different NameServices, and the state id of each NS may differ. - The client does not know which NameService a path being read or written belongs to, so it cannot pass the state id to RBF. 
Issue Time Tracking --- Worklog Id: (was: 782009) Time Spent: 13h 10m (was: 13h) > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png > > Time Spent: 13h 10m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}.
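The router-side control argued for in the comment above could look something like the sketch below. Everything in it is an assumption made for illustration: the class `ObserverReadPolicySketch`, the `Target` enum, and the `clientOptedOut` flag are hypothetical and do not come from any of the HDFS-13522 patches.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical router-side switch for observer reads; names are illustrative. */
class ObserverReadPolicySketch {
  enum Target { ACTIVE, OBSERVER }

  // Cluster-wide switch that an admin could flip at runtime.
  private volatile boolean clusterWideEnabled = true;
  // Nameservices for which an admin has turned observer reads off.
  private final Set<String> disabledNameservices = ConcurrentHashMap.newKeySet();

  void setClusterWideEnabled(boolean enabled) {
    clusterWideEnabled = enabled;
  }

  void setNameserviceEnabled(String ns, boolean enabled) {
    if (enabled) {
      disabledNameservices.remove(ns);
    } else {
      disabledNameservices.add(ns);
    }
  }

  /**
   * Decides where a call for ns is proxied. Writes, clients that opted out,
   * and disabled nameservices always go to the Active NameNode.
   */
  Target chooseTarget(String ns, boolean readOnlyCall, boolean clientOptedOut) {
    boolean observerAllowed = readOnlyCall
        && !clientOptedOut
        && clusterWideEnabled
        && !disabledNameservices.contains(ns);
    return observerAllowed ? Target.OBSERVER : Target.ACTIVE;
  }

  public static void main(String[] args) {
    ObserverReadPolicySketch policy = new ObserverReadPolicySketch();
    System.out.println(policy.chooseTarget("ns1", true, false));
    policy.setNameserviceEnabled("ns1", false); // admin reacts to an abnormal Observer
    System.out.println(policy.chooseTarget("ns1", true, false));
  }
}
```

Because both switches are plain in-memory state, flipping them takes effect on the next call, which is the kind of dynamic update the comment asks for.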
[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=781996=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781996 ] ASF GitHub Bot logged work on HDFS-13522: - Author: ASF GitHub Bot Created on: 16/Jun/22 10:22 Start Date: 16/Jun/22 10:22 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on PR #4127: URL: https://github.com/apache/hadoop/pull/4127#issuecomment-1157494963 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 49s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 1s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 12 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 19s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 28m 12s | | trunk passed |
| +1 :green_heart: | compile | 24m 51s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 26m 45s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 5m 30s | | trunk passed |
| +1 :green_heart: | mvnsite | 6m 47s | | trunk passed |
| -1 :x: | javadoc | 1m 31s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/14/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 5m 42s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 11m 51s | | trunk passed |
| +1 :green_heart: | shadedclient | 25m 6s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 26s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 4m 5s | | the patch passed |
| +1 :green_heart: | compile | 24m 3s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 24m 3s | | the patch passed |
| +1 :green_heart: | compile | 21m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 21m 30s | | the patch passed |
| -1 :x: | blanks | 0m 0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/14/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 :warning: | checkstyle | 4m 26s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/14/artifact/out/results-checkstyle-root.txt) | root: The patch generated 3 new + 339 unchanged - 1 fixed = 342 total (was 340) |
| +1 :green_heart: | mvnsite | 6m 36s | | the patch passed |
| -1 :x: | javadoc | 1m 30s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/14/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 5m 44s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 12m 26s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 51s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 18m 10s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 2m 54s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | unit | 362m 44s | | hadoop-hdfs in the patch passed. |
| -1 :x: | unit | 34m 3s |
[jira] [Work logged] (HDFS-16600) Deadlock on DataNode
[ https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=781994=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781994 ] ASF GitHub Bot logged work on HDFS-16600: - Author: ASF GitHub Bot Created on: 16/Jun/22 10:11 Start Date: 16/Jun/22 10:11 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on PR #4367: URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1157483601 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 39m 57s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
|||| _ trunk Compile Tests _ |
| +1 :green_heart: | mvninstall | 40m 26s | | trunk passed |
| +1 :green_heart: | compile | 1m 21s | | trunk passed |
| +1 :green_heart: | checkstyle | 1m 8s | | trunk passed |
| +1 :green_heart: | mvnsite | 1m 32s | | trunk passed |
| +1 :green_heart: | javadoc | 1m 43s | | trunk passed |
| +1 :green_heart: | spotbugs | 3m 43s | | trunk passed |
| +1 :green_heart: | shadedclient | 22m 52s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +1 :green_heart: | mvninstall | 1m 25s | | the patch passed |
| +1 :green_heart: | compile | 1m 19s | | the patch passed |
| +1 :green_heart: | javac | 1m 19s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 0m 54s | | the patch passed |
| +1 :green_heart: | mvnsite | 1m 25s | | the patch passed |
| +1 :green_heart: | javadoc | 1m 20s | | the patch passed |
| +1 :green_heart: | spotbugs | 3m 21s | | the patch passed |
| +1 :green_heart: | shadedclient | 22m 2s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 258m 9s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | asflicense | 0m 52s | | The patch does not generate ASF License warnings. |
| | | 400m 10s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/8/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4367 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux c020b276eba7 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/bin/hadoop.sh |
| git revision | trunk / f08e25d23aa96705511da6358769b81a4a711080 |
| Default Java | Red Hat, Inc.-1.8.0_332-b09 |
| Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/8/testReport/ |
| Max. process+thread count | 3757 (vs. ulimit of 5500) |
| modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
| Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/8/console |
| versions | git=2.9.5 maven=3.6.3 spotbugs=4.2.2 |
| Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |

This message was automatically generated.
Issue Time Tracking --- Worklog Id: (was: 781994) Time Spent: 4h 40m (was: 4.5h) > Deadlock on DataNode > > > Key: HDFS-16600 > URL: https://issues.apache.org/jira/browse/HDFS-16600 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 4h 40m > Remaining Estimate: 0h > > The UT org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction failed because a deadlock occurred, which was introduced by [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. > DeadLock: >
[jira] [Work logged] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks
[ https://issues.apache.org/jira/browse/HDFS-16613?focusedWorklogId=781984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781984 ] ASF GitHub Bot logged work on HDFS-16613: - Author: ASF GitHub Bot Created on: 16/Jun/22 09:37 Start Date: 16/Jun/22 09:37 Worklog Time Spent: 10m Work Description: lfxy commented on PR #4398: URL: https://github.com/apache/hadoop/pull/4398#issuecomment-1157448224 @hi-adachi Excuse me, what is the next step? Will this PR be merged into the trunk branch? Issue Time Tracking --- Worklog Id: (was: 781984) Time Spent: 1h 50m (was: 1h 40m) > EC: Improve performance of decommissioning dn with many ec blocks > - > > Key: HDFS-16613 > URL: https://issues.apache.org/jira/browse/HDFS-16613 > Project: Hadoop HDFS > Issue Type: Improvement > Components: ec, erasure-coding, namenode >Affects Versions: 3.4.0 >Reporter: caozhiqiang >Assignee: caozhiqiang >Priority: Major > Labels: pull-request-available > Attachments: image-2022-06-07-11-46-42-389.png, image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png, image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png, image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png > > Time Spent: 1h 50m > Remaining Estimate: 0h > > In an HDFS cluster with a lot of EC blocks, decommissioning a dn is very slow. The reason is that, unlike replicated blocks, which can be copied from any dn holding a replica of the same block, EC blocks have to be replicated from the decommissioning dn itself. > The configurations dfs.namenode.replication.max-streams and dfs.namenode.replication.max-streams-hard-limit limit the replication speed, but increasing them puts the whole cluster's network at risk. So a new configuration should be added to limit the decommissioning dn separately from the cluster-wide max-streams limit.
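The proposal in the description, a separate limit for the decommissioning dn, can be sketched as below. This is only an illustration under stated assumptions: the key `example.decommission.max-streams` is invented for the sketch and is not the name used by PR #4398, a plain `Map` stands in for the Hadoop configuration object, and the fallback of 2 is a value chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: picks a per-node stream limit based on decommission state. */
class DecommissionStreamLimitSketch {
  // Existing cluster-wide key named in the issue description.
  static final String CLUSTER_MAX_STREAMS = "dfs.namenode.replication.max-streams";
  // Hypothetical key, invented for this sketch.
  static final String DECOMMISSION_MAX_STREAMS = "example.decommission.max-streams";
  // Fallback used by the sketch when the cluster-wide key is absent.
  static final int SKETCH_DEFAULT_MAX_STREAMS = 2;

  /** Limit that applies to one source dn. */
  static int maxStreams(Map<String, Integer> conf, boolean decommissioning) {
    int clusterLimit = conf.getOrDefault(CLUSTER_MAX_STREAMS, SKETCH_DEFAULT_MAX_STREAMS);
    if (!decommissioning) {
      return clusterLimit;
    }
    // A decommissioning dn gets its own limit; without one it keeps the cluster limit.
    return conf.getOrDefault(DECOMMISSION_MAX_STREAMS, clusterLimit);
  }

  /** Number of new replication tasks that may be handed to the dn right now. */
  static int streamsToSchedule(Map<String, Integer> conf, boolean decommissioning,
                               int activeStreams, int pendingBlocks) {
    int budget = Math.max(0, maxStreams(conf, decommissioning) - activeStreams);
    return Math.min(budget, pendingBlocks);
  }

  /** Example settings: cluster-wide limit 2, decommissioning limit 8. */
  static Map<String, Integer> exampleConf() {
    Map<String, Integer> conf = new HashMap<>();
    conf.put(CLUSTER_MAX_STREAMS, 2);
    conf.put(DECOMMISSION_MAX_STREAMS, 8);
    return conf;
  }

  public static void main(String[] args) {
    Map<String, Integer> conf = exampleConf();
    System.out.println("normal dn: " + streamsToSchedule(conf, false, 1, 10));
    System.out.println("decommissioning dn: " + streamsToSchedule(conf, true, 1, 10));
  }
}
```

Raising only the second value speeds up the node being drained while every other dn stays under the cluster-wide limit.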
[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=781960=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781960 ] ASF GitHub Bot logged work on HDFS-13522: - Author: ASF GitHub Bot Created on: 16/Jun/22 08:34 Start Date: 16/Jun/22 08:34 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on PR #4311: URL: https://github.com/apache/hadoop/pull/4311#issuecomment-1157387445 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 0m 42s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 0s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. |
| +0 :ok: | xmllint | 0m 0s | | xmllint was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 14m 39s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 26m 42s | | trunk passed |
| +1 :green_heart: | compile | 26m 34s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 23m 28s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 4m 58s | | trunk passed |
| +1 :green_heart: | mvnsite | 7m 39s | | trunk passed |
| -1 :x: | javadoc | 1m 36s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/7/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 6m 37s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 12m 53s | | trunk passed |
| +1 :green_heart: | shadedclient | 23m 4s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 33s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 4m 30s | | the patch passed |
| +1 :green_heart: | compile | 24m 50s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 24m 50s | | the patch passed |
| +1 :green_heart: | compile | 22m 57s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 22m 57s | | the patch passed |
| -1 :x: | blanks | 0m 1s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/7/artifact/out/blanks-eol.txt) | The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply |
| -0 :warning: | checkstyle | 4m 18s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/7/artifact/out/results-checkstyle-root.txt) | root: The patch generated 3 new + 198 unchanged - 1 fixed = 201 total (was 199) |
| +1 :green_heart: | mvnsite | 7m 20s | | the patch passed |
| -1 :x: | javadoc | 1m 41s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/7/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 6m 28s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 13m 28s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 42s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 19m 25s | | hadoop-common in the patch passed. |
| +1 :green_heart: | unit | 3m 9s | | hadoop-hdfs-client in the patch passed. |
| +1 :green_heart: | unit | 258m 53s | | hadoop-hdfs in the patch passed. |
| -1 :x: | unit | 23m 32s |
[jira] [Work logged] (HDFS-16616) Remove the use if Sets#newHashSet and Sets#newTreeSet
[ https://issues.apache.org/jira/browse/HDFS-16616?focusedWorklogId=781949=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781949 ] ASF GitHub Bot logged work on HDFS-16616: - Author: ASF GitHub Bot Created on: 16/Jun/22 07:09 Start Date: 16/Jun/22 07:09 Worklog Time Spent: 10m Work Description: Samrat002 commented on PR #4400: URL: https://github.com/apache/hadoop/pull/4400#issuecomment-1157313860 - "hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1." is reported for both `trunk` and the `patch`. - All tests passed. Please review the PR. Thanks. Issue Time Tracking --- Worklog Id: (was: 781949) Time Spent: 50m (was: 40m) > Remove the use if Sets#newHashSet and Sets#newTreeSet > -- > > Key: HDFS-16616 > URL: https://issues.apache.org/jira/browse/HDFS-16616 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Samrat Deb >Assignee: Samrat Deb >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > As part of removing guava dependencies, HADOOP-17115, HADOOP-17721, HADOOP-17722 and HADOOP-17720 are fixed. > Currently the code calls utility functions to create HashSet and TreeSet in the repo. These calls add little value because they internally just call new HashSet<> / new TreeSet<> from java.util. > This task is to clean up all such redundant calls that create sets. > Before moving to Java 8, sets were created using guava functions and APIs; now that the code has moved away from guava, the util code in hadoop looks like > 1. public static TreeSet newTreeSet() { return new TreeSet(); } > 2. public static HashSet newHashSet() { return new HashSet(); } > These methods do little more than add an extra layer of function call. Please refer to the task https://issues.apache.org/jira/browse/HADOOP-17726 > Can anyone review whether this ticket adds value to the code? > Looking forward to input and thoughts. If it does not add any value, we can close it and not move forward with the changes!
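The cleanup described in the ticket amounts to the substitution sketched below. The wrapper here is a local, hypothetical copy written for the example (the class name `SetsCleanupSketch` and the type parameter are assumptions), not the Hadoop `Sets` utility itself. The sketch only shows that calling the constructor directly builds the same set.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative comparison of a thin factory wrapper with a direct constructor call. */
class SetsCleanupSketch {
  /** Local stand-in for the kind of wrapper the ticket proposes to stop using. */
  static <E> HashSet<E> newHashSet() {
    return new HashSet<E>();
  }

  /** Before: the set is created through the wrapper. */
  static Set<String> viaWrapper() {
    Set<String> nodes = newHashSet();
    nodes.add("dn1");
    nodes.add("dn2");
    return nodes;
  }

  /** After: the set is created with the JDK constructor and the diamond operator. */
  static Set<String> viaConstructor() {
    Set<String> nodes = new HashSet<>();
    nodes.add("dn1");
    nodes.add("dn2");
    return nodes;
  }

  /** The same substitution for a sorted set. */
  static Set<String> sortedViaConstructor() {
    Set<String> nodes = new TreeSet<>();
    nodes.add("dn2");
    nodes.add("dn1");
    return nodes;
  }

  public static void main(String[] args) {
    System.out.println(viaWrapper().equals(viaConstructor()));
    System.out.println(sortedViaConstructor());
  }
}
```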
[jira] [Work logged] (HDFS-16616) Remove the use if Sets#newHashSet and Sets#newTreeSet
[ https://issues.apache.org/jira/browse/HDFS-16616?focusedWorklogId=781938=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781938 ] ASF GitHub Bot logged work on HDFS-16616: - Author: ASF GitHub Bot Created on: 16/Jun/22 06:37 Start Date: 16/Jun/22 06:37 Worklog Time Spent: 10m Work Description: hadoop-yetus commented on PR #4400: URL: https://github.com/apache/hadoop/pull/4400#issuecomment-1157289843 :broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:--------:|:-------:|
| +0 :ok: | reexec | 1m 22s | | Docker mode activated. |
|||| _ Prechecks _ |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 10 new or modified test files. |
|||| _ trunk Compile Tests _ |
| +0 :ok: | mvndep | 15m 8s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 29m 32s | | trunk passed |
| +1 :green_heart: | compile | 7m 29s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | compile | 8m 38s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | checkstyle | 1m 42s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 0s | | trunk passed |
| -1 :x: | javadoc | 2m 1s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4400/3/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 2m 56s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 5m 57s | | trunk passed |
| +1 :green_heart: | shadedclient | 24m 5s | | branch has no errors when building and testing our client artifacts. |
|||| _ Patch Compile Tests _ |
| +0 :ok: | mvndep | 0m 29s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 13s | | the patch passed |
| +1 :green_heart: | compile | 7m 27s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 |
| +1 :green_heart: | javac | 7m 27s | | the patch passed |
| +1 :green_heart: | compile | 7m 28s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | javac | 7m 28s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 1m 35s | | hadoop-hdfs-project: The patch generated 0 new + 385 unchanged - 1 fixed = 385 total (was 386) |
| +1 :green_heart: | mvnsite | 2m 47s | | the patch passed |
| -1 :x: | javadoc | 1m 18s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4400/3/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) | hadoop-hdfs in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1. |
| +1 :green_heart: | javadoc | 2m 42s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
| +1 :green_heart: | spotbugs | 5m 57s | | the patch passed |
| +1 :green_heart: | shadedclient | 26m 52s | | patch has no errors when building and testing our client artifacts. |
|||| _ Other Tests _ |
| +1 :green_heart: | unit | 456m 39s | | hadoop-hdfs in the patch passed. |
| +1 :green_heart: | unit | 37m 7s | | hadoop-hdfs-rbf in the patch passed. |
| +1 :green_heart: | asflicense | 1m 18s | | The patch does not generate ASF License warnings. |
| | | 660m 25s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4400/3/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/4400 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely
[ https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=781930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781930 ] ASF GitHub Bot logged work on HDFS-16064: - Author: ASF GitHub Bot Created on: 16/Jun/22 06:25 Start Date: 16/Jun/22 06:25 Worklog Time Spent: 10m Work Description: ZanderXu commented on PR #4410: URL: https://github.com/apache/hadoop/pull/4410#issuecomment-1157281252 @KevinWikant Nice catch +1. I learned a lot from it. thanks~ Issue Time Tracking --- Worklog Id: (was: 781930) Time Spent: 1h (was: 50m) > HDFS-721 causes DataNode decommissioning to get stuck indefinitely > -- > > Key: HDFS-16064 > URL: https://issues.apache.org/jira/browse/HDFS-16064 > Project: Hadoop HDFS > Issue Type: Bug > Components: datanode, namenode >Affects Versions: 3.2.1 >Reporter: Kevin Wikant >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a > non-issue under the assumption that if the namenode & a datanode get into an > inconsistent state for a given block pipeline, there should be another > datanode available to replicate the block to > While testing datanode decommissioning using "dfs.exclude.hosts", I have > encountered a scenario where the decommissioning gets stuck indefinitely > Below is the progression of events: > * there are initially 4 datanodes DN1, DN2, DN3, DN4 > * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts" > * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in > order to satisfy their minimum replication factor of 2 > * during this replication process > https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes > the following inconsistent state: > ** DN3 thinks it has the block pipeline in FINALIZED state > ** the namenode does not think DN3 has the block pipeline > {code:java} > 2021-06-06 10:38:23,604 INFO 
> org.apache.hadoop.hdfs.server.datanode.DataNode
> (DataXceiver for client at /DN2:45654 [Receiving block BP-YYY:blk_XXX]):
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation src: /DN2:45654
> dst: /DN3:9866;
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
> * the replication is attempted again, but:
> ** DN4 has the block
> ** DN1 and/or DN2 have the block, but don't count towards the minimum
> replication factor because they are being decommissioned
> ** DN3 does not have the block & cannot have the block replicated to it
> because of HDFS-721
> * the namenode repeatedly tries to replicate the block to DN3 & repeatedly
> fails, this continues indefinitely
> * therefore DN4 is the only live datanode with the block & the minimum
> replication factor of 2 cannot be satisfied
> * because the minimum replication factor cannot be satisfied for the
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be
> completed
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0):
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0,
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 ,
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0):
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0,
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 ,
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of
> DataNode decommissioning
> A few potential solutions:
> * Address the root cause of the problem which is an inconsistent state
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
> * Detect when datanode decommissioning is stuck due to lack of available
> datanodes for satisfying the minimum replication factor, then recover by
> re-enabling the datanodes being decommissioned

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
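
The "detect when datanode decommissioning is stuck" idea from the issue description above could be approximated outside HDFS by watching the DatanodeAdminMonitor log lines quoted in the report. The following is only a rough sketch under stated assumptions: the regular expression is inferred from the two sample log lines (not from the HDFS source), each log record is assumed to be on a single unwrapped line, and `LINE_RE`, `find_stuck_blocks`, and the 15-minute threshold are hypothetical names and values chosen for illustration.

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from the two BlockStateChange samples in the report;
# this is an assumption, not a format guaranteed by HDFS.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} INFO BlockStateChange .*?"
    r"Block: (?P<block>\S+), Expected Replicas: (?P<expected>\d+), "
    r"live replicas: (?P<live>\d+),.*?"
    r"decommissioning replicas: (?P<decom>\d+),"
)


def find_stuck_blocks(lines, threshold=timedelta(minutes=15)):
    """Return ids of blocks that stay short of live replicas, while replicas
    are still decommissioning, for at least `threshold` (hypothetical value)."""
    first_seen = {}  # block id -> timestamp of the first under-replicated report
    stuck = set()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a BlockStateChange report line
        ts = datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S")
        block = m["block"]
        short = int(m["live"]) < int(m["expected"]) and int(m["decom"]) > 0
        if not short:
            # The block recovered, so forget any earlier observation.
            first_seen.pop(block, None)
            stuck.discard(block)
            continue
        start = first_seen.setdefault(block, ts)
        if ts - start >= threshold:
            stuck.add(block)
    return stuck


# The two log records from the report, unwrapped and shortened.
SAMPLE = [
    "2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): "
    "Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, "
    "decommissioned replicas: 0, decommissioning replicas: 2, maintenance replicas: 0",
    "2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): "
    "Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, "
    "decommissioned replicas: 0, decommissioning replicas: 2, maintenance replicas: 0",
]

if __name__ == "__main__":
    print(find_stuck_blocks(SAMPLE))
```

This only flags candidate blocks; the recovery step proposed in the report (re-enabling the datanodes being decommissioned) would still have to happen inside the namenode or through an operator action.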