[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782238&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782238
 ]

ASF GitHub Bot logged work on HDFS-13522:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 03:26
Start Date: 17/Jun/22 03:26
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4127:
URL: https://github.com/apache/hadoop/pull/4127#issuecomment-1158450996

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 58s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 12 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  40m 48s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  24m 57s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  23m  4s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |  20m 33s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   4m 24s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   7m 50s |  |  trunk passed  |
   | -1 :x: |  javadoc  |   1m 45s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/15/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) |  hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   6m 45s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  12m 26s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 33s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 30s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 13s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |  22m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 29s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |  20m 29s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/15/artifact/out/blanks-eol.txt) |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   4m 20s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/15/artifact/out/results-checkstyle-root.txt) |  root: The patch generated 3 new + 339 unchanged - 1 fixed = 342 total (was 340)  |
   | +1 :green_heart: |  mvnsite  |   7m 40s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m 45s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/15/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) |  hadoop-hdfs in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   6m 44s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  12m 56s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 49s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  18m 33s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   3m 16s |  |  hadoop-hdfs-client in the patch passed.  |
   | +1 :green_heart: |  unit  | 413m 33s |  |  hadoop-hdfs in the patch passed.  |
   | -1 :x: |  unit  |   2m 11s | 

[jira] [Work logged] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16613?focusedWorklogId=782228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782228
 ]

ASF GitHub Bot logged work on HDFS-16613:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 02:46
Start Date: 17/Jun/22 02:46
Worklog Time Spent: 10m 
  Work Description: lfxy closed pull request #4391: HDFS-16613. EC: Improve 
performance of decommissioning dn with many ec blocks
URL: https://github.com/apache/hadoop/pull/4391




Issue Time Tracking
---

Worklog Id: (was: 782228)
Time Spent: 2.5h  (was: 2h 20m)

> EC: Improve performance of decommissioning dn with many ec blocks
> -
>
> Key: HDFS-16613
> URL: https://issues.apache.org/jira/browse/HDFS-16613
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ec, erasure-coding, namenode
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2022-06-07-11-46-42-389.png, 
> image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png, 
> image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png, 
> image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> In an HDFS cluster with a lot of EC blocks, decommissioning a DN is very 
> slow. The reason is that, unlike replicated blocks, which can be copied from 
> any DN that holds a replica of the same block, EC blocks have to be 
> replicated from the decommissioning DN itself.
> The configurations dfs.namenode.replication.max-streams and 
> dfs.namenode.replication.max-streams-hard-limit limit the replication speed, 
> but increasing them puts the whole cluster's network at risk. So a new 
> configuration should be added to limit streams on the decommissioning DN, 
> distinct from the cluster-wide max-streams limit.
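
To make the proposal above concrete, here is a minimal sketch, under stated assumptions, of how a separate stream limit for decommissioning DataNodes could sit beside the cluster-wide one. The class `StreamLimitSketch`, its method, and the numbers are hypothetical illustrations; they are not the names or the logic used in the actual patch.

```java
/** Hypothetical model of per-node replication stream limits; not Hadoop code. */
final class StreamLimitSketch {
  // Models the cluster-wide dfs.namenode.replication.max-streams value.
  private final int maxStreams;
  // Hypothetical separate limit that applies only to decommissioning DNs.
  private final int decommissionStreams;

  StreamLimitSketch(int maxStreams, int decommissionStreams) {
    this.maxStreams = maxStreams;
    this.decommissionStreams = decommissionStreams;
  }

  /** Returns true if a node with {@code activeStreams} may be given one more task. */
  boolean canSchedule(boolean decommissioning, int activeStreams) {
    int limit = decommissioning ? decommissionStreams : maxStreams;
    return activeStreams < limit;
  }

  public static void main(String[] args) {
    StreamLimitSketch limits = new StreamLimitSketch(2, 8);
    System.out.println(limits.canSchedule(false, 2)); // false: cluster-wide limit reached
    System.out.println(limits.canSchedule(true, 2));  // true: decommissioning DN has headroom
  }
}
```

The point of keeping two values is that raising the second one speeds up a decommissioning node without loosening the limit for every other node in the cluster.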



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16613?focusedWorklogId=782227&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782227
 ]

ASF GitHub Bot logged work on HDFS-16613:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 02:46
Start Date: 17/Jun/22 02:46
Worklog Time Spent: 10m 
  Work Description: lfxy commented on PR #4398:
URL: https://github.com/apache/hadoop/pull/4398#issuecomment-1158429085

   @hi-adachi OK, I see, thank you very much!




Issue Time Tracking
---

Worklog Id: (was: 782227)
Time Spent: 2h 20m  (was: 2h 10m)







[jira] [Work logged] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16613?focusedWorklogId=782197&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782197
 ]

ASF GitHub Bot logged work on HDFS-16613:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 01:02
Start Date: 17/Jun/22 01:02
Worklog Time Spent: 10m 
  Work Description: hi-adachi commented on PR #4398:
URL: https://github.com/apache/hadoop/pull/4398#issuecomment-1158359483

   @lfxy The PR was merged. Just FYI, the contribution guide says the 
following. Thank you for your contribution.
   
   > https://cwiki.apache.org/confluence/display/hadoop/how+to+contribute
   > Once a "+1" comment is received from the automated patch testing system 
and a code reviewer has set the Reviewed flag on the issue's Jira, a committer 
should then evaluate it within a few days and either: commit it; or reject it 
with an explanation.  
   > 
   > Please be patient. Committers are busy people too. If no one responds to 
your patch after a few days, please make friendly reminders. Please incorporate 
other's suggestions into your patch if you think they're reasonable. Finally, 
remember that even a patch that is not committed is useful to the community.




Issue Time Tracking
---

Worklog Id: (was: 782197)
Time Spent: 2h 10m  (was: 2h)







[jira] [Work logged] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16634?focusedWorklogId=782203&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782203
 ]

ASF GitHub Bot logged work on HDFS-16634:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 01:13
Start Date: 17/Jun/22 01:13
Worklog Time Spent: 10m 
  Work Description: virajjasani opened a new pull request, #4448:
URL: https://github.com/apache/hadoop/pull/4448

   ### Description of PR
   On a busy cluster, it sometimes takes a while for the "slow node report" of 
   a node that was deleted from the cluster to be removed from the slow peer 
   JSON report in the NameNode JMX metrics. In the meantime, the user should be 
   able to browse more entries in the report by reconfiguring 
   "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted 
   without bouncing the active NameNode just for this purpose.
   
   ### How was this patch tested?
   Dev cluster and using UT.
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   




Issue Time Tracking
---

Worklog Id: (was: 782203)
Remaining Estimate: 0h
Time Spent: 10m

> Dynamically adjust slow peer report size on JMX metrics
> ---
>
> Key: HDFS-16634
> URL: https://issues.apache.org/jira/browse/HDFS-16634
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> On a busy cluster, it sometimes takes a while for the "slow node report" of 
> a node that was deleted from the cluster to be removed from the slow peer 
> JSON report in the NameNode JMX metrics. In the meantime, the user should be 
> able to browse more entries in the report by reconfiguring 
> "dfs.datanode.max.nodes.to.report", so that the list size can be adjusted 
> without bouncing the active NameNode just for this purpose.
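
As an illustration of the idea described above, the following is a minimal sketch of a report whose maximum size can be changed at runtime. The class `SlowPeerReportSketch` and its methods are hypothetical and are not taken from the Hadoop code base or from the PR; the sketch only assumes that the report lists the slowest nodes first and is capped by the configured value.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical model of a slow peer report with a runtime-adjustable size. */
final class SlowPeerReportSketch {
  // Models the value of "dfs.datanode.max.nodes.to.report". It is volatile so
  // that an update made by a reconfiguration thread is visible to readers.
  private volatile int maxNodesToReport;

  SlowPeerReportSketch(int initialMax) {
    setMaxNodesToReport(initialMax);
  }

  /** Called on reconfiguration; the owning process does not need a restart. */
  void setMaxNodesToReport(int newMax) {
    if (newMax < 0) {
      throw new IllegalArgumentException("report size must be >= 0: " + newMax);
    }
    this.maxNodesToReport = newMax;
  }

  /** Returns node names, slowest first, capped at the current report size. */
  List<String> report(Map<String, Double> latencyMsByNode) {
    return latencyMsByNode.entrySet().stream()
        .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
        .limit(maxNodesToReport)
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }
}
```

Because the cap is read on every call to `report`, a later call to `setMaxNodesToReport` takes effect on the next report without rebuilding any state.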






[jira] [Updated] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16634:
--
Labels: pull-request-available  (was: )







[jira] [Created] (HDFS-16634) Dynamically adjust slow peer report size on JMX metrics

2022-06-16 Thread Viraj Jasani (Jira)
Viraj Jasani created HDFS-16634:
---

 Summary: Dynamically adjust slow peer report size on JMX metrics
 Key: HDFS-16634
 URL: https://issues.apache.org/jira/browse/HDFS-16634
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Viraj Jasani
Assignee: Viraj Jasani


On a busy cluster, it sometimes takes a while for the "slow node report" of 
a node that was deleted from the cluster to be removed from the slow peer 
JSON report in the NameNode JMX metrics. In the meantime, the user should be 
able to browse more entries in the report by reconfiguring 
"dfs.datanode.max.nodes.to.report", so that the list size can be adjusted 
without bouncing the active NameNode just for this purpose.






[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782198&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782198
 ]

ASF GitHub Bot logged work on HDFS-13522:
-

Author: ASF GitHub Bot
Created on: 17/Jun/22 01:06
Start Date: 17/Jun/22 01:06
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#issuecomment-1158361679

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 37s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  44m 39s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  24m 44s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m 49s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |  20m 27s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   4m 24s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   7m 44s |  |  trunk passed  |
   | -1 :x: |  javadoc  |   1m 45s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/8/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) |  hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   6m 50s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  12m 22s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 27s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 34s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 10s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m  3s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |  22m  3s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 26s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |  20m 26s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/8/artifact/out/blanks-eol.txt) |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   4m 13s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/8/artifact/out/results-checkstyle-root.txt) |  root: The patch generated 3 new + 198 unchanged - 1 fixed = 201 total (was 199)  |
   | +1 :green_heart: |  mvnsite  |   7m 42s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m 45s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/8/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) |  hadoop-hdfs in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   6m 49s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  12m 54s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 52s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  18m 42s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   3m 14s |  |  hadoop-hdfs-client in the patch passed.  |
   | -1 :x: |  unit  | 256m 19s | 

[jira] [Work logged] (HDFS-16566) Erasure Coding: Recovery may causes excess replicas when busy DN exsits

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16566?focusedWorklogId=782183&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782183
 ]

ASF GitHub Bot logged work on HDFS-16566:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 22:58
Start Date: 16/Jun/22 22:58
Worklog Time Spent: 10m 
  Work Description: jojochuang commented on code in PR #4252:
URL: https://github.com/apache/hadoop/pull/4252#discussion_r899627843


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/protocolPB/PBHelper.java:
##
@@ -1040,11 +1040,16 @@ public static BlockECReconstructionInfo convertBlockECReconstructionInfo(
 
 byte[] liveBlkIndices = blockEcReconstructionInfoProto.getLiveBlockIndices()
 .toByteArray();
+byte[] excludeReconstructedIndices =

Review Comment:
   Please check and make sure ExcludeReconstructedIndices is filled.



##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/proto/erasurecoding.proto:
##
@@ -108,6 +108,7 @@ message BlockECReconstructionInfoProto {
   required StorageTypesProto targetStorageTypes = 5;
   required bytes liveBlockIndices = 6;
   required ErasureCodingPolicyProto ecPolicy = 7;
+  required bytes excludeReconstructedIndices = 8;

Review Comment:
   ```suggestion
 optional bytes excludeReconstructedIndices = 8;
   ```
   Make it optional to ensure backward compatibility.
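
The compatibility concern can be shown with a toy model. In proto2, a message that lacks a `required` field fails to parse, so a sender built before field 8 existed would produce messages that a newer reader rejects. The sketch below is not the protobuf API: it models a decoded message as a map from field number to bytes, and the class and method names are hypothetical.

```java
import java.util.Map;

/** Toy model of field presence; this is not the protobuf API. */
final class FieldPresenceSketch {
  // Field number of excludeReconstructedIndices in the diff above.
  static final int EXCLUDE_RECONSTRUCTED_INDICES = 8;

  /** Reader that treats field 8 as required: messages without it are rejected. */
  static byte[] readRequired(Map<Integer, byte[]> message) {
    byte[] value = message.get(EXCLUDE_RECONSTRUCTED_INDICES);
    if (value == null) {
      throw new IllegalStateException("missing required field 8");
    }
    return value;
  }

  /** Reader that treats field 8 as optional: absence means "nothing excluded". */
  static byte[] readOptional(Map<Integer, byte[]> message) {
    byte[] value = message.get(EXCLUDE_RECONSTRUCTED_INDICES);
    return value == null ? new byte[0] : value;
  }
}
```

With the optional reader, a message from an older sender is simply interpreted as having an empty exclusion list, which is the behavior the review comment asks for.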





Issue Time Tracking
---

Worklog Id: (was: 782183)
Time Spent: 3.5h  (was: 3h 20m)

> Erasure Coding: Recovery may causes excess replicas when busy DN exsits
> ---
>
> Key: HDFS-16566
> URL: https://issues.apache.org/jira/browse/HDFS-16566
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.2
>Reporter: Ruinan Gu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> Simple case:
> RS3-2, [0(busy),2,3,4] (1 missing), 0 is busy.
> We can get liveblockIndice=[2,3,4], additionalRepl=1. So the DN will get 
> LiveBitSet=[2,3,4] and targets.length=1.
> According to StripedWriter.initTargetIndices(), 0 will get recovered instead 
> of 1, so the internal blocks will become [0(busy),2,3,4,0'(excess)]. Although 
> the NN will detect and delete the excess replica and recover the missing 
> block (1) correctly after the wrong recovery of 0', I don't think this 
> process is expected; the recovery of 0' is clearly wrong and unnecessary.
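
The simple case above can be replayed with a small simulation. This is a simplified model and not the real `StripedWriter.initTargetIndices()`: it assumes, as the description implies, that targets are the lowest block indices that are not in the live set. The class `TargetIndexSketch` and the `excluded` parameter are hypothetical illustrations of what passing the busy indices to the DataNode would change.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Simplified model of target selection for EC reconstruction; not Hadoop code. */
final class TargetIndexSketch {
  /**
   * Picks up to {@code targetCount} block indices to reconstruct: the lowest
   * indices in [0, totalBlocks) that are neither live nor excluded.
   */
  static List<Integer> chooseTargets(int totalBlocks, int[] live, int[] excluded,
      int targetCount) {
    BitSet skip = new BitSet(totalBlocks);
    for (int i : live) {
      skip.set(i);
    }
    for (int i : excluded) {
      skip.set(i);
    }
    List<Integer> targets = new ArrayList<>();
    for (int i = 0; i < totalBlocks && targets.size() < targetCount; i++) {
      if (!skip.get(i)) {
        targets.add(i);
      }
    }
    return targets;
  }

  public static void main(String[] args) {
    int[] live = {2, 3, 4};
    // Busy index 0 is absent from the live set, so it is picked first.
    System.out.println(chooseTargets(5, live, new int[0], 1));    // [0]
    // Excluding the busy index lets the truly missing index 1 be picked.
    System.out.println(chooseTargets(5, live, new int[] {0}, 1)); // [1]
  }
}
```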






[jira] [Work logged] (HDFS-16591) StateStoreZooKeeper fails to initialize

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16591?focusedWorklogId=782176&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782176
 ]

ASF GitHub Bot logged work on HDFS-16591:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 21:55
Start Date: 16/Jun/22 21:55
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4447:
URL: https://github.com/apache/hadoop/pull/4447#issuecomment-1158170997

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 42s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 41s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  26m 49s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  22m 56s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |  20m 37s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 48s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   4m 50s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   4m 21s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   3m 48s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   6m  9s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 49s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 32s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m  7s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 58s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |  21m 58s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 33s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |  20m 33s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m 49s | [/results-checkstyle-hadoop-common-project.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4447/1/artifact/out/results-checkstyle-hadoop-common-project.txt) |  hadoop-common-project: The patch generated 5 new + 203 unchanged - 2 fixed = 208 total (was 205)  |
   | +1 :green_heart: |  mvnsite  |   4m 31s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   4m  8s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   3m 43s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | -1 :x: |  spotbugs  |   1m 39s | [/new-spotbugs-hadoop-common-project_hadoop-auth.html](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4447/1/artifact/out/new-spotbugs-hadoop-common-project_hadoop-auth.html) |  hadoop-common-project/hadoop-auth generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  shadedclient  |  21m 56s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |   4m  0s |  |  hadoop-auth in the patch passed.  |
   | +1 :green_heart: |  unit  |  18m 28s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   1m 57s |  |  hadoop-registry in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 34s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 247m 46s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | SpotBugs | module:hadoop-common-project/hadoop-auth |
   |  |  Write to static field 
org.apache.hadoop.security.authentication.util.JaasConfiguration.entry from 
instance method new 
org.apache.hadoop.security.authentication.util.JaasConfiguration(String, 
String, String)  At JaasConfiguration.java:from instance method new 
org.apache.hadoop.security.authentication.util.JaasConfiguration(String, 
String, String)  At 

[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=782169&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782169
 ]

ASF GitHub Bot logged work on HDFS-16064:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 21:37
Start Date: 16/Jun/22 21:37
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4410:
URL: https://github.com/apache/hadoop/pull/4410#issuecomment-1158158938

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 56s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 1 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  39m 25s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 39s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 31s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 40s |  |  trunk passed  |
   | -1 :x: |  javadoc  |   1m 20s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) |  hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 58s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 30s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 30s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | -0 :warning: |  checkstyle  |   1m  2s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 1 new + 100 unchanged - 0 fixed = 101 total (was 100)  |
   | +1 :green_heart: |  mvnsite  |   1m 28s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m  0s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) |  hadoop-hdfs in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 35s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m  0s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 381m 44s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  1s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 498m 26s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4410/2/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4410 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux efcbee072994 4.15.0-166-generic 

[jira] [Work logged] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16613?focusedWorklogId=782142=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782142
 ]

ASF GitHub Bot logged work on HDFS-16613:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 18:11
Start Date: 16/Jun/22 18:11
Worklog Time Spent: 10m 
  Work Description: jojochuang merged PR #4398:
URL: https://github.com/apache/hadoop/pull/4398




Issue Time Tracking
---

Worklog Id: (was: 782142)
Time Spent: 2h  (was: 1h 50m)

> EC: Improve performance of decommissioning dn with many ec blocks
> -
>
> Key: HDFS-16613
> URL: https://issues.apache.org/jira/browse/HDFS-16613
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ec, erasure-coding, namenode
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-06-07-11-46-42-389.png, 
> image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png, 
> image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png, 
> image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In an HDFS cluster with many EC blocks, decommissioning a DN is very slow. 
> The reason is that, unlike replicated blocks, which can be copied from any DN 
> that holds a replica of the same block, EC blocks have to be replicated from 
> the decommissioning DN itself.
> The configurations dfs.namenode.replication.max-streams and 
> dfs.namenode.replication.max-streams-hard-limit limit the replication 
> speed, but increasing them puts the whole cluster's network at risk. 
> So a new configuration should be added to limit the decommissioning DN 
> separately from the cluster-wide max-streams limit.
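
The idea in the description above, a stream limit for decommissioning datanodes that is separate from the cluster-wide one, can be sketched as follows. This is a minimal illustration and not the code from the patch: the class name, the method name, and both numeric values are assumptions chosen for the example.

```java
/** Hypothetical sketch: pick a replication-stream budget per datanode. */
final class StreamBudgetSketch {
  // Assumed value standing in for dfs.namenode.replication.max-streams.
  static final int CLUSTER_MAX_STREAMS = 2;
  // Assumed value for a hypothetical decommission-only limit (name and number invented here).
  static final int DECOMMISSION_MAX_STREAMS = 8;

  /** Returns how many more transfers may be scheduled on a node right now. */
  static int availableTransfers(boolean decommissioning, int transfersInProgress) {
    // A decommissioning node gets its own limit; every other node keeps the cluster-wide one.
    int limit = decommissioning ? DECOMMISSION_MAX_STREAMS : CLUSTER_MAX_STREAMS;
    return Math.max(0, limit - transfersInProgress);
  }
}
```

With these assumed numbers, raising the decommission-only limit speeds up draining one node without changing how many streams any other datanode may run.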



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks

2022-06-16 Thread Wei-Chiu Chuang (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang updated HDFS-16613:
---
Fix Version/s: 3.4.0
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)







[jira] [Updated] (HDFS-16591) StateStoreZooKeeper fails to initialize

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16591:
--
Labels: pull-request-available  (was: )

> StateStoreZooKeeper fails to initialize
> ---
>
> Key: HDFS-16591
> URL: https://issues.apache.org/jira/browse/HDFS-16591
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Reporter: Hector Sandoval Chaverri
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> MembershipStore and MountTableStore are failing to initialize, logging the 
> following errors on the Router logs:
> {noformat}
> 2022-05-23 16:43:01,156 ERROR 
> org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService: 
> Cannot get version for class 
> org.apache.hadoop.hdfs.server.federation.store.MembershipStore
> org.apache.hadoop.hdfs.server.federation.store.StateStoreUnavailableException:
>  Cached State Store not initialized, MembershipState records not valid
>   at 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.checkCacheAvailable(CachedRecordStore.java:106)
>   at 
> org.apache.hadoop.hdfs.server.federation.store.CachedRecordStore.getCachedRecords(CachedRecordStore.java:227)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.getStateStoreVersion(RouterHeartbeatService.java:131)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.updateStateStore(RouterHeartbeatService.java:92)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.RouterHeartbeatService.periodicInvoke(RouterHeartbeatService.java:159)
>   at 
> org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:748){noformat}
> After investigating, we noticed that ZKDelegationTokenSecretManager normally 
> initializes properties for ZooKeeper clients to connect using SASL/Kerberos. 
> If ZKDelegationTokenSecretManager is replaced with a new SecretManager, the 
> SASL properties don't get configured and any StateStores that connect to 
> ZooKeeper fail with the above error. 
>  A potential way to fix this is by setting the JaasConfiguration (currently 
> done in ZKDelegationTokenSecretManager) as part of the 
> StateStoreZooKeeperImpl initialization method.
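
The potential fix described above can be sketched with JDK classes alone. This is a minimal illustration and not the code from the eventual patch: the class name `ZkJaasSketch`, the entry name, the principal, and the keytab path are assumptions for the example. `zookeeper.sasl.clientconfig` is the ZooKeeper client system property that selects the JAAS login context.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Hypothetical stand-in for a shared JAAS configuration used by ZooKeeper clients. */
final class ZkJaasSketch extends Configuration {
  private final String entryName;
  private final AppConfigurationEntry[] entries;

  ZkJaasSketch(String entryName, String principal, String keytab) {
    this.entryName = entryName;
    // Keytab-based Kerberos login options for the JDK's Krb5LoginModule.
    Map<String, String> options = new HashMap<>();
    options.put("useKeyTab", "true");
    options.put("keyTab", keytab);
    options.put("principal", principal);
    options.put("storeKey", "true");
    options.put("useTicketCache", "false");
    options.put("doNotPrompt", "true");
    this.entries = new AppConfigurationEntry[] {
        new AppConfigurationEntry(
            "com.sun.security.auth.module.Krb5LoginModule",
            LoginModuleControlFlag.REQUIRED, options)};
  }

  @Override
  public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
    // Only answer for our own entry; unknown names get null, per the JAAS contract.
    return entryName.equals(name) ? entries : null;
  }

  /** Installs the configuration process-wide and points the ZooKeeper client at it. */
  static void install(String entryName, String principal, String keytab) {
    Configuration.setConfiguration(new ZkJaasSketch(entryName, principal, keytab));
    System.setProperty("zookeeper.sasl.clientconfig", entryName);
  }
}
```

Calling something like `install(...)` from the state store's initialization path, rather than only from the secret manager, is the design choice the description argues for: the SASL setup then no longer depends on which SecretManager is configured.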






[jira] [Work logged] (HDFS-16591) StateStoreZooKeeper fails to initialize

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16591?focusedWorklogId=782128=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782128
 ]

ASF GitHub Bot logged work on HDFS-16591:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 17:46
Start Date: 16/Jun/22 17:46
Worklog Time Spent: 10m 
  Work Description: hchaverri opened a new pull request, #4447:
URL: https://github.com/apache/hadoop/pull/4447

   …enabled
   
   
   
   ### Description of PR
   Setting up the JaasConfiguration when creating a new ZKCuratorManager, to 
allow ZK connections via SASL.
   Also removing duplicated classes of JaasConfiguration.
   
   ### How was this patch tested?
   Ran the following unit tests:
   TestJaasConfiguration
   TestZKCuratorManager
   TestZKSignerSecretProvider
   TestZKDelegationTokenSecretManager
   TestMicroZookeeperService
   
   Created a TestDelegationTokenSecretManager to replace the default 
ZKDelegationTokenSecretManagerImpl and deployed to an RBF router. Without these 
changes, the router initialization will fail with the error described on 
HDFS-16591. Initialization succeeds with this patch.
   
   ### For code changes:
   
   - [ ] Does the title of this PR start with the corresponding JIRA issue id 
(e.g. 'HADOOP-17799. Your PR title ...')?
   - [ ] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies 
licensed in a way that is compatible for inclusion under [ASF 
2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   




Issue Time Tracking
---

Worklog Id: (was: 782128)
Remaining Estimate: 0h
Time Spent: 10m





[jira] [Work logged] (HDFS-16605) Improve Code With Lambda in hadoop-hdfs-rbf moudle

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16605?focusedWorklogId=782101=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782101
 ]

ASF GitHub Bot logged work on HDFS-16605:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 16:15
Start Date: 16/Jun/22 16:15
Worklog Time Spent: 10m 
  Work Description: slfan1989 commented on PR #4375:
URL: https://github.com/apache/hadoop/pull/4375#issuecomment-1157862822

   @goiri Can you help me merge this pr to trunk branch? Thanks for helping me 
review the code!




Issue Time Tracking
---

Worklog Id: (was: 782101)
Time Spent: 2h 20m  (was: 2h 10m)

> Improve Code With Lambda in hadoop-hdfs-rbf moudle
> --
>
> Key: HDFS-16605
> URL: https://issues.apache.org/jira/browse/HDFS-16605
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.4.0
>Reporter: fanshilun
>Assignee: fanshilun
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>







[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782080=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782080
 ]

ASF GitHub Bot logged work on HDFS-13522:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 15:25
Start Date: 16/Jun/22 15:25
Worklog Time Spent: 10m 
  Work Description: simbadzina commented on PR #4441:
URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157792454

   @ZanderXu in https://github.com/apache/hadoop/pull/4127 I have 
configurations on both the router and client side. Consistency is also 
guaranteed because the router always does an msync.
   The client-side configuration exists for latency-sensitive 
clients that just want one call between the router and the namenodes.




Issue Time Tracking
---

Worklog Id: (was: 782080)
Time Spent: 14h  (was: 13h 50m)

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png
>
>  Time Spent: 14h
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.






[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782073=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782073
 ]

ASF GitHub Bot logged work on HDFS-13522:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 15:18
Start Date: 16/Jun/22 15:18
Worklog Time Spent: 10m 
  Work Description: simbadzina commented on PR #4441:
URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157784867

   > Thanks @zhengchenyu and @simbadzina .
   > 
   > > I think config in client side may be more flexible.
   > 
   > This is a very meaningful topic. If only the client controls whether 
ObserverRead is enabled, it will be harder for the Admin to control, because 
it is very difficult to upgrade every HDFS client. In other words: if RBF 
controls whether ObserverRead is enabled, the Admin can conveniently 
control ObserverRead for the entire cluster, and can even dynamically control 
whether ObserverRead is enabled for a single NS or for the entire cluster. But 
there may be some special clients that do not want to enable ObserverRead, so 
RBF should identify those requests and proxy them to the Active NameNode.
   > 
   > @simbadzina This is why dynamic updates are required, so that when the Admin 
finds some abnormal Observer NameNodes, he/she can quickly 
disable ObserverRead for one NS or even all NSs.
   > 
   > > In our draft design, after apply 
[HDFS-13522](https://issues.apache.org/jira/browse/HDFS-13522).002.patch, I 
wanna proxy client's state id.
   > 
   > Proxying the client's state id to the NameNode through RBF will be very complicated.
   > 
   > * A DFSClient may read or write paths in different NameServices, and 
the stateID of each NS may be different.
   > * The client does not know which NameService the path it is reading or 
writing belongs to, so it cannot pass the state id to RBF.
   
   @ZanderXu in my full PR, https://github.com/apache/hadoop/pull/4127, I 
also allow routers to enable and disable observer reads. The difference is 
that it requires a router restart. Since routers are stateless, this is a quick 
operation: at most one minute.




Issue Time Tracking
---

Worklog Id: (was: 782073)
Time Spent: 13h 50m  (was: 13h 40m)







[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782059=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782059
 ]

ASF GitHub Bot logged work on HDFS-13522:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 14:01
Start Date: 16/Jun/22 14:01
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4441:
URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157696846

   > We know observers cannot guarantee strong consistency. Some users may have 
strict consistency requirements and want to disable observer reads, though few 
users have this demand.
   
   Only a very small number of users have such strict requirements, and in most 
cases the client enables ObserverRead by default. In other words, in most cases 
there is no need for the client to pass an ObserverRead enable flag to RBF. Only 
a very small number of requests need to carry a specific flag to RBF, so that 
RBF can force an msync to ensure consistency before proxying the request.
   
   There are several ways for the client side to carry the force-consistency 
flag to RBF:
   1. Carry a special StateID to RBF, such as -100 (client process level)
   2. Carry a special field attribute to RBF through CallerContext (single RPC 
level)
   3. etc.
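
The first two options in the list above could lead to a router-side check along these lines. This is only a sketch under stated assumptions: the class, the method, and the CallerContext field string are hypothetical and are not existing Hadoop APIs. The sentinel -100 is the example value from the comment.

```java
/** Hypothetical sketch of how a router might detect a force-consistency request. */
final class ForceConsistencySketch {
  // Sentinel state id from option 1; an example value, not an existing Hadoop constant.
  static final long FORCE_CONSISTENCY_STATE_ID = -100L;
  // Marker for option 2; the field name is invented for illustration.
  static final String FORCE_CONSISTENCY_FIELD = "forceConsistency:true";

  /** Returns true when the router should msync before proxying the read. */
  static boolean mustMsyncFirst(long clientStateId, String callerContext) {
    if (clientStateId == FORCE_CONSISTENCY_STATE_ID) {
      return true; // process-level opt-in: every call from this client carries the sentinel
    }
    // RPC-level opt-in: only calls whose caller context carries the marker are affected
    return callerContext != null && callerContext.contains(FORCE_CONSISTENCY_FIELD);
  }
}
```

Requests that trip neither check would follow the default path, so ordinary clients need no new flag.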




Issue Time Tracking
---

Worklog Id: (was: 782059)
Time Spent: 13h 40m  (was: 13.5h)







[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782049=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782049
 ]

ASF GitHub Bot logged work on HDFS-13522:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 13:37
Start Date: 16/Jun/22 13:37
Worklog Time Spent: 10m 
  Work Description: zhengchenyu commented on PR #4441:
URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157672472

   > Thanks @zhengchenyu and @simbadzina .
   > 
   > > I think config in client side may be more flexible.
   > 
   > This is a very meaningful topic. If only the client controls whether 
ObserverRead is enabled, it will be harder for the Admin to control, because 
it is very difficult to upgrade every HDFS client. In other words: if RBF 
controls whether ObserverRead is enabled, the Admin can conveniently 
control ObserverRead for the entire cluster, and can even dynamically control 
whether ObserverRead is enabled for a single NS or for the entire cluster. But 
there may be some special clients that do not want to enable ObserverRead, so 
RBF should identify those requests and proxy them to the Active NameNode.
   > 
   > @simbadzina This is why dynamic updates are required, so that when the Admin 
finds some abnormal Observer NameNodes, he/she can quickly 
disable ObserverRead for one NS or even all NSs.
   > 
   > > In our draft design, after apply 
[HDFS-13522](https://issues.apache.org/jira/browse/HDFS-13522).002.patch, I 
wanna proxy client's state id.
   > 
   > Proxying the client's state id to the NameNode through RBF will be very complicated.
   > 
   > * A DFSClient may read or write paths in different NameServices, and 
the stateID of each NS may be different.
   > * The client does not know which NameService the path it is reading or 
writing belongs to, so it cannot pass the state id to RBF.
   
   Yes, you are right under some conditions. If all clients are ordinary users, 
such as Hive and MR applications, that holds. 
   We know observers cannot guarantee strong consistency. Some users may have 
strict consistency requirements and may want to disable observer reads, though 
few users have this demand.
   
   Maybe we can keep the configuration on both the router side and the client side.
   
   Yes, proxying the client's state id is complicated. I don't know whether it is 
necessary, so let's defer it.
   




Issue Time Tracking
---

Worklog Id: (was: 782049)
Time Spent: 13.5h  (was: 13h 20m)







[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=782040=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782040
 ]

ASF GitHub Bot logged work on HDFS-16064:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 13:20
Start Date: 16/Jun/22 13:20
Worklog Time Spent: 10m 
  Work Description: KevinWikant commented on PR #4410:
URL: https://github.com/apache/hadoop/pull/4410#issuecomment-1157654335

   @ashutoshcipher @aajisaka @ZanderXu really appreciate the reviews on this 
PR, thank you!
   
   @aajisaka I have removed the unused imports, please let me know if you have 
any other comments/concerns




Issue Time Tracking
---

Worklog Id: (was: 782040)
Time Spent: 1h 10m  (was: 1h)

> HDFS-721 causes DataNode decommissioning to get stuck indefinitely
> --
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to
> While testing datanode decommissioning using "dfs.exclude.hosts", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.exclude.hosts"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails, this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning
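
The replica accounting behind the log lines above can be replayed in a few lines. This is a simplified model, not NameNode code: the class and method names are hypothetical, and it only counts which holders of the block are not being decommissioned.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical model of the live-replica check that keeps decommissioning stuck. */
final class DecommissionStuckSketch {
  /** Counts holders of the block that are not being decommissioned. */
  static int liveReplicas(List<String> holders, Set<String> decommissioning) {
    int live = 0;
    for (String dn : holders) {
      if (!decommissioning.contains(dn)) {
        live++;
      }
    }
    return live;
  }

  /** Decommissioning can finish only once enough live replicas exist. */
  static boolean canFinishDecommission(
      List<String> holders, Set<String> decommissioning, int expectedReplicas) {
    return liveReplicas(holders, decommissioning) >= expectedReplicas;
  }
}
```

With holders DN1, DN2, DN4 and DN1, DN2 decommissioning, the model reports one live replica against an expected two, matching "live replicas: 1" in the log. The check only passes once the namenode also counts DN3 as a holder.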
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum 

[jira] [Updated] (HDFS-16633) Reserved Space For Replicas is not released on some cases

2022-06-16 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated HDFS-16633:
-
Description: 
We have found that Reserved Space For Replicas is not released in some cases on 
a Cx Prod cluster. There are a few fixes, such as HDFS-9530 and HDFS-8072, but 
the issue is still not completely fixed. We have tried to debug the root cause, 
but this will take a lot of time because it is a Cx Prod cluster. 

However, there is an easier way to fix the issue completely: release any 
remaining reserved space from BlockReceiver#close, which is invoked from the 
finally block of DataXceiver#writeBlock. 



  was:
Have found the Reserved Space For Replicas is not released on a Cx Prod 
cluster. There are few fixes like HDFS-9530 and HDFS-8072 but still the issue 
is not completely fixed. Have tried to debug the root cause but this will take 
lot of time as it is Cx Prod Cluster. 

But we have an easier way to fix the issue completely by releasing any 
remaining reserved space from BlockReceiver#close which is initiated by 
DataXceiver#writeBlock finally. 




> Reserved Space For Replicas is not released on some cases
> -
>
> Key: HDFS-16633
> URL: https://issues.apache.org/jira/browse/HDFS-16633
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> We have found that Reserved Space For Replicas is not released in some cases 
> on a Cx Prod cluster. There are a few fixes, such as HDFS-9530 and HDFS-8072, 
> but the issue is still not completely fixed. We have tried to debug the root 
> cause, but this will take a lot of time because it is a Cx Prod cluster. 
> However, there is an easier way to fix the issue completely: release any 
> remaining reserved space from BlockReceiver#close, which is invoked from the 
> finally block of DataXceiver#writeBlock. 



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16633) Reserved Space For Replicas is not released on some cases

2022-06-16 Thread Prabhu Joseph (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prabhu Joseph updated HDFS-16633:
-
Summary: Reserved Space For Replicas is not released on some cases  (was: 
Reserved Space For Replicas is not released )

> Reserved Space For Replicas is not released on some cases
> -
>
> Key: HDFS-16633
> URL: https://issues.apache.org/jira/browse/HDFS-16633
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.1.2
>Reporter: Prabhu Joseph
>Assignee: Prabhu Joseph
>Priority: Major
>
> We have found that Reserved Space For Replicas is not released on a Cx Prod 
> cluster. There are a few fixes, such as HDFS-9530 and HDFS-8072, but the 
> issue is still not completely fixed. We have tried to debug the root cause, 
> but this will take a lot of time because it is a Cx Prod cluster. 
> However, there is an easier way to fix the issue completely: release any 
> remaining reserved space from BlockReceiver#close, which is invoked from the 
> finally block of DataXceiver#writeBlock. 






[jira] [Created] (HDFS-16633) Reserved Space For Replicas is not released

2022-06-16 Thread Prabhu Joseph (Jira)
Prabhu Joseph created HDFS-16633:


 Summary: Reserved Space For Replicas is not released 
 Key: HDFS-16633
 URL: https://issues.apache.org/jira/browse/HDFS-16633
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 3.1.2
Reporter: Prabhu Joseph
Assignee: Prabhu Joseph


We have found that Reserved Space For Replicas is not released on a Cx Prod 
cluster. There are a few fixes, such as HDFS-9530 and HDFS-8072, but the issue 
is still not completely fixed. We have tried to debug the root cause, but this 
will take a lot of time because it is a Cx Prod cluster. 

However, there is an easier way to fix the issue completely: release any 
remaining reserved space from BlockReceiver#close, which is invoked from the 
finally block of DataXceiver#writeBlock. 
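
The proposal above can be illustrated with a minimal sketch. It is not the actual HDFS code: VolumeSketch, ReceiverSketch and ReservedSpaceDemo are hypothetical stand-ins for the volume accounting, BlockReceiver and DataXceiver#writeBlock, and the byte counts are made up. The sketch only shows the idea that close() returns whatever reservation is still outstanding, so a write that fails part-way does not leak reserved space.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a volume that tracks space reserved for in-flight replicas. */
class VolumeSketch {
  private final AtomicLong reservedForReplicas = new AtomicLong();

  void reserve(long bytes) { reservedForReplicas.addAndGet(bytes); }

  void release(long bytes) { reservedForReplicas.addAndGet(-bytes); }

  long getReserved() { return reservedForReplicas.get(); }
}

/** Hypothetical receiver: reserves a full block up front and releases as bytes are written. */
class ReceiverSketch implements AutoCloseable {
  private final VolumeSketch volume;
  private long stillReserved;

  ReceiverSketch(VolumeSketch volume, long blockSize) {
    this.volume = volume;
    this.stillReserved = blockSize;
    volume.reserve(blockSize);
  }

  /** Normal path: bytes that reached disk no longer need a reservation. */
  void onBytesWritten(long bytes) {
    long released = Math.min(bytes, stillReserved);
    volume.release(released);
    stillReserved -= released;
  }

  /** Safety net: give back whatever is still reserved; a second call is a no-op. */
  @Override
  public void close() {
    if (stillReserved > 0) {
      volume.release(stillReserved);
      stillReserved = 0;
    }
  }
}

class ReservedSpaceDemo {
  /** Simulates a write that fails part-way; the finally block plays the role of writeBlock's. */
  static long writeAndFail(VolumeSketch volume) {
    ReceiverSketch receiver = new ReceiverSketch(volume, 128L);
    try {
      receiver.onBytesWritten(40L);
      throw new IllegalStateException("simulated pipeline failure");
    } catch (IllegalStateException e) {
      // Swallowed for the demo; a real caller would log or rethrow.
    } finally {
      receiver.close();
    }
    return volume.getReserved();
  }
}
```

Because close() zeroes its own counter, it is safe to call from more than one cleanup path in this sketch.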








[jira] [Resolved] (HDFS-16581) Print node status when executing printTopology

2022-06-16 Thread Tao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tao Li resolved HDFS-16581.
---
Fix Version/s: 3.4.0
   3.3.4
   Resolution: Resolved

> Print node status when executing printTopology
> --
>
> Key: HDFS-16581
> URL: https://issues.apache.org/jira/browse/HDFS-16581
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsadmin, namenode
>Affects Versions: 3.3.0
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.4
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We can use the dfsadmin tool to see which DataNodes the cluster has. Each of 
> these nodes may be alive, DECOMMISSIONED, or DECOMMISSION_INPROGRESS. It 
> would be helpful to get this status promptly, for example when 
> troubleshooting cluster failures or tracking node status.






[jira] [Work logged] (HDFS-16581) Print node status when executing printTopology

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16581?focusedWorklogId=782024=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782024
 ]

ASF GitHub Bot logged work on HDFS-16581:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 11:43
Start Date: 16/Jun/22 11:43
Worklog Time Spent: 10m 
  Work Description: tomscut commented on PR #4321:
URL: https://github.com/apache/hadoop/pull/4321#issuecomment-1157563468

   Thanks @jianghuazhu for your contribution. Thanks @virajjasani for your 
review.




Issue Time Tracking
---

Worklog Id: (was: 782024)
Time Spent: 3h  (was: 2h 50m)

> Print node status when executing printTopology
> --
>
> Key: HDFS-16581
> URL: https://issues.apache.org/jira/browse/HDFS-16581
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsadmin, namenode
>Affects Versions: 3.3.0
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> We can use the dfsadmin tool to see which DataNodes the cluster has. Each of 
> these nodes may be alive, DECOMMISSIONED, or DECOMMISSION_INPROGRESS. It 
> would be helpful to get this status promptly, for example when 
> troubleshooting cluster failures or tracking node status.






[jira] [Work logged] (HDFS-16581) Print node status when executing printTopology

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16581?focusedWorklogId=782018=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782018
 ]

ASF GitHub Bot logged work on HDFS-16581:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 11:19
Start Date: 16/Jun/22 11:19
Worklog Time Spent: 10m 
  Work Description: tomscut merged PR #4321:
URL: https://github.com/apache/hadoop/pull/4321




Issue Time Tracking
---

Worklog Id: (was: 782018)
Time Spent: 2h 50m  (was: 2h 40m)

> Print node status when executing printTopology
> --
>
> Key: HDFS-16581
> URL: https://issues.apache.org/jira/browse/HDFS-16581
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: dfsadmin, namenode
>Affects Versions: 3.3.0
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> We can use the dfsadmin tool to see which DataNodes the cluster has. Each of 
> these nodes may be alive, DECOMMISSIONED, or DECOMMISSION_INPROGRESS. It 
> would be helpful to get this status promptly, for example when 
> troubleshooting cluster failures or tracking node status.






[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782013=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782013
 ]

ASF GitHub Bot logged work on HDFS-13522:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 11:04
Start Date: 16/Jun/22 11:04
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4441:
URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157530100

   As in my draft PR above, RBF always updates and saves lastSeenTxid from the 
Active. When an NS enables ObserverRead, RBF sets the stored lastSeenTxid of 
that NS in the RPC header and carries it to the Observer NameNode. If the NS 
disables ObserverRead, RBF does not set the state id in the RPC header, so even 
if the request is passed to the Observer, the Observer returns a 
StandbyException.
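
   The behaviour described above could be sketched roughly as follows. This is only an illustration under assumptions, not the code of the draft PR: RouterStateIdSketch and its method names are hypothetical, and the real router would write the id into the RPC request header rather than return an Optional.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical router-side bookkeeping of the last seen txid per nameservice. */
class RouterStateIdSketch {
  private final Map<String, Long> lastSeenTxidByNs = new ConcurrentHashMap<>();
  private final Map<String, Boolean> observerReadByNs = new ConcurrentHashMap<>();

  /** Called for every response from the Active NameNode; the stored id never moves backwards. */
  void onActiveResponse(String ns, long txid) {
    lastSeenTxidByNs.merge(ns, txid, Math::max);
  }

  /** Per-nameservice switch that an admin could flip at runtime. */
  void setObserverRead(String ns, boolean enabled) {
    observerReadByNs.put(ns, enabled);
  }

  /**
   * State id to put into the RPC header for this nameservice.
   * Empty means "set nothing", in which case an Observer would reject the call.
   */
  Optional<Long> stateIdForHeader(String ns) {
    if (!observerReadByNs.getOrDefault(ns, false)) {
      return Optional.empty();
    }
    return Optional.ofNullable(lastSeenTxidByNs.get(ns));
  }
}
```

   Keeping the id per nameservice matters because each NS has its own transaction id sequence.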




Issue Time Tracking
---

Worklog Id: (was: 782013)
Time Spent: 13h 20m  (was: 13h 10m)

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png
>
>  Time Spent: 13h 20m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.






[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=782009=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-782009
 ]

ASF GitHub Bot logged work on HDFS-13522:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 10:58
Start Date: 16/Jun/22 10:58
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4441:
URL: https://github.com/apache/hadoop/pull/4441#issuecomment-1157525096

   Thanks @zhengchenyu and @simbadzina .
   
   > I think config in client side may be more flexible.
   
   This is a very meaningful topic. If only the client controls whether 
ObserverRead is enabled, it will be harder for the Admin to control, because it 
is very difficult to upgrade every HDFS client. In other words, if RBF controls 
whether ObserverRead is enabled, the Admin can conveniently control 
ObserverRead for the entire cluster, and can even dynamically enable or disable 
ObserverRead for a single NS or for the entire cluster. 
   But there may be some special clients that do not want to enable 
ObserverRead, so RBF should identify those requests and proxy them to the 
Active NameNode. 
   
   @simbadzina This is why dynamic updates are required: when the Admin finds 
abnormal Observer NameNodes, he/she can quickly disable ObserverRead for one NS 
or even for all NSs.
   
   > In our draft design, after apply 
[HDFS-13522](https://issues.apache.org/jira/browse/HDFS-13522).002.patch, I 
wanna proxy client's state id.
   
   Proxying the client's state id to the NameNode through RBF would be very 
complicated. 
   - A DFSClient may read or write paths in different NameServices, and the 
state id of each NS may be different.
   - The client does not know which NameService the path being read or written 
belongs to, so it cannot pass the right state id to RBF.
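   
   A minimal sketch of the router-side control discussed in this comment, under stated assumptions: ObserverReadPolicySketch, its methods, and the clientOptedOut flag are hypothetical, and how RBF would actually detect a client that does not want ObserverRead is left open here.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical routing policy that an admin can change at runtime. */
class ObserverReadPolicySketch {
  enum Target { ACTIVE, OBSERVER }

  private volatile boolean clusterWideEnabled = true;
  private final Set<String> disabledNameservices = ConcurrentHashMap.newKeySet();

  /** Switch for the entire cluster. */
  void setClusterWideEnabled(boolean enabled) { clusterWideEnabled = enabled; }

  /** Switches for a single nameservice. */
  void disableFor(String ns) { disabledNameservices.add(ns); }

  void enableFor(String ns) { disabledNameservices.remove(ns); }

  /** Writes, opted-out clients, and disabled nameservices all go to the Active. */
  Target route(String ns, boolean isRead, boolean clientOptedOut) {
    if (!isRead || clientOptedOut) {
      return Target.ACTIVE;
    }
    if (!clusterWideEnabled || disabledNameservices.contains(ns)) {
      return Target.ACTIVE;
    }
    return Target.OBSERVER;
  }
}
```

   With a policy like this, disabling one NS does not affect reads on the others, and no client upgrade is needed.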
   




Issue Time Tracking
---

Worklog Id: (was: 782009)
Time Spent: 13h 10m  (was: 13h)

> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.






[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=781996=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781996
 ]

ASF GitHub Bot logged work on HDFS-13522:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 10:22
Start Date: 16/Jun/22 10:22
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4127:
URL: https://github.com/apache/hadoop/pull/4127#issuecomment-1157494963

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 49s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  1s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 12 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 19s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  28m 12s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  24m 51s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |  26m 45s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   5m 30s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   6m 47s |  |  trunk passed  |
   | -1 :x: |  javadoc  |   1m 31s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/14/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) |  hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   5m 42s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  11m 51s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m  6s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 26s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m  5s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  24m  3s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |  24m  3s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  21m 30s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |  21m 30s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  0s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/14/artifact/out/blanks-eol.txt) |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   4m 26s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/14/artifact/out/results-checkstyle-root.txt) |  root: The patch generated 3 new + 339 unchanged - 1 fixed = 342 total (was 340)  |
   | +1 :green_heart: |  mvnsite  |   6m 36s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m 30s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4127/14/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) |  hadoop-hdfs in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   5m 44s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  12m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 51s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  18m 10s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   2m 54s |  |  hadoop-hdfs-client in the patch passed.  |
   | +1 :green_heart: |  unit  | 362m 44s |  |  hadoop-hdfs in the patch passed.  |
   | -1 :x: |  unit  |  34m  3s | 

[jira] [Work logged] (HDFS-16600) Deadlock on DataNode

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16600?focusedWorklogId=781994=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781994
 ]

ASF GitHub Bot logged work on HDFS-16600:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 10:11
Start Date: 16/Jun/22 10:11
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4367:
URL: https://github.com/apache/hadoop/pull/4367#issuecomment-1157483601

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |  39m 57s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  40m 26s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   1m  8s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 32s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed  |
   | +1 :green_heart: |  spotbugs  |   3m 43s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  22m 52s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 54s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 25s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  spotbugs  |   3m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m  2s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  | 258m  9s |  |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 52s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 400m 10s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/8/artifact/out/Dockerfile |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4367 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c020b276eba7 4.15.0-169-generic #177-Ubuntu SMP Thu Feb 3 10:50:38 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / f08e25d23aa96705511da6358769b81a4a711080 |
   | Default Java | Red Hat, Inc.-1.8.0_332-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/8/testReport/ |
   | Max. process+thread count | 3757 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4367/8/console |
   | versions | git=2.9.5 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




Issue Time Tracking
---

Worklog Id: (was: 781994)
Time Spent: 4h 40m  (was: 4.5h)

> Deadlock on DataNode
> 
>
> Key: HDFS-16600
> URL: https://issues.apache.org/jira/browse/HDFS-16600
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 40m
>  Remaining Estimate: 0h
>
> The UT 
> org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.testSynchronousEviction 
> failed because a deadlock occurred, which was introduced by 
> [HDFS-16534|https://issues.apache.org/jira/browse/HDFS-16534]. 
> DeadLock:
> 

[jira] [Work logged] (HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16613?focusedWorklogId=781984=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781984
 ]

ASF GitHub Bot logged work on HDFS-16613:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 09:37
Start Date: 16/Jun/22 09:37
Worklog Time Spent: 10m 
  Work Description: lfxy commented on PR #4398:
URL: https://github.com/apache/hadoop/pull/4398#issuecomment-1157448224

   @hi-adachi Excuse me, what is the next step? Will this PR be merged into 
the trunk branch?




Issue Time Tracking
---

Worklog Id: (was: 781984)
Time Spent: 1h 50m  (was: 1h 40m)

> EC: Improve performance of decommissioning dn with many ec blocks
> -
>
> Key: HDFS-16613
> URL: https://issues.apache.org/jira/browse/HDFS-16613
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: ec, erasure-coding, namenode
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2022-06-07-11-46-42-389.png, 
> image-2022-06-07-17-42-16-075.png, image-2022-06-07-17-45-45-316.png, 
> image-2022-06-07-17-51-04-876.png, image-2022-06-07-17-55-40-203.png, 
> image-2022-06-08-11-38-29-664.png, image-2022-06-08-11-41-11-127.png
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In an HDFS cluster with a lot of EC blocks, decommissioning a dn is very slow. 
> The reason is that, unlike replicated blocks, which can be copied from any dn 
> that holds a replica of the same block, EC blocks have to be replicated from 
> the decommissioning dn itself.
> The configurations dfs.namenode.replication.max-streams and 
> dfs.namenode.replication.max-streams-hard-limit limit the replication speed, 
> but increasing them puts the whole cluster's network at risk. So a new 
> configuration should be added to limit the decommissioning dn separately from 
> the cluster-wide max-streams limit.
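
The idea in the description above can be sketched as follows. This is an assumption-laden illustration, not the patch in the PR: ReplicationStreamLimitSketch and the separate decommissioning limit are hypothetical, and only the role of dfs.namenode.replication.max-streams is taken from the description.

```java
/** Hypothetical scheduler helper with a separate stream limit for decommissioning nodes. */
class ReplicationStreamLimitSketch {
  // Stands in for dfs.namenode.replication.max-streams (cluster-wide limit).
  private final int maxStreams;
  // Hypothetical new setting that applies only to decommissioning DataNodes.
  private final int decommissionStreams;

  ReplicationStreamLimitSketch(int maxStreams, int decommissionStreams) {
    this.maxStreams = maxStreams;
    this.decommissionStreams = decommissionStreams;
  }

  /** Picks the limit that applies to a source node in the given state. */
  int limitFor(boolean decommissioning) {
    return decommissioning ? decommissionStreams : maxStreams;
  }

  /** Number of new replication tasks that may be handed to the node right now. */
  int tasksToSchedule(boolean decommissioning, int inFlight, int pending) {
    int free = Math.max(0, limitFor(decommissioning) - inFlight);
    return Math.min(free, pending);
  }
}
```

In this sketch, raising the decommissioning limit speeds up only the node being drained, while every other DataNode keeps the cluster-wide limit.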






[jira] [Work logged] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-13522?focusedWorklogId=781960=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781960
 ]

ASF GitHub Bot logged work on HDFS-13522:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 08:34
Start Date: 16/Jun/22 08:34
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#issuecomment-1157387445

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   | +0 :ok: |  reexec  |   0m 42s |  |  Docker mode activated.  |
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  |
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  14m 39s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  26m 42s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  26m 34s |  |  trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |  23m 28s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   4m 58s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   7m 39s |  |  trunk passed  |
   | -1 :x: |  javadoc  |   1m 36s | [/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/7/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) |  hadoop-hdfs in trunk failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   6m 37s |  |  trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  12m 53s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m  4s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 33s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   4m 30s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  24m 50s |  |  the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |  24m 50s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  22m 57s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |  22m 57s |  |  the patch passed  |
   | -1 :x: |  blanks  |   0m  1s | [/blanks-eol.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/7/artifact/out/blanks-eol.txt) |  The patch has 1 line(s) that end in blanks. Use git apply --whitespace=fix <>. Refer https://git-scm.com/docs/git-apply  |
   | -0 :warning: |  checkstyle  |   4m 18s | [/results-checkstyle-root.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/7/artifact/out/results-checkstyle-root.txt) |  root: The patch generated 3 new + 198 unchanged - 1 fixed = 201 total (was 199)  |
   | +1 :green_heart: |  mvnsite  |   7m 20s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m 41s | [/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4311/7/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt) |  hadoop-hdfs in the patch failed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   6m 28s |  |  the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |  13m 28s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  24m 42s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  unit  |  19m 25s |  |  hadoop-common in the patch passed.  |
   | +1 :green_heart: |  unit  |   3m  9s |  |  hadoop-hdfs-client in the patch passed.  |
   | +1 :green_heart: |  unit  | 258m 53s |  |  hadoop-hdfs in the patch passed.  |
   | -1 :x: |  unit  |  23m 32s | 

[jira] [Work logged] (HDFS-16616) Remove the use if Sets#newHashSet and Sets#newTreeSet

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16616?focusedWorklogId=781949=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781949
 ]

ASF GitHub Bot logged work on HDFS-16616:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 07:09
Start Date: 16/Jun/22 07:09
Worklog Time Spent: 10m 
  Work Description: Samrat002 commented on PR #4400:
URL: https://github.com/apache/hadoop/pull/4400#issuecomment-1157313860

   - The check "hadoop-hdfs in trunk failed with JDK Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1." failed for both `trunk` and the 
`patch`.
   - All tests passed.
   Please review the PR.
   Thanks.




Issue Time Tracking
---

Worklog Id: (was: 781949)
Time Spent: 50m  (was: 40m)

> Remove the use if Sets#newHashSet and Sets#newTreeSet 
> --
>
> Key: HDFS-16616
> URL: https://issues.apache.org/jira/browse/HDFS-16616
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: Samrat Deb
>Assignee: Samrat Deb
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> As part of removing the Guava dependencies, HADOOP-17115, HADOOP-17721, 
> HADOOP-17722 and HADOOP-17720 are fixed.
> Currently the code calls utility functions to create a HashSet or TreeSet 
> throughout the repo. These calls add little, since internally they just 
> call new HashSet<> / new TreeSet<> from java.util.
> This task is to clean up all of these redundant set-creation calls.
> Before the move to Java 8, sets were created using Guava functions and APIs. 
> Now that Guava has been moved away from, the util code in Hadoop looks like:
> 1. 
> public static <E extends Comparable> TreeSet<E> newTreeSet() { return new 
> TreeSet<E>(); }
> 2. 
> public static <E> HashSet<E> newHashSet()
> { return new HashSet<E>(); }
> These methods do nothing more than add an extra layer of function call. 
> Please refer to the task 
> https://issues.apache.org/jira/browse/HADOOP-17726
> Can anyone review whether this ticket adds value to the code? 
> Looking forward to input/thoughts. If it adds no value, we can 
> close it and not move forward with the changes.
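
The cleanup proposed in the ticket can be illustrated with a minimal sketch. The class `SetsSketch` below is a hypothetical stand-in for the wrapper methods quoted in the description, not the actual Hadoop utility class, and the host names are made up. It only shows that a call through such a wrapper and a direct constructor call with the diamond operator produce equal sets.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for the wrappers quoted in the ticket
// (an assumption for illustration, not the real Hadoop class).
final class SetsSketch {
  private SetsSketch() {}

  static <E> HashSet<E> newHashSet() {
    return new HashSet<>();
  }

  static <E extends Comparable<E>> TreeSet<E> newTreeSet() {
    return new TreeSet<>();
  }
}

class SetCreationSketch {
  // Before the cleanup: creation goes through the wrapper.
  static Set<String> viaWrapper() {
    Set<String> hosts = SetsSketch.newHashSet();
    hosts.add("dn1");
    hosts.add("dn2");
    return hosts;
  }

  // After the cleanup: direct constructor call with the diamond operator.
  static Set<String> viaConstructor() {
    Set<String> hosts = new HashSet<>();
    hosts.add("dn1");
    hosts.add("dn2");
    return hosts;
  }

  // Same comparison for the sorted variant.
  static boolean treeSetsMatch() {
    TreeSet<String> wrapped = SetsSketch.newTreeSet();
    TreeSet<String> direct = new TreeSet<>();
    wrapped.add("b");
    wrapped.add("a");
    direct.add("a");
    direct.add("b");
    return wrapped.equals(direct);
  }

  public static void main(String[] args) {
    System.out.println(viaWrapper().equals(viaConstructor()));
    System.out.println(treeSetsMatch());
  }
}
```

Since the diamond operator lets the compiler infer the element type at the call site, the wrapper no longer saves any typing, which appears to be the motivation behind the ticket.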



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Work logged] (HDFS-16616) Remove the use if Sets#newHashSet and Sets#newTreeSet

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16616?focusedWorklogId=781938=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781938
 ]

ASF GitHub Bot logged work on HDFS-16616:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 06:37
Start Date: 16/Jun/22 06:37
Worklog Time Spent: 10m 
  Work Description: hadoop-yetus commented on PR #4400:
URL: https://github.com/apache/hadoop/pull/4400#issuecomment-1157289843

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 22s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 10 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +0 :ok: |  mvndep  |  15m  8s |  |  Maven dependency ordering for branch  |
   | +1 :green_heart: |  mvninstall  |  29m 32s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   7m 29s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   8m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 42s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   3m  0s |  |  trunk passed  |
   | -1 :x: |  javadoc  |   2m  1s | 
[/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4400/3/artifact/out/branch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt)
 |  hadoop-hdfs in trunk failed with JDK Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   2m 56s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   5m 57s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  24m  5s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +0 :ok: |  mvndep  |   0m 29s |  |  Maven dependency ordering for patch  |
   | +1 :green_heart: |  mvninstall  |   2m 13s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   7m 27s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   7m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   7m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   7m 28s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m 35s |  |  hadoop-hdfs-project: The 
patch generated 0 new + 385 unchanged - 1 fixed = 385 total (was 386)  |
   | +1 :green_heart: |  mvnsite  |   2m 47s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m 18s | 
[/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4400/3/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.txt)
 |  hadoop-hdfs in the patch failed with JDK Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1.  |
   | +1 :green_heart: |  javadoc  |   2m 42s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   5m 57s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 52s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 456m 39s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  unit  |  37m  7s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m 18s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 660m 25s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4400/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4400 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   

[jira] [Work logged] (HDFS-16064) HDFS-721 causes DataNode decommissioning to get stuck indefinitely

2022-06-16 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16064?focusedWorklogId=781930=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-781930
 ]

ASF GitHub Bot logged work on HDFS-16064:
-

Author: ASF GitHub Bot
Created on: 16/Jun/22 06:25
Start Date: 16/Jun/22 06:25
Worklog Time Spent: 10m 
  Work Description: ZanderXu commented on PR #4410:
URL: https://github.com/apache/hadoop/pull/4410#issuecomment-1157281252

   @KevinWikant Nice catch +1. I learned a lot from it. thanks~




Issue Time Tracking
---

Worklog Id: (was: 781930)
Time Spent: 1h  (was: 50m)

> HDFS-721 causes DataNode decommissioning to get stuck indefinitely
> --
>
> Key: HDFS-16064
> URL: https://issues.apache.org/jira/browse/HDFS-16064
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, namenode
>Affects Versions: 3.2.1
>Reporter: Kevin Wikant
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Seems that https://issues.apache.org/jira/browse/HDFS-721 was resolved as a 
> non-issue under the assumption that if the namenode & a datanode get into an 
> inconsistent state for a given block pipeline, there should be another 
> datanode available to replicate the block to.
> While testing datanode decommissioning using "dfs.hosts.exclude", I have 
> encountered a scenario where the decommissioning gets stuck indefinitely.
> Below is the progression of events:
>  * there are initially 4 datanodes DN1, DN2, DN3, DN4
>  * scale-down is started by adding DN1 & DN2 to "dfs.hosts.exclude"
>  * HDFS block pipelines on DN1 & DN2 must now be replicated to DN3 & DN4 in 
> order to satisfy their minimum replication factor of 2
>  * during this replication process 
> https://issues.apache.org/jira/browse/HDFS-721 is encountered which causes 
> the following inconsistent state:
>  ** DN3 thinks it has the block pipeline in FINALIZED state
>  ** the namenode does not think DN3 has the block pipeline
> {code:java}
> 2021-06-06 10:38:23,604 INFO org.apache.hadoop.hdfs.server.datanode.DataNode 
> (DataXceiver for client  at /DN2:45654 [Receiving block BP-YYY:blk_XXX]): 
> DN3:9866:DataXceiver error processing WRITE_BLOCK operation  src: /DN2:45654 
> dst: /DN3:9866; 
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block 
> BP-YYY:blk_XXX already exists in state FINALIZED and thus cannot be created.
> {code}
>  * the replication is attempted again, but:
>  ** DN4 has the block
>  ** DN1 and/or DN2 have the block, but don't count towards the minimum 
> replication factor because they are being decommissioned
>  ** DN3 does not have the block & cannot have the block replicated to it 
> because of HDFS-721
>  * the namenode repeatedly tries to replicate the block to DN3 & repeatedly 
> fails; this continues indefinitely
>  * therefore DN4 is the only live datanode with the block & the minimum 
> replication factor of 2 cannot be satisfied
>  * because the minimum replication factor cannot be satisfied for the 
> block(s) being moved off DN1 & DN2, the datanode decommissioning can never be 
> completed 
> {code:java}
> 2021-06-06 10:39:10,106 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN1:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> ...
> 2021-06-06 10:57:10,105 INFO BlockStateChange (DatanodeAdminMonitor-0): 
> Block: blk_XXX, Expected Replicas: 2, live replicas: 1, corrupt replicas: 0, 
> decommissioned replicas: 0, decommissioning replicas: 2, maintenance 
> replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is 
> Open File: false, Datanodes having this block: DN1:9866 DN2:9866 DN4:9866 , 
> Current Datanode: DN2:9866, Is current datanode decommissioning: true, Is 
> current datanode entering maintenance: false
> {code}
> Being stuck in decommissioning state forever is not an intended behavior of 
> DataNode decommissioning.
> A few potential solutions:
>  * Address the root cause of the problem which is an inconsistent state 
> between namenode & datanode: https://issues.apache.org/jira/browse/HDFS-721
>  * Detect when datanode decommissioning is stuck due to lack of available 
> datanodes for satisfying the minimum replication factor, then recover by 
> re-enabling the datanodes being decommissioned
>  
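
The scenario in the description can be modeled with a small sketch. This is a simplified illustration under stated assumptions, not namenode code: the class `DecommissionSketch` and all of its method names are hypothetical, a replica counts as live only if the namenode knows about it and its datanode is not being decommissioned, and a datanode that already holds a FINALIZED copy on disk is assumed to reject a new write of the same block.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of the stuck-decommissioning scenario (illustration only).
class DecommissionSketch {
  static Set<String> setOf(String... names) {
    return new TreeSet<>(Arrays.asList(names));
  }

  // Replicas the namenode counts as live: known holders that are not decommissioning.
  static int liveReplicas(Set<String> knownHolders, Set<String> decommissioning) {
    int live = 0;
    for (String dn : knownHolders) {
      if (!decommissioning.contains(dn)) {
        live++;
      }
    }
    return live;
  }

  // In-service datanodes that, from the namenode's view, do not hold the block yet.
  static Set<String> candidateTargets(Set<String> allNodes, Set<String> knownHolders,
      Set<String> decommissioning) {
    Set<String> targets = new TreeSet<>(allNodes);
    targets.removeAll(knownHolders);
    targets.removeAll(decommissioning);
    return targets;
  }

  // Stuck: the block is under-replicated and every candidate target would reject
  // the write because it already has a FINALIZED copy on disk. An empty target
  // set also counts as stuck, since nothing can raise the live replica count.
  static boolean isStuck(int expected, Set<String> allNodes, Set<String> knownHolders,
      Set<String> decommissioning, Set<String> finalizedOnDisk) {
    if (liveReplicas(knownHolders, decommissioning) >= expected) {
      return false;
    }
    return finalizedOnDisk.containsAll(
        candidateTargets(allNodes, knownHolders, decommissioning));
  }

  public static void main(String[] args) {
    Set<String> all = setOf("DN1", "DN2", "DN3", "DN4");
    Set<String> decommissioning = setOf("DN1", "DN2");
    Set<String> knownHolders = setOf("DN1", "DN2", "DN4"); // namenode's view
    Set<String> finalizedOnDisk = setOf("DN1", "DN2", "DN3", "DN4"); // datanodes' view

    System.out.println(liveReplicas(knownHolders, decommissioning));
    System.out.println(candidateTargets(all, knownHolders, decommissioning));
    System.out.println(isStuck(2, all, knownHolders, decommissioning, finalizedOnDisk));
  }
}
```

With the inputs from the description (expected replicas 2, DN1 and DN2 decommissioning, the namenode unaware of DN3's copy), the model reports one live replica and DN3 as the only candidate target. A check along these lines is one way the second proposed solution, detecting a stuck decommission, might be approached.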


