[jira] [Updated] (HDFS-17534) RBF: Support leader follower mode for multiple subclusters

2024-07-09 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17534:

Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
 Release Note: Adds new Mount Table mode: LEADER_FOLLOWER for RBF
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> RBF: Support leader follower mode for multiple subclusters
> --
>
> Key: HDFS-17534
> URL: https://issues.apache.org/jira/browse/HDFS-17534
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Currently there are five modes for multiple subclusters:
> HASH, LOCAL, RANDOM, HASH_ALL, and SPACE.
> This proposes a new mode called leader/follower: routers try to write to the
> leader subcluster as much as possible, and when routers read data, they rank
> the leader subcluster first.
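As a rough illustration of the proposed semantics (a sketch with hypothetical names, not the committed RBF code), treating the first configured destination as the leader:

{code:java}
import java.util.ArrayList;
import java.util.List;

/** Sketch of LEADER_FOLLOWER destination ordering; names are hypothetical. */
public class LeaderFollowerSketch {

  /** Writes always target the leader subcluster. */
  static String chooseForWrite(List<String> destinations) {
    return destinations.get(0);
  }

  /** Reads try the leader first, then fall back to followers in order. */
  static List<String> chooseForRead(List<String> destinations) {
    return new ArrayList<>(destinations);
  }

  public static void main(String[] args) {
    List<String> dests = List.of("ns-leader", "ns-follower");
    System.out.println(chooseForWrite(dests)); // ns-leader
    System.out.println(chooseForRead(dests));  // [ns-leader, ns-follower]
  }
}
{code}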



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17534) RBF: Support leader follower mode for multiple subclusters

2024-07-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864064#comment-17864064
 ] 

Ayush Saxena commented on HDFS-17534:
-

Committed to trunk.
Thanx [~yuanbo] for the contribution!!!

> RBF: Support leader follower mode for multiple subclusters
> --
>
> Key: HDFS-17534
> URL: https://issues.apache.org/jira/browse/HDFS-17534
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Yuanbo Liu
>Assignee: Yuanbo Liu
>Priority: Major
>  Labels: pull-request-available
>
> Currently there are five modes for multiple subclusters:
> HASH, LOCAL, RANDOM, HASH_ALL, and SPACE.
> This proposes a new mode called leader/follower: routers try to write to the
> leader subcluster as much as possible, and when routers read data, they rank
> the leader subcluster first.






[jira] [Updated] (HDFS-17555) Fix NumberFormatException of NNThroughputBenchmark when configured dfs.blocksize.

2024-07-09 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17555:

Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> Fix NumberFormatException of NNThroughputBenchmark when configured 
> dfs.blocksize.
> -
>
> Key: HDFS-17555
> URL: https://issues.apache.org/jira/browse/HDFS-17555
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks, hdfs
>Affects Versions: 3.3.5, 3.3.3, 3.3.4, 3.3.6
>Reporter: wangzhongwei
>Assignee: wangzhongwei
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
> Attachments: image-2024-06-20-19-17-10-099.png
>
>
> When using NNThroughputBenchmark, if the configuration item dfs.blocksize 
> in hdfs-site.xml is configured with a letter suffix, such as 
> 256m, a NumberFormatException occurs.
> command:
> hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
> hdfs://x -op create -threads 100 -files 1 -filesPerDir 100 -close
> !image-2024-06-20-19-17-10-099.png|width=631,height=202!
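For context, the failure can be reproduced in isolation: Configuration#getLong falls back to Long.parseLong, which rejects a size suffix, while Configuration#getLongBytes understands it. A minimal sketch (it is an assumption that the benchmark's fix goes this way):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class BlockSizeParseSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("dfs.blocksize", "256m");

    try {
      // Plain long parsing cannot handle the "m" suffix.
      conf.getLong("dfs.blocksize", 128 * 1024 * 1024L);
    } catch (NumberFormatException e) {
      System.out.println("getLong fails: " + e.getMessage());
    }

    // getLongBytes understands k/m/g/... suffixes: prints 268435456.
    System.out.println(conf.getLongBytes("dfs.blocksize", 128 * 1024 * 1024L));
  }
}
{code}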






[jira] [Commented] (HDFS-17555) Fix NumberFormatException of NNThroughputBenchmark when configured dfs.blocksize.

2024-07-09 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17864058#comment-17864058
 ] 

Ayush Saxena commented on HDFS-17555:
-

Committed to trunk.
Thanx [~zhongwei11] for the contribution & [~hexiaoqiao] for the review!!!

> Fix NumberFormatException of NNThroughputBenchmark when configured 
> dfs.blocksize.
> -
>
> Key: HDFS-17555
> URL: https://issues.apache.org/jira/browse/HDFS-17555
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: benchmarks, hdfs
>Affects Versions: 3.3.5, 3.3.3, 3.3.4, 3.3.6
>Reporter: wangzhongwei
>Assignee: wangzhongwei
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2024-06-20-19-17-10-099.png
>
>
> When using NNThroughputBenchmark, if the configuration item dfs.blocksize 
> in hdfs-site.xml is configured with a letter suffix, such as 
> 256m, a NumberFormatException occurs.
> command:
> hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark -fs 
> hdfs://x -op create -threads 100 -files 1 -filesPerDir 100 -close
> !image-2024-06-20-19-17-10-099.png|width=631,height=202!






[jira] [Resolved] (HDFS-17571) TestDatanodeManager#testGetBlockLocationConsiderStorageType is flaky

2024-07-08 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17571.
-
Resolution: Duplicate

> TestDatanodeManager#testGetBlockLocationConsiderStorageType is flaky
> 
>
> Key: HDFS-17571
> URL: https://issues.apache.org/jira/browse/HDFS-17571
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Priority: Major
>
> {noformat}
> org.junit.ComparisonFailure: expected: but was:
>   at org.junit.Assert.assertEquals(Assert.java:117)
>   at org.junit.Assert.assertEquals(Assert.java:146)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testGetBlockLocationConsiderStorageType(TestDatanodeManager.java:769)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}
> Ref: 
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6906/2/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestDatanodeManager/testGetBlockLocationConsiderStorageType/
> https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1628/testReport/junit/org.apache.hadoop.hdfs.server.blockmanagement/TestDatanodeManager/testGetBlockLocationConsiderStorageType/






[jira] [Commented] (HDFS-17557) Fix bug for TestRedundancyMonitor#testChooseTargetWhenAllDataNodesStop

2024-07-06 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863451#comment-17863451
 ] 

Ayush Saxena commented on HDFS-17557:
-

Committed to trunk.
Thanx [~haiyang Hu] for the contribution!!!

> Fix bug for TestRedundancyMonitor#testChooseTargetWhenAllDataNodesStop
> --
>
> Key: HDFS-17557
> URL: https://issues.apache.org/jira/browse/HDFS-17557
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> Due to the modification in HDFS-16456, the current UT has not been able to 
> run successfully, so we need to fix it.






[jira] [Resolved] (HDFS-17557) Fix bug for TestRedundancyMonitor#testChooseTargetWhenAllDataNodesStop

2024-07-06 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17557.
-
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fix bug for TestRedundancyMonitor#testChooseTargetWhenAllDataNodesStop
> --
>
> Key: HDFS-17557
> URL: https://issues.apache.org/jira/browse/HDFS-17557
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Due to the modification in HDFS-16456, the current UT has not been able to 
> run successfully, so we need to fix it.






[jira] [Commented] (HDFS-17559) Fix the uuid as null in NameNodeMXBean

2024-07-06 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863449#comment-17863449
 ] 

Ayush Saxena commented on HDFS-17559:
-

Committed to trunk.
Thanx [~haiyang Hu] for the contribution!!!

> Fix the uuid as null in NameNodeMXBean
> --
>
> Key: HDFS-17559
> URL: https://issues.apache.org/jira/browse/HDFS-17559
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> If there is datanode info in the includes file, but the datanode service is 
> not currently started, the uuid of the datanode will be null. 
> When getting the DeadNodes metric, the following exception will occur:
> {code:java}
> 2024-06-26 17:06:49,698 ERROR jmx.JMXJsonServlet 
> (JMXJsonServlet.java:writeAttribute(345)) [qtp1107412069-7704] - getting 
> attribute DeadNodes of Hadoop:service=NameNode,name=NameNodeInfo threw an 
> exception javax.management.RuntimeMBeanException: 
> java.lang.NullPointerException: null value in entry: uuid=null
>         at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrow(DefaultMBeanServerInterceptor.java:839)
>         at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.rethrowMaybeMBeanException(DefaultMBeanServerInterceptor.java:852)
>         at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getAttribute(DefaultMBeanServerInterceptor.java:651)
>         at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.getAttribute(JmxMBeanServer.java:678)
>         at 
> org.apache.hadoop.jmx.JMXJsonServlet.writeAttribute(JMXJsonServlet.java:338)
> {code}
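The "null value in entry: uuid=null" text is Guava's immutable-map null check, so a plausible direction for the fix (a sketch only; the actual patch may differ) is to default the uuid before building the per-node info map:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

public class DeadNodeInfoSketch {
  /** Build the per-datanode info map, tolerating a null uuid. */
  static Map<String, Object> deadNodeInfo(String uuid, long lastContact) {
    Map<String, Object> info = new LinkedHashMap<>();
    // Default a missing uuid to "" so no null map value ever reaches the
    // JMX/JSON serialization path.
    info.put("uuid", Objects.toString(uuid, ""));
    info.put("lastContact", lastContact);
    return info;
  }

  public static void main(String[] args) {
    System.out.println(deadNodeInfo(null, 42L)); // {uuid=, lastContact=42}
  }
}
{code}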






[jira] [Commented] (HDFS-17571) TestDatanodeManager#testGetBlockLocationConsiderStorageType is flaky

2024-07-05 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863234#comment-17863234
 ] 

Ayush Saxena commented on HDFS-17571:
-

Suspecting HDFS-17098 to be the reason; if I revert that locally, the test 
passes for me.

> TestDatanodeManager#testGetBlockLocationConsiderStorageType is flaky
> 
>
> Key: HDFS-17571
> URL: https://issues.apache.org/jira/browse/HDFS-17571
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Priority: Major
>
> {noformat}
> org.junit.ComparisonFailure: expected: but was:
>   at org.junit.Assert.assertEquals(Assert.java:117)
>   at org.junit.Assert.assertEquals(Assert.java:146)
>   at 
> org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testGetBlockLocationConsiderStorageType(TestDatanodeManager.java:769)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}
> Ref: 
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6906/2/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestDatanodeManager/testGetBlockLocationConsiderStorageType/
> https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1628/testReport/junit/org.apache.hadoop.hdfs.server.blockmanagement/TestDatanodeManager/testGetBlockLocationConsiderStorageType/






[jira] [Comment Edited] (HDFS-17572) TestRouterSecurityManager#testDelegationTokens is flaky

2024-07-05 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863137#comment-17863137
 ] 

Ayush Saxena edited comment on HDFS-17572 at 7/5/24 6:49 AM:
-

I haven't checked the test, but most probably the token expired before the 
assertion triggered; increasing the expiry time should most probably fix it, 
or something along similar lines.


was (Author: ayushtkn):
most probably the token expired before the assertion triggered; increasing 
the expiry time should most probably fix it

> TestRouterSecurityManager#testDelegationTokens is flaky
> ---
>
> Key: HDFS-17572
> URL: https://issues.apache.org/jira/browse/HDFS-17572
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Priority: Major
>
> {noformat}
> Expected: (an instance of 
> org.apache.hadoop.security.token.SecretManager$InvalidToken and exception 
> with message a string containing "Renewal request for unknown token")
>  but: exception with message a string containing "Renewal request for 
> unknown token" message was "some_renewer tried to renew an expired token 
> (token for router: HDFS_DELEGATION_TOKEN owner=router, renewer=some_renewer, 
> realUser=, issueDate=1720114742074, maxDate=1720114742174, sequenceNumber=6, 
> masterKeyId=37) max expiration date: 2024-07-04 17:39:02,174+ 
> currentTime: 2024-07-04 17:39:02,233+"
> Stacktrace was: org.apache.hadoop.security.token.SecretManager$InvalidToken: 
> some_renewer tried to renew an expired token (token for router: 
> HDFS_DELEGATION_TOKEN owner=router, renewer=some_renewer, realUser=, 
> issueDate=1720114742074, maxDate=1720114742174, sequenceNumber=6, 
> masterKeyId=37) max expiration date: 2024-07-04 17:39:02,174+ 
> currentTime: 2024-07-04 17:39:02,233+
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:692)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.security.RouterSecurityManager.renewDelegationToken(RouterSecurityManager.java:180)
>  at 
> org.apache.hadoop.hdfs.server.federation.security.TestRouterSecurityManager.testDelegationTokens(TestRouterSecurityManager.java:140)
> {noformat}
> Ref:
> https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1628/testReport/junit/org.apache.hadoop.hdfs.server.federation.security/TestRouterSecurityManager/testDelegationTokens/
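The failing output shows maxDate - issueDate = 100 ms, so the token can expire before the assertion runs. A sketch of a timing-robust assertion pattern (hypothetical numbers and helper, not the actual test code):

{code:java}
import java.util.concurrent.Callable;

public class TokenExpirySketch {
  // Was effectively ~100 ms in the failing run; generous enough that the
  // token cannot expire before the renewal assertion executes.
  static final long TOKEN_MAX_LIFETIME_MS = 60_000;

  static <T> void expectRenewalFailure(Callable<T> renewal, String expected)
      throws Exception {
    try {
      renewal.call();
      throw new AssertionError("renewal should have failed");
    } catch (Exception e) {
      // With a long-lived token, the only possible failure is the intended
      // "Renewal request for unknown token", never "expired token".
      if (e.getMessage() == null || !e.getMessage().contains(expected)) {
        throw new AssertionError("unexpected failure: " + e, e);
      }
    }
  }
}
{code}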






[jira] [Commented] (HDFS-17572) TestRouterSecurityManager#testDelegationTokens is flaky

2024-07-05 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17863137#comment-17863137
 ] 

Ayush Saxena commented on HDFS-17572:
-

Most probably the token expired before the assertion triggered; increasing 
the expiry time should most probably fix it.

> TestRouterSecurityManager#testDelegationTokens is flaky
> ---
>
> Key: HDFS-17572
> URL: https://issues.apache.org/jira/browse/HDFS-17572
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Ayush Saxena
>Priority: Major
>
> {noformat}
> Expected: (an instance of 
> org.apache.hadoop.security.token.SecretManager$InvalidToken and exception 
> with message a string containing "Renewal request for unknown token")
>  but: exception with message a string containing "Renewal request for 
> unknown token" message was "some_renewer tried to renew an expired token 
> (token for router: HDFS_DELEGATION_TOKEN owner=router, renewer=some_renewer, 
> realUser=, issueDate=1720114742074, maxDate=1720114742174, sequenceNumber=6, 
> masterKeyId=37) max expiration date: 2024-07-04 17:39:02,174+ 
> currentTime: 2024-07-04 17:39:02,233+"
> Stacktrace was: org.apache.hadoop.security.token.SecretManager$InvalidToken: 
> some_renewer tried to renew an expired token (token for router: 
> HDFS_DELEGATION_TOKEN owner=router, renewer=some_renewer, realUser=, 
> issueDate=1720114742074, maxDate=1720114742174, sequenceNumber=6, 
> masterKeyId=37) max expiration date: 2024-07-04 17:39:02,174+ 
> currentTime: 2024-07-04 17:39:02,233+
>  at 
> org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:692)
>  at 
> org.apache.hadoop.hdfs.server.federation.router.security.RouterSecurityManager.renewDelegationToken(RouterSecurityManager.java:180)
>  at 
> org.apache.hadoop.hdfs.server.federation.security.TestRouterSecurityManager.testDelegationTokens(TestRouterSecurityManager.java:140)
> {noformat}
> Ref:
> https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1628/testReport/junit/org.apache.hadoop.hdfs.server.federation.security/TestRouterSecurityManager/testDelegationTokens/






[jira] [Created] (HDFS-17572) TestRouterSecurityManager#testDelegationTokens is flaky

2024-07-05 Thread Ayush Saxena (Jira)
Ayush Saxena created HDFS-17572:
---

 Summary: TestRouterSecurityManager#testDelegationTokens is flaky
 Key: HDFS-17572
 URL: https://issues.apache.org/jira/browse/HDFS-17572
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ayush Saxena



{noformat}
Expected: (an instance of 
org.apache.hadoop.security.token.SecretManager$InvalidToken and exception with 
message a string containing "Renewal request for unknown token")
 but: exception with message a string containing "Renewal request for 
unknown token" message was "some_renewer tried to renew an expired token (token 
for router: HDFS_DELEGATION_TOKEN owner=router, renewer=some_renewer, 
realUser=, issueDate=1720114742074, maxDate=1720114742174, sequenceNumber=6, 
masterKeyId=37) max expiration date: 2024-07-04 17:39:02,174+ currentTime: 
2024-07-04 17:39:02,233+"
Stacktrace was: org.apache.hadoop.security.token.SecretManager$InvalidToken: 
some_renewer tried to renew an expired token (token for router: 
HDFS_DELEGATION_TOKEN owner=router, renewer=some_renewer, realUser=, 
issueDate=1720114742074, maxDate=1720114742174, sequenceNumber=6, 
masterKeyId=37) max expiration date: 2024-07-04 17:39:02,174+ currentTime: 
2024-07-04 17:39:02,233+
 at 
org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:692)
 at 
org.apache.hadoop.hdfs.server.federation.router.security.RouterSecurityManager.renewDelegationToken(RouterSecurityManager.java:180)
 at 
org.apache.hadoop.hdfs.server.federation.security.TestRouterSecurityManager.testDelegationTokens(TestRouterSecurityManager.java:140)
{noformat}

Ref:
https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1628/testReport/junit/org.apache.hadoop.hdfs.server.federation.security/TestRouterSecurityManager/testDelegationTokens/






[jira] [Created] (HDFS-17571) TestDatanodeManager#testGetBlockLocationConsiderStorageType is flaky

2024-07-05 Thread Ayush Saxena (Jira)
Ayush Saxena created HDFS-17571:
---

 Summary: 
TestDatanodeManager#testGetBlockLocationConsiderStorageType is flaky
 Key: HDFS-17571
 URL: https://issues.apache.org/jira/browse/HDFS-17571
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Ayush Saxena



{noformat}
org.junit.ComparisonFailure: expected: but was:
at org.junit.Assert.assertEquals(Assert.java:117)
at org.junit.Assert.assertEquals(Assert.java:146)
at 
org.apache.hadoop.hdfs.server.blockmanagement.TestDatanodeManager.testGetBlockLocationConsiderStorageType(TestDatanodeManager.java:769)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
{noformat}

Ref: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6906/2/testReport/org.apache.hadoop.hdfs.server.blockmanagement/TestDatanodeManager/testGetBlockLocationConsiderStorageType/

https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1628/testReport/junit/org.apache.hadoop.hdfs.server.blockmanagement/TestDatanodeManager/testGetBlockLocationConsiderStorageType/







[jira] [Commented] (HDFS-17551) Fix unit test failure caused by HDFS-17464

2024-06-12 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854489#comment-17854489
 ] 

Ayush Saxena commented on HDFS-17551:
-

Committed to trunk.
Thanx [~zhanghaobo] for the contribution!!!

> Fix unit test failure caused by HDFS-17464
> --
>
> Key: HDFS-17551
> URL: https://issues.apache.org/jira/browse/HDFS-17551
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
>
> As the title says, this Jira fixes the unit test failure caused by 
> HDFS-17464.






[jira] [Resolved] (HDFS-17551) Fix unit test failure caused by HDFS-17464

2024-06-12 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17551.
-
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fix unit test failure caused by HDFS-17464
> --
>
> Key: HDFS-17551
> URL: https://issues.apache.org/jira/browse/HDFS-17551
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> As the title says, this Jira fixes the unit test failure caused by 
> HDFS-17464.






[jira] [Commented] (HDFS-17464) Improve some logs output in class FsDatasetImpl

2024-06-11 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17464?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17854246#comment-17854246
 ] 

Ayush Saxena commented on HDFS-17464:
-

[~zhanghaobo] this seems to be leading to a test failure
https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1604/testReport/junit/org.apache.hadoop.hdfs.server.datanode.fsdataset.impl/TestFsDatasetImpl/testMoveBlockFailure/

I think it is asserting the error message; can you shoot an addendum PR to fix 
the test?


> Improve some logs output in class FsDatasetImpl
> ---
>
> Key: HDFS-17464
> URL: https://issues.apache.org/jira/browse/HDFS-17464
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode
>Affects Versions: 3.5.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>







[jira] [Commented] (HDFS-17498) Distcp of concat files fails, because sourceFS's checksum is not equal to targetFS's checksum.

2024-04-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840339#comment-17840339
 ] 

Ayush Saxena commented on HDFS-17498:
-

I don't exactly catch what you mean, but did you have preserve blocksize set 
or not?
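For reference: distcp's post-copy check compares block-level checksums, which are sensitive to block layout. Assuming a reasonably recent Hadoop, either preserving the block size or switching to a layout-independent checksum avoids the mismatch:

{noformat}
# preserve block size so per-block checksums line up
hadoop distcp -pb hdfs://src/path hdfs://dst/path

# or compare layout-independent composite CRCs
hadoop distcp -Ddfs.checksum.combine.mode=COMPOSITE_CRC \
    hdfs://src/path hdfs://dst/path
{noformat}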

> Distcp of concat files fails, because sourceFS's checksum is not equal to 
> targetFS's checksum.
> 
>
> Key: HDFS-17498
> URL: https://issues.apache.org/jira/browse/HDFS-17498
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: distcp
>Affects Versions: 3.3.4
>Reporter: xiaojunxiang
>Priority: Major
> Attachments: image-2024-04-24-15-54-16-253.png, 
> image-2024-04-24-15-54-58-047.png, image-2024-04-24-15-55-25-519.png, 
> image-2024-04-24-15-55-48-752.png
>
>
> When we use distcp, the sourceFS's checksum and targetFS's checksum are 
> checked for consistency after the file transfer is complete. 
> However, for some files produced by ClientProtocol's concat (RPC method) on 
> the source side, the block size is less than 128MB (such as sourceFS file 
> = 10MB+10MB, targetFS file = 20MB), so the checksums of the source and 
> destination sides will be inconsistent, causing distcp to fail.
> !image-2024-04-24-15-54-16-253.png!
> !image-2024-04-24-15-54-58-047.png!
> !image-2024-04-24-15-55-25-519.png!
> !image-2024-04-24-15-55-48-752.png!






[jira] [Commented] (HDFS-17477) IncrementalBlockReport race condition additional edge cases

2024-04-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840294#comment-17840294
 ] 

Ayush Saxena commented on HDFS-17477:
-

Hi [~dannytbecker] 

Seems like, since this got committed, 
TestLargeBlockReport#testBlockReportSucceedsWithLargerLengthLimit has been failing 

ref:

[https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1564/testReport/junit/org.apache.hadoop.hdfs.server.datanode/TestLargeBlockReport/testBlockReportSucceedsWithLargerLengthLimit/]

 

It did fail once in the Jenkins result of this PR as well:

[https://github.com/apache/hadoop/pull/6748#issuecomment-2063042088]

 

But in the subsequent build, I am not sure if it ran or not. 

 

Tried locally: with this change in, it was failing with OOM; I reverted it & it 
passed.

Can you check once?

> IncrementalBlockReport race condition additional edge cases
> ---
>
> Key: HDFS-17477
> URL: https://issues.apache.org/jira/browse/HDFS-17477
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: auto-failover, ha, namenode
>Affects Versions: 3.3.5, 3.3.4, 3.3.6
>Reporter: Danny Becker
>Assignee: Danny Becker
>Priority: Major
>  Labels: pull-request-available
>
> HDFS-17453 fixes a race condition between IncrementalBlockReports (IBR) and 
> the Edit Log Tailer which can cause the Standby NameNode (SNN) to incorrectly 
> mark blocks as corrupt when it transitions to Active. There are a few edge 
> cases that HDFS-17453 does not cover.
> For Example:
> 1. SNN1 loads the edits for b1gs1 and b1gs2.
> 2. DN1 reports b1gs1 to SNN1, so it gets queued for later processing.
> 3. DN1 reports b1gs2 to SNN1 so it gets added to the blocks map.
> 4. SNN1 transitions to Active (ANN1).
> 5. ANN1 processes the pending DN message queue and marks DN1->b1gs1 as 
> corrupt because it was still in the queue.
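A toy model of the five steps above (plain Java, not NameNode code) shows why draining the stale queued report after failover is wrong:

{code:java}
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

/** Toy model of the queued-IBR race described in the example above. */
public class IbrRaceSketch {
  public static void main(String[] args) {
    Map<String, Long> blocksMap = new HashMap<>();   // expected genstamps
    Queue<Long> queuedB1ReportsFromDn1 = new ArrayDeque<>();

    queuedB1ReportsFromDn1.add(1L); // step 2: b1gs1 queued for later
    blocksMap.put("b1", 2L);        // step 3: b1gs2 applied to blocks map

    // Step 5: after failover the queue is drained; the stale genstamp 1 is
    // compared against the expected 2 and DN1's replica is marked corrupt,
    // even though DN1 already reported the up-to-date b1gs2.
    long stale = queuedB1ReportsFromDn1.poll();
    System.out.println("marked corrupt: " + (stale < blocksMap.get("b1")));
  }
}
{code}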






[jira] [Updated] (HDFS-17475) Add a command to check if files are readable

2024-04-17 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17475:

Fix Version/s: (was: 3.5.0)

> Add a command to check if files are readable
> 
>
> Key: HDFS-17475
> URL: https://issues.apache.org/jira/browse/HDFS-17475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>  Labels: pull-request-available
>
> Sometimes a job can fail down the line due to one unreadable file, caused by 
> missing replicas, dead DNs, or other reasons. This command should allow users 
> to check whether files are readable by checking for metadata on DNs, without 
> executing the full read pipelines of the files.






[jira] [Commented] (HDFS-17475) Add a command to check if files are readable

2024-04-17 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838082#comment-17838082
 ] 

Ayush Saxena commented on HDFS-17475:
-

I am not sure this is something required. We have fsck, which does a lot of 
stuff, & as I see it there is no production utility for this & it is just for 
debugging purposes. So, I think use fsck, or for debugging you can just fetch 
the file...
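For the debugging use case, fsck can already surface per-file block and replica health, e.g.:

{noformat}
hdfs fsck /path/to/file -files -blocks -locations
{noformat}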

> Add a command to check if files are readable
> 
>
> Key: HDFS-17475
> URL: https://issues.apache.org/jira/browse/HDFS-17475
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Sometimes a job can fail down the line due to one unreadable file, caused by 
> missing replicas, dead DNs, or other reasons. This command should allow users 
> to check whether files are readable by checking for metadata on DNs, without 
> executing the full read pipelines of the files.






[jira] [Updated] (HDFS-17465) RBF: Using ProportionRouterRpcFairnessPolicyController gets “java.lang.Error: Maximum permit count exceeded”

2024-04-16 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17465:

Fix Version/s: 3.5.0

> RBF: Using ProportionRouterRpcFairnessPolicyController gets “java.lang.Error: 
> Maximum permit count exceeded”
> ---
>
> Key: HDFS-17465
> URL: https://issues.apache.org/jira/browse/HDFS-17465
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: rbf
>Affects Versions: 3.5.0
>Reporter: Xiping Zhang
>Assignee: Xiping Zhang
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.5.0
>
> Attachments: image-2024-04-14-15-39-59-531.png, 
> image-2024-04-14-16-07-32-362.png, image-2024-04-14-16-23-18-499.png
>
>
> !image-2024-04-14-15-39-59-531.png!
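For context, the quoted error is how java.util.concurrent.Semaphore reports permit-count overflow: releasing when the permit count is already Integer.MAX_VALUE throws java.lang.Error("Maximum permit count exceeded"). A minimal reproduction (illustration only, not the router code):

{code:java}
import java.util.concurrent.Semaphore;

public class MaxPermitSketch {
  public static void main(String[] args) {
    Semaphore s = new Semaphore(Integer.MAX_VALUE);
    // A controller that releases permits it never acquired can drift upward
    // until this overflow check trips.
    s.release(); // throws java.lang.Error: Maximum permit count exceeded
  }
}
{code}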






[jira] [Resolved] (HDFS-17449) Fix ill-formed decommission host name and port pair triggers IndexOutOfBound error

2024-04-06 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17449.
-
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fix ill-formed decommission host name and port pair triggers IndexOutOfBound 
> error 
> ---
>
> Key: HDFS-17449
> URL: https://issues.apache.org/jira/browse/HDFS-17449
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ConfX
>Assignee: ConfX
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> h2. What happened:
> Got IndexOutOfBound when trying to run 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatusAfterDNRestart
>  with namenode host provider set to 
> org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager.
> h2. Buggy code:
> In HostsFileWriter.java:
> {code:java}
> String[] hostAndPort = hostNameAndPort.split(":"); // hostNameAndPort might 
> be invalid
> dn.setHostName(hostAndPort[0]);
> dn.setPort(Integer.parseInt(hostAndPort[1])); // here IndexOutOfBound might 
> be thrown
> dn.setAdminState(AdminStates.DECOMMISSIONED);{code}
> h2. StackTrace:
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>     at 
> org.apache.hadoop.hdfs.util.HostsFileWriter.initOutOfServiceHosts(HostsFileWriter.java:110){code}
> h2. How to reproduce:
> (1) Set {{dfs.namenode.hosts.provider.classname}} to 
> {{org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager}}
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatusAfterDNRestart}}
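A defensive variant of the quoted parsing (a sketch; the committed fix may differ) validates the pair before indexing into it:

{code:java}
public class HostPortParseSketch {
  /** Validate the pair instead of assuming it is well formed. */
  static int parsePort(String hostNameAndPort) {
    String[] hostAndPort = hostNameAndPort.split(":");
    if (hostAndPort.length != 2) {
      throw new IllegalArgumentException(
          "Invalid host:port pair: " + hostNameAndPort);
    }
    return Integer.parseInt(hostAndPort[1]);
  }

  public static void main(String[] args) {
    System.out.println(parsePort("dn1.example.com:9866")); // ok
    parsePort("dn1.example.com"); // clear error, not ArrayIndexOutOfBounds
  }
}
{code}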






[jira] [Commented] (HDFS-17449) Fix ill-formed decommission host name and port pair triggers IndexOutOfBound error

2024-04-06 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17834501#comment-17834501
 ] 

Ayush Saxena commented on HDFS-17449:
-

Committed to trunk.
Thanx [~FuzzingTeam] for the contribution!!!

> Fix ill-formed decommission host name and port pair triggers IndexOutOfBound 
> error 
> ---
>
> Key: HDFS-17449
> URL: https://issues.apache.org/jira/browse/HDFS-17449
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ConfX
>Assignee: ConfX
>Priority: Major
>  Labels: pull-request-available
>
> h2. What happened:
> Got IndexOutOfBound when trying to run 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatusAfterDNRestart
>  with namenode host provider set to 
> org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager.
> h2. Buggy code:
> In HostsFileWriter.java:
> {code:java}
> String[] hostAndPort = hostNameAndPort.split(":"); // hostNameAndPort might 
> be invalid
> dn.setHostName(hostAndPort[0]);
> dn.setPort(Integer.parseInt(hostAndPort[1])); // here IndexOutOfBound might 
> be thrown
> dn.setAdminState(AdminStates.DECOMMISSIONED);{code}
> h2. StackTrace:
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>     at 
> org.apache.hadoop.hdfs.util.HostsFileWriter.initOutOfServiceHosts(HostsFileWriter.java:110){code}
> h2. How to reproduce:
> (1) Set {{dfs.namenode.hosts.provider.classname}} to 
> {{org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager}}
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatusAfterDNRestart}}






[jira] [Updated] (HDFS-17449) Fix ill-formed decommission host name and port pair triggers IndexOutOfBound error

2024-04-06 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17449:

Summary: Fix ill-formed decommission host name and port pair triggers 
IndexOutOfBound error   (was: Ill-formed decommission host name and port pair 
would trigger IndexOutOfBound error)

> Fix ill-formed decommission host name and port pair triggers IndexOutOfBound 
> error 
> ---
>
> Key: HDFS-17449
> URL: https://issues.apache.org/jira/browse/HDFS-17449
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ConfX
>Assignee: ConfX
>Priority: Major
>  Labels: pull-request-available
>
> h2. What happened:
> Got IndexOutOfBound when trying to run 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatusAfterDNRestart
>  with namenode host provider set to 
> org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager.
> h2. Buggy code:
> In HostsFileWriter.java:
> {code:java}
> String[] hostAndPort = hostNameAndPort.split(":"); // hostNameAndPort might 
> be invalid
> dn.setHostName(hostAndPort[0]);
> dn.setPort(Integer.parseInt(hostAndPort[1])); // here IndexOutOfBound might 
> be thrown
> dn.setAdminState(AdminStates.DECOMMISSIONED);{code}
> h2. StackTrace:
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>     at 
> org.apache.hadoop.hdfs.util.HostsFileWriter.initOutOfServiceHosts(HostsFileWriter.java:110){code}
> h2. How to reproduce:
> (1) Set {{dfs.namenode.hosts.provider.classname}} to 
> {{org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager}}
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatusAfterDNRestart}}






[jira] [Assigned] (HDFS-17449) Ill-formed decommission host name and port pair would trigger IndexOutOfBound error

2024-04-06 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-17449:
---

Assignee: ConfX

> Ill-formed decommission host name and port pair would trigger IndexOutOfBound 
> error
> ---
>
> Key: HDFS-17449
> URL: https://issues.apache.org/jira/browse/HDFS-17449
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ConfX
>Assignee: ConfX
>Priority: Major
>  Labels: pull-request-available
>
> h2. What happened:
> Got IndexOutOfBound when trying to run 
> org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatusAfterDNRestart
>  with namenode host provider set to 
> org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager.
> h2. Buggy code:
> In HostsFileWriter.java:
> {code:java}
> String[] hostAndPort = hostNameAndPort.split(":"); // hostNameAndPort might 
> be invalid
> dn.setHostName(hostAndPort[0]);
> dn.setPort(Integer.parseInt(hostAndPort[1])); // here IndexOutOfBound might 
> be thrown
> dn.setAdminState(AdminStates.DECOMMISSIONED);{code}
> h2. StackTrace:
> {code:java}
> java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
>     at 
> org.apache.hadoop.hdfs.util.HostsFileWriter.initOutOfServiceHosts(HostsFileWriter.java:110){code}
> h2. How to reproduce:
> (1) Set {{dfs.namenode.hosts.provider.classname}} to 
> {{org.apache.hadoop.hdfs.server.blockmanagement.CombinedHostFileManager}}
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.namenode.TestDecommissioningStatusWithBackoffMonitor#testDecommissionStatusAfterDNRestart}}






[jira] [Resolved] (HDFS-17450) Add explicit dependency on httpclient jar

2024-03-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17450.
-
Fix Version/s: 3.4.1
   3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Add explicit dependency on httpclient jar
> -
>
> Key: HDFS-17450
> URL: https://issues.apache.org/jira/browse/HDFS-17450
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> Follow up to https://issues.apache.org/jira/browse/HADOOP-18890
> A previous [PR|https://github.com/apache/hadoop/pull/6057] for this issue 
> removed okhttp usage and used Apache HttpClient instead. The dependency on 
> HttpClient is indirect (a transitive dependency). I think it is better to 
> make the dependency explicit in hadoop-hdfs-client - the only project that 
> was significantly modified.
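For readers unfamiliar with the change, "explicit" here means declaring the formerly transitive artifact directly; assuming the standard Maven coordinates for Apache HttpClient, roughly:

{code:xml}
<!-- hadoop-hdfs-client/pom.xml: declare the dependency directly instead of
     relying on it arriving transitively -->
<dependency>
  <groupId>org.apache.httpcomponents</groupId>
  <artifactId>httpclient</artifactId>
</dependency>
{code}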






[jira] [Updated] (HDFS-17450) Add explicit dependency on httpclient jar

2024-03-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17450:

Summary: Add explicit dependency on httpclient jar  (was: add explicit 
dependency on httpclient jar)

> Add explicit dependency on httpclient jar
> -
>
> Key: HDFS-17450
> URL: https://issues.apache.org/jira/browse/HDFS-17450
> Project: Hadoop HDFS
>  Issue Type: Task
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
>
> Follow up to https://issues.apache.org/jira/browse/HADOOP-18890
> A previous [PR|https://github.com/apache/hadoop/pull/6057] for this issue 
> removed okhttp usage and used Apache HttpClient instead. The dependency on 
> HttpClient is indirect (a transitive dependency). I think it is better to 
> make the dependency explicit in hadoop-hdfs-client - the only project that 
> was significantly modified.






[jira] [Resolved] (HDFS-17448) Enhance the stability of the unit test TestDiskBalancerCommand

2024-03-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17448.
-
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

>  Enhance the stability of the unit test TestDiskBalancerCommand 
> 
>
> Key: HDFS-17448
> URL: https://issues.apache.org/jira/browse/HDFS-17448
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> TestDiskBalancerCommand#testDiskBalancerQueryWithoutSubmitAndMultipleNodes  
> frequently fails tests, such as:
> https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1540/testReport/junit/org.apache.hadoop.hdfs.server.diskbalancer.command/TestDiskBalancerCommand/testDiskBalancerQueryWithoutSubmitAndMultipleNodes/
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6637/1/testReport/org.apache.hadoop.hdfs.server.diskbalancer.command/TestDiskBalancerCommand/testDiskBalancerQueryWithoutSubmitAndMultipleNodes/
> I will fix it to enhance the stability of the unit test.






[jira] [Commented] (HDFS-17448) Enhance the stability of the unit test TestDiskBalancerCommand

2024-03-30 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832477#comment-17832477
 ] 

Ayush Saxena commented on HDFS-17448:
-

Committed to trunk.

Thanx [~haiyang Hu] for the contribution!!!

>  Enhance the stability of the unit test TestDiskBalancerCommand 
> 
>
> Key: HDFS-17448
> URL: https://issues.apache.org/jira/browse/HDFS-17448
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> TestDiskBalancerCommand#testDiskBalancerQueryWithoutSubmitAndMultipleNodes  
> frequently fails tests, such as:
> https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1540/testReport/junit/org.apache.hadoop.hdfs.server.diskbalancer.command/TestDiskBalancerCommand/testDiskBalancerQueryWithoutSubmitAndMultipleNodes/
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6637/1/testReport/org.apache.hadoop.hdfs.server.diskbalancer.command/TestDiskBalancerCommand/testDiskBalancerQueryWithoutSubmitAndMultipleNodes/
> I will fix it to enhance the stability of the unit test.






[jira] [Resolved] (HDFS-17103) Fix file system cleanup in TestNameEditsConfigs

2024-03-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17103.
-
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fix file system cleanup in TestNameEditsConfigs 
> 
>
> Key: HDFS-17103
> URL: https://issues.apache.org/jira/browse/HDFS-17103
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ConfX
>Assignee: ConfX
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.5.0
>
> Attachments: reproduce.sh
>
>
> h2. What happened:
> Got a {{NullPointerException}} without message when running 
> {{{}TestNameEditsConfigs{}}}.
> h2. Where's the bug:
> In line 450 of {{{}TestNameEditsConfigs{}}}, the test attempts to cleanup the 
> file system:
>  
> {noformat}
>       ...
>       fileSys = cluster.getFileSystem();
>       ...
>     } finally  {
>       fileSys.close();
>       cluster.shutdown();
>     }{noformat}
> However, the cleanup would result in a {{NullPointerException}} that covers 
> up the actual exception if the initialization of {{fileSys}} fails or another 
> exception is thrown before the line that initializes {{{}fileSys{}}}.
> h2. How to reproduce:
> (1) Set {{dfs.namenode.maintenance.replication.min}} to {{-1155969698}}
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs#testNameEditsConfigsFailure}}
> h2. Stacktrace:
> {noformat}
> java.lang.NullPointerException,
>         at 
> org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs.testNameEditsConfigsFailure(TestNameEditsConfigs.java:450),{noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.
>  
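The usual remedy (a sketch using Hadoop's own null-tolerant helper; the committed patch may differ) is to make the cleanup null-safe so the original failure propagates instead of being masked:

{code:java}
import java.io.Closeable;
import org.apache.hadoop.io.IOUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class NullSafeCleanupSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(NullSafeCleanupSketch.class);

  public static void main(String[] args) {
    Closeable fileSys = null; // stands in for cluster.getFileSystem()
    try {
      throw new IllegalStateException("setup failed before fileSys was set");
    } catch (IllegalStateException expected) {
      System.out.println("real failure surfaces: " + expected.getMessage());
    } finally {
      // Tolerates nulls: no NullPointerException hiding the real exception.
      IOUtils.cleanupWithLogger(LOG, fileSys);
    }
  }
}
{code}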






[jira] [Commented] (HDFS-17103) Fix file system cleanup in TestNameEditsConfigs

2024-03-30 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17832417#comment-17832417
 ] 

Ayush Saxena commented on HDFS-17103:
-

Committed to trunk.

Thanx [~FuzzingTeam] for the contribution!!!

> Fix file system cleanup in TestNameEditsConfigs 
> 
>
> Key: HDFS-17103
> URL: https://issues.apache.org/jira/browse/HDFS-17103
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ConfX
>Assignee: ConfX
>Priority: Critical
>  Labels: pull-request-available
> Attachments: reproduce.sh
>
>
> h2. What happened:
> Got a {{NullPointerException}} without message when running 
> {{{}TestNameEditsConfigs{}}}.
> h2. Where's the bug:
> In line 450 of {{{}TestNameEditsConfigs{}}}, the test attempts to cleanup the 
> file system:
>  
> {noformat}
>       ...
>       fileSys = cluster.getFileSystem();
>       ...
>     } finally  {
>       fileSys.close();
>       cluster.shutdown();
>     }{noformat}
> However, the cleanup would result in a {{NullPointerException}} that covers 
> up the actual exception if the initialization of {{fileSys}} fails or another 
> exception is thrown before the line that initializes {{{}fileSys{}}}.
> h2. How to reproduce:
> (1) Set {{dfs.namenode.maintenance.replication.min}} to {{-1155969698}}
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs#testNameEditsConfigsFailure}}
> h2. Stacktrace:
> {noformat}
> java.lang.NullPointerException,
>         at 
> org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs.testNameEditsConfigsFailure(TestNameEditsConfigs.java:450),{noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.
>  






[jira] [Updated] (HDFS-17103) Fix file system cleanup in TestNameEditsConfigs

2024-03-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17103:

Summary: Fix file system cleanup in TestNameEditsConfigs   (was: messy file 
system cleanup in TestNameEditsConfigs)

> Fix file system cleanup in TestNameEditsConfigs 
> 
>
> Key: HDFS-17103
> URL: https://issues.apache.org/jira/browse/HDFS-17103
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ConfX
>Assignee: ConfX
>Priority: Critical
>  Labels: pull-request-available
> Attachments: reproduce.sh
>
>
> h2. What happened:
> Got a {{NullPointerException}} without message when running 
> {{{}TestNameEditsConfigs{}}}.
> h2. Where's the bug:
> In line 450 of {{{}TestNameEditsConfigs{}}}, the test attempts to cleanup the 
> file system:
>  
> {noformat}
>       ...
>       fileSys = cluster.getFileSystem();
>       ...
>     } finally  {
>       fileSys.close();
>       cluster.shutdown();
>     }{noformat}
> However, the cleanup would result in a {{NullPointerException}} that covers 
> up the actual exception if the initialization of {{fileSys}} fails or another 
> exception is thrown before the line that initializes {{{}fileSys{}}}.
> h2. How to reproduce:
> (1) Set {{dfs.namenode.maintenance.replication.min}} to {{-1155969698}}
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs#testNameEditsConfigsFailure}}
> h2. Stacktrace:
> {noformat}
> java.lang.NullPointerException,
>         at 
> org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs.testNameEditsConfigsFailure(TestNameEditsConfigs.java:450),{noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.
>  






[jira] [Assigned] (HDFS-17103) messy file system cleanup in TestNameEditsConfigs

2024-03-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-17103:
---

Assignee: ConfX

> messy file system cleanup in TestNameEditsConfigs
> -
>
> Key: HDFS-17103
> URL: https://issues.apache.org/jira/browse/HDFS-17103
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ConfX
>Assignee: ConfX
>Priority: Critical
>  Labels: pull-request-available
> Attachments: reproduce.sh
>
>
> h2. What happened:
> Got a {{NullPointerException}} without message when running 
> {{{}TestNameEditsConfigs{}}}.
> h2. Where's the bug:
> In line 450 of {{{}TestNameEditsConfigs{}}}, the test attempts to cleanup the 
> file system:
>  
> {noformat}
>       ...
>       fileSys = cluster.getFileSystem();
>       ...
>     } finally  {
>       fileSys.close();
>       cluster.shutdown();
>     }{noformat}
> However, the cleanup would result in a {{NullPointerException}} that covers 
> up the actual exception if the initialization of {{fileSys}} fails or another 
> exception is thrown before the line that initializes {{{}fileSys{}}}.
> h2. How to reproduce:
> (1) Set {{dfs.namenode.maintenance.replication.min}} to {{-1155969698}}
> (2) Run test: 
> {{org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs#testNameEditsConfigsFailure}}
> h2. Stacktrace:
> {noformat}
> java.lang.NullPointerException,
>         at 
> org.apache.hadoop.hdfs.server.namenode.TestNameEditsConfigs.testNameEditsConfigsFailure(TestNameEditsConfigs.java:450),{noformat}
> For an easy reproduction, run the reproduce.sh in the attachment.
> We are happy to provide a patch if this issue is confirmed.
>  






[jira] [Commented] (HDFS-17361) DiskBalancer: Query command support with multiple nodes

2024-03-27 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17831614#comment-17831614
 ] 

Ayush Saxena commented on HDFS-17361:
-

Hi [~haiyang Hu] 

The introduced test seems to be flaky

https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1540/testReport/junit/org.apache.hadoop.hdfs.server.diskbalancer.command/TestDiskBalancerCommand/testDiskBalancerQueryWithoutSubmitAndMultipleNodes/

[https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6637/1/testReport/org.apache.hadoop.hdfs.server.diskbalancer.command/TestDiskBalancerCommand/testDiskBalancerQueryWithoutSubmitAndMultipleNodes/]

 

Can you give a check once

> DiskBalancer: Query command support with multiple nodes
> ---
>
> Key: HDFS-17361
> URL: https://issues.apache.org/jira/browse/HDFS-17361
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: datanode, diskbalancer
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> As mentioned in https://issues.apache.org/jira/browse/HDFS-10821, the Query 
> command will support multiple nodes.
> That means we can use the command hdfs diskbalancer -query to print the 
> diskbalancer status of one or more datanodes.
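With the change, -query accepts a comma-separated node list, along the lines of (hostnames and ports are illustrative):

{noformat}
hdfs diskbalancer -query dn1.example.com:9867,dn2.example.com:9867
{noformat}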






[jira] [Commented] (HDFS-17370) Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf

2024-03-21 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17829747#comment-17829747
 ] 

Ayush Saxena commented on HDFS-17370:
-

Thanx [~tasanuma] for the fix, I think there is one more problem

{noformat}
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test (default-test) on 
project hadoop-hdfs-rbf: Execution default-test of goal 
org.apache.maven.plugins:maven-surefire-plugin:3.0.0-M1:test failed: 
java.lang.NoClassDefFoundError: 
org/junit/platform/launcher/core/LauncherFactory: 
org.junit.platform.launcher.core.LauncherFactory -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :hadoop-hdfs-rbf
{noformat}

Refs:
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6637/1/artifact/out/patch-unit-root.txt
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6638/1/artifact/out/patch-unit-root.txt

It happened on one of my PRs as well; something like this fixed it:
https://github.com/apache/hadoop/pull/6629/files#diff-dbf6ea05af8f5d11e74cd87e059a361dd8b06d0f12f1d13ea9899fbbc4ffbc48R185-R189
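
For context, the linked fix pins the missing JUnit platform launcher into the module's test dependencies, roughly like this (a sketch of the idea, not the exact committed hunk):

{noformat}
<dependency>
  <groupId>org.junit.platform</groupId>
  <artifactId>junit-platform-launcher</artifactId>
  <scope>test</scope>
</dependency>
{noformat}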


> Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf
> ---
>
> Key: HDFS-17370
> URL: https://issues.apache.org/jira/browse/HDFS-17370
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.4.1, 3.5.0
>
>
> We need to add junit-jupiter-engine dependency for running parameterized 
> tests in hadoop-hdfs-rbf.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17370) Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf

2024-03-18 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17828028#comment-17828028
 ] 

Ayush Saxena commented on HDFS-17370:
-

Hi [~tasanuma]/[~simbadzina]
I think the router tests aren't running now: only two tests are running. If you 
check this PR, the tests ran for only 9 mins
{noformat}
+1 :green_heart:unit9m 48s  hadoop-hdfs-rbf in the patch 
passed.
{noformat}
whereas in another comment here 
[https://github.com/apache/hadoop/pull/6510#issuecomment-1918979261], it took 
some 22m, which still sounds ok, considering the parallel test profile being used.
{noformat}
+1 :green_heart: unit   22m 8s  hadoop-hdfs-rbf in the patch passed.
{noformat}
From the daily build here:
[https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1519/testReport/org.apache.hadoop.hdfs.server.federation.router/]

It shows only two tests, & I am pretty sure TestRouterRpc is one of those missing 
in the package. I think the two tests running are JUnit 5 & the others are JUnit 4 
stuff; enabling the former screwed up the existing ones.

I haven't debugged much; I'm just suspecting this since it is the nearest change.

cc. [~elgoiri] in case you have any pointers

> Fix junit dependency for running parameterized tests in hadoop-hdfs-rbf
> ---
>
> Key: HDFS-17370
> URL: https://issues.apache.org/jira/browse/HDFS-17370
> Project: Hadoop HDFS
>  Issue Type: Test
>Reporter: Takanobu Asanuma
>Assignee: Takanobu Asanuma
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.4.1, 3.5.0
>
>
> We need to add junit-jupiter-engine dependency for running parameterized 
> tests in hadoop-hdfs-rbf.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17336) Provide an option to enable/disable considering space used by .Trash folder for user quota compuation

2024-01-11 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17805443#comment-17805443
 ] 

Ayush Saxena commented on HDFS-17336:
-

The */user* part in /user/user1/.Trash is configurable via the config 
{{{}dfs.user.home.dir.prefix{}}}; maybe that can be used, or we can even explore 
having a separate config for trash.home.prefix as well. Ignoring quota doesn't 
seem very apt to me: the data is indeed occupying space, so why should we ignore 
it? I suspect people might use this as a hack to store more data than allowed.
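
If the config route is taken, resolving the trash root could mirror how the home prefix is resolved today; a rough sketch ({{dfs.user.home.dir.prefix}} is the existing key, while {{dfs.namenode.trash.home.prefix}} is a hypothetical name for the separate trash config floated above):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

class TrashRootSketch {
  // Fall back to the home-dir prefix when no dedicated trash prefix is set.
  static Path trashRoot(Configuration conf, String user) {
    String prefix = conf.get("dfs.namenode.trash.home.prefix",
        conf.get("dfs.user.home.dir.prefix", "/user"));
    return new Path(prefix + "/" + user + "/.Trash");
  }
}
{code}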
 

> Provide an option to enable/disable considering space used by .Trash folder 
> for user quota compuation
> -
>
> Key: HDFS-17336
> URL: https://issues.apache.org/jira/browse/HDFS-17336
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Affects Versions: 3.1.4
>Reporter: Srinivasu Majeti
>Priority: Major
>
> We have a use case for a large account where /user/user1 has a space quota 
> configured. By default, Trash goes into /user/user1/.Trash. As long as 
> removed files stay in Trash, the user will never be able to reclaim the space 
> quota. The customer is looking for a feature that will skip computing space 
> quota for the files in the Trash folder. The proposal is to introduce a new 
> configuration parameter to skip computing quota for Trash files.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17317) DebugAdmin metaOut not need multiple close

2024-01-06 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17317.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> DebugAdmin metaOut not  need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> DebugAdmin metaOut not  need multiple close
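
The gist, as far as the summary goes, is the usual single-close pattern; a minimal sketch (the stream name and file handling here are illustrative, not the committed change):

{code:java}
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class MetaOutSketch {
  // try-with-resources closes metaOut exactly once, so an additional
  // metaOut.close() in a finally block would be redundant.
  static void writeMeta(String outputFile, byte[] meta) throws IOException {
    try (DataOutputStream metaOut =
             new DataOutputStream(new FileOutputStream(outputFile))) {
      metaOut.write(meta);
    }
  }
}
{code}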



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17317) DebugAdmin metaOut not need multiple close

2024-01-06 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17803858#comment-17803858
 ] 

Ayush Saxena commented on HDFS-17317:
-

Committed to trunk.
Thanx [~xuzifu] for the contribution & [~slfan1989] for the review!!!

> DebugAdmin metaOut not  need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
>
> DebugAdmin metaOut not  need multiple close



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17317) DebugAdmin metaOut not need multiple close

2024-01-06 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-17317:
---

Assignee: xy

> DebugAdmin metaOut not  need multiple close
> ---
>
> Key: HDFS-17317
> URL: https://issues.apache.org/jira/browse/HDFS-17317
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
>
> DebugAdmin metaOut not  need multiple close



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-01-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801881#comment-17801881
 ] 

Ayush Saxena commented on HDFS-17299:
-

Done!!!

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are in the 
> same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 

[jira] [Assigned] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2024-01-02 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-17299:
---

Assignee: Ritesh

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Assignee: Ritesh
>Priority: Critical
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are in the 
> same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,712 WARN  

[jira] [Commented] (HDFS-17215) RBF: Fix some method annotations about @throws

2023-12-25 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800356#comment-17800356
 ] 

Ayush Saxena commented on HDFS-17215:
-

Committed to trunk.
Thanx [~bigdata_zoodev] for the contribution!!!

> RBF: Fix some method annotations about @throws 
> ---
>
> Key: HDFS-17215
> URL: https://issues.apache.org/jira/browse/HDFS-17215
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: xiaojunxiang
>Assignee: xiaojunxiang
>Priority: Minor
>  Labels: pull-request-available
>
> The setQuota method annotation of the Quota class has an error, which is 
> described in the @throws section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17215) RBF: Fix some method annotations about @throws

2023-12-25 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17215.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> RBF: Fix some method annotations about @throws 
> ---
>
> Key: HDFS-17215
> URL: https://issues.apache.org/jira/browse/HDFS-17215
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: xiaojunxiang
>Assignee: xiaojunxiang
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> The setQuota method annotation of the Quota class has an error, which is 
> described in the @throws section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17215) RBF: Fix some method annotations about @throws

2023-12-25 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17215:

Summary: RBF: Fix some method annotations about @throws   (was: RBF: fix 
some method annotations about @throws )

> RBF: Fix some method annotations about @throws 
> ---
>
> Key: HDFS-17215
> URL: https://issues.apache.org/jira/browse/HDFS-17215
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: xiaojunxiang
>Assignee: xiaojunxiang
>Priority: Minor
>  Labels: pull-request-available
>
> The setQuota method annotation of the Quota class has an error, which is 
> described in the @throws section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17215) The setQuota method annotation of the Quota class has an error

2023-12-25 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-17215:
---

Assignee: xiaojunxiang

> The setQuota method annotation of the Quota class has an error
> --
>
> Key: HDFS-17215
> URL: https://issues.apache.org/jira/browse/HDFS-17215
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: xiaojunxiang
>Assignee: xiaojunxiang
>Priority: Minor
>  Labels: pull-request-available
>
> The setQuota method annotation of the Quota class has an error, which is 
> described in the @throws section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17215) RBF: fix some method annotations about @throws

2023-12-25 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17215:

Summary: RBF: fix some method annotations about @throws   (was: The 
setQuota method annotation of the Quota class has an error)

> RBF: fix some method annotations about @throws 
> ---
>
> Key: HDFS-17215
> URL: https://issues.apache.org/jira/browse/HDFS-17215
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: xiaojunxiang
>Assignee: xiaojunxiang
>Priority: Minor
>  Labels: pull-request-available
>
> The setQuota method annotation of the Quota class has an error, which is 
> described in the @throws section.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.

2023-12-25 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17056.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> EC: Fix verifyClusterSetup output in case of an invalid param.
> --
>
> Key: HDFS-17056
> URL: https://issues.apache.org/jira/browse/HDFS-17056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: Ayush Saxena
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.4.0
>
>
> {code:java}
> bin/hdfs ec  -verifyClusterSetup XOR-2-1-1024k        
> 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, 
> XOR-2-1-1024k. The number of DataNodes is only 3. {code}
> verifyClusterSetup requires -policy followed by the names of the policies; else 
> it defaults to all enabled policies.
> In case there are additional invalid options, it silently ignores them, unlike 
> other EC commands, which throw a Too Many Arguments exception.
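
For comparison, the explicit form with the policy flag would look roughly like this (a sketch based on the command usage; output elided). The bug above is that trailing arguments given without {{-policy}} are silently dropped instead of being rejected:

{noformat}
bin/hdfs ec -verifyClusterSetup -policy XOR-2-1-1024k
{noformat}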



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param.

2023-12-25 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800339#comment-17800339
 ] 

Ayush Saxena commented on HDFS-17056:
-

Committed to trunk.
Thanx [~huangzhaobo99] for the contribution!!!

> EC: Fix verifyClusterSetup output in case of an invalid param.
> --
>
> Key: HDFS-17056
> URL: https://issues.apache.org/jira/browse/HDFS-17056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: Ayush Saxena
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: newbie, pull-request-available
>
> {code:java}
> bin/hdfs ec  -verifyClusterSetup XOR-2-1-1024k        
> 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, 
> XOR-2-1-1024k. The number of DataNodes is only 3. {code}
> verifyClusterSetup requires -policy followed by the names of the policies; else 
> it defaults to all enabled policies.
> In case there are additional invalid options, it silently ignores them, unlike 
> other EC commands, which throw a Too Many Arguments exception.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2023-12-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800248#comment-17800248
 ] 

Ayush Saxena commented on HDFS-17299:
-

Yeps, 

Excluding a rack in the streamer is quite tricky; we know neither the BPP nor 
the cluster rack configuration during the {{DataStreamer}} setup.

Maybe we should consider dropping the datanode from the pipeline, if possible, 
when we can't replace it, & reattempting with the remaining datanodes. Similar to 
{{bestEffort}} in the normal {{DatanodeReplacement}} case after the stream has 
been created.

[https://github.com/apache/hadoop/blob/rel/release-2.10.2/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/protocol/datatransfer/ReplaceDatanodeOnFailure.java#L114-L125]

On the Namenode side... I don't think we have anything better than the stale-node 
mechanism, which just brings the detection time down rather than fixing the issue.

Rest, I am also not very sure if there is any other clean way to handle this.
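
For reference, these are the client-side knobs in play here; they kick in only once a pipeline exists, which is exactly the gap discussed above (values shown are illustrative, not a recommendation):

{code:java}
import org.apache.hadoop.conf.Configuration;

class PipelineFailureKnobs {
  // Keep datanode replacement enabled, but let the write proceed with the
  // surviving datanodes when no replacement can be found (best effort).
  static Configuration clientConf() {
    Configuration conf = new Configuration();
    conf.setBoolean(
        "dfs.client.block.write.replace-datanode-on-failure.enable", true);
    conf.set(
        "dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
    conf.setBoolean(
        "dfs.client.block.write.replace-datanode-on-failure.best-effort", true);
    return conf;
  }
}
{code}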

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Priority: Critical
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are in the 
> same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on 

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2023-12-24 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800240#comment-17800240
 ] 

Ayush Saxena commented on HDFS-17299:
-

[~shahrs87] that config kicks in post pipeline setup, not while creating one. So 
I think your failure is during the create itself.

[https://github.com/apache/hadoop/blob/branch-2.10/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1571-L1573]

 

It won't reach here in your case since the pipeline wasn't set up, so nodes will 
be null here. 

[https://github.com/apache/hadoop/blob/branch-2.10/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1455]

 

Which I feel is a bug, or at least warrants some improvement. :(

 

The end solution is to go ahead with 2 nodes in the pipeline; how to reach there 
we can figure out, mostly it should be via ReplaceDatanodeOnFailure.
 
[~hexiaoqiao] The case there was for the default BPP: 2 racks & one rack down, 
but the Namenode didn't recognise the rack as down, period.
 
But the case mentioned here is for the rack-fault-tolerant BPP: 3 racks, 
replication factor 3, & 1 rack down, but the NN doesn't recognise it as dead, so 
it always tries to allocate a node from all 3 racks, though 1 rack is dead, & the 
create never succeeds. I have added a patch with a repro test, can give it a 
check (a very quick patch, maybe wrong).
 
interesting problem :) 
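
The attached repro.patch is the authoritative reproducer; its rough shape would be something like this sketch (the rack names and the stopped node index are illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant;

class RackDownRepro {
  static void run() throws Exception {
    // 3 racks, rack-fault-tolerant placement, replication factor 3.
    Configuration conf = new HdfsConfiguration();
    conf.set(DFSConfigKeys.DFS_BLOCK_REPLICATOR_CLASSNAME_KEY,
        BlockPlacementPolicyRackFaultTolerant.class.getName());
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf)
        .numDataNodes(3)
        .racks(new String[] {"/rack1", "/rack2", "/rack3"})
        .build();
    cluster.waitActive();
    // Stop the datanode on one rack without the NN marking it dead; a
    // subsequent create with hflush keeps excluding nodes and fails.
    cluster.stopDataNode(0);
  }
}
{code}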

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Priority: Critical
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are in the 
> same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> 

[jira] [Updated] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2023-12-24 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17299:

Attachment: repro.patch

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Priority: Critical
> Attachments: repro.patch
>
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are in the 
> same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,712 WARN  [Thread-39087] hdfs.DataStreamer - 

[jira] [Commented] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param

2023-12-23 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800040#comment-17800040
 ] 

Ayush Saxena commented on HDFS-17056:
-

Go ahead, all yours!!

> EC: Fix verifyClusterSetup output in case of an invalid param
> -
>
> Key: HDFS-17056
> URL: https://issues.apache.org/jira/browse/HDFS-17056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: Ayush Saxena
>Priority: Major
>  Labels: newbie
>
> {code:java}
> bin/hdfs ec  -verifyClusterSetup XOR-2-1-1024k        
> 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, 
> XOR-2-1-1024k. The number of DataNodes is only 3. {code}
> verifyClusterSetup requires -policy followed by the names of the policies; else 
> it defaults to all enabled policies.
> In case there are additional invalid options, it silently ignores them, unlike 
> other EC commands, which throw a Too Many Arguments exception.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2023-12-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799897#comment-17799897
 ] 

Ayush Saxena edited comment on HDFS-17299 at 12/22/23 6:28 PM:
---

The stale node thing was added for a HBase use case only as part of HDFS-3703.

I feel the write shouldn't have failed if it was not able to replace a datanode 
after attempts; it should have gone ahead with 2 nodes in the pipeline, if 2 was 
greater than the minimum replication. If it doesn't already operate that way, we 
should make sure it does; there are some policies in ReplaceDatanodeOnFailure.java, 
maybe we can add one to not chase replacement if min replication is satisfied.

Not very sure about explicitly passing the entire rack as excluded post n 
retries, but logically doable.

Maybe if you had put 
{{dfs.client.block.write.replace-datanode-on-failure.enable}} as {{{}false{}}}, 
it wouldn't have tried to replace the DN itself & would have gone ahead with 2 
DNs from the other AZs?

[~hexiaoqiao]/[~tasanuma] anyone with any ideas/opinions?

 

EDIT.
{quote}it just took little more than 1 second. I doubt keeping stale datanode 
interval to 9 seconds will help
{quote}
 

Pretty corner case. Out of curiosity, does HBase have any retry logic, like if 
the write fails once, attempting once more immediately or after some wait?


was (Author: ayushtkn):
The stale node thing was added for a HBase use case only as part of HDFS-3703.

I feel the write shouldn't have failed if it was not able to replace a datanode 
after attempts; it should have gone ahead with 2 nodes in the pipeline, if 2 was 
greater than the minimum replication. If it doesn't already operate that way, we 
should make sure it does; there are some policies in ReplaceDatanodeOnFailure.java, 
maybe we can add one to not chase replacement if min replication is satisfied.

Not very sure about explicitly passing the entire rack as excluded post n 
retries, but logically doable. 

Maybe if you had put 
{{dfs.client.block.write.replace-datanode-on-failure.enable}} as {{false}}, it 
wouldn't have tried to replace the DN itself & would have gone ahead with 2 DNs 
from the other AZs?

[~hexiaoqiao]/[~tasanuma] anyone with any ideas/opinions?

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Priority: Critical
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are in the 
> same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> 

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2023-12-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799897#comment-17799897
 ] 

Ayush Saxena commented on HDFS-17299:
-

The stale node thing was added for a HBase use case only as part of HDFS-3703.

I feel the write shouldn't have failed if it was not able to replace a datanode 
after attempts; it should have gone ahead with 2 nodes in the pipeline, if 2 was 
greater than the minimum replication. If it doesn't already operate that way, we 
should make sure it does; there are some policies in ReplaceDatanodeOnFailure.java, 
maybe we can add one to not chase replacement if min replication is satisfied.

Not very sure about explicitly passing the entire rack as excluded post n 
retries, but logically doable. 

Maybe if you had put 
{{dfs.client.block.write.replace-datanode-on-failure.enable}} as {{false}}, it 
wouldn't have tried to replace the DN itself & would have gone ahead with 2 DNs 
from the other AZs?

[~hexiaoqiao]/[~tasanuma] anyone with any ideas/opinions?

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Priority: Critical
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are in the 
> same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on 

[jira] [Comment Edited] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2023-12-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799702#comment-17799702
 ] 

Ayush Saxena edited comment on HDFS-17299 at 12/22/23 1:53 PM:
---

{quote}So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
{quote}
Did you explore
{{dfs.namenode.avoid.write.stale.datanode}} and 
{{{}dfs.namenode.stale.datanode.interval{}}}? I believe that can bring your 
value down to a few seconds; 30 seconds by default, but you can get that down to 
3 * heartbeat interval IIRC.
 

You should find the reason why the nodes were chosen in 1 AZ only, and why no 
node was chosen in the other 2 AZs; it will fall back to choosing nodes in 1 
AZ (rack) only when it fails to spread them across different racks 


was (Author: ayushtkn):
{quote}So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
{quote}
Did you explore
dfs.namenode.avoid.write.stale.datanode and 
{{{}dfs.namenode.stale.datanode.interval{}}}? I believe that can bring your 
value down to a few seconds; 30 seconds by default, but you can get that down to 
3 * heartbeat interval IIRC.
 

You should find the reason why the nodes were chosen in 1 AZ only, and why no 
node was chosen in the other 2 AZs; it will fall back to choosing nodes in 1 
AZ (rack) only when it fails to spread them across different racks 
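
(For reference on the numbers: 2 × 600000 ms recheck interval + 10 × 3000 ms heartbeats = 1230000 ms, i.e. the 20.5 mins in the description.) A minimal sketch of the two stale-node knobs mentioned above, assuming the 3s heartbeat from the report; the values are illustrative, not a recommendation:

{code:java}
import org.apache.hadoop.conf.Configuration;

class StaleNodeKnobs {
  // Mark a datanode stale after ~3 missed heartbeats (default is 30s)
  // and avoid stale nodes when choosing write targets.
  static Configuration namenodeConf() {
    Configuration conf = new Configuration();
    conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
    conf.setLong("dfs.namenode.stale.datanode.interval", 9 * 1000L);
    return conf;
  }
}
{code}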

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Priority: Critical
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are in the 
> same AZ. 
> See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] 

[jira] [Comment Edited] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2023-12-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799702#comment-17799702
 ] 

Ayush Saxena edited comment on HDFS-17299 at 12/22/23 1:52 PM:
---

{quote}So it will take 1230000 ms (20.5 mins) to detect that the datanode is dead.
{quote}
Did you explore
dfs.namenode.avoid.write.stale.datanode and 
{{{}dfs.namenode.stale.datanode.interval{}}}? I believe that can bring your 
value down to a few seconds; 30 seconds by default, but you can get that down to 
3 * heartbeat interval IIRC.
 

You should find the reason why the nodes were chosen in 1 AZ only, and why no 
node was chosen in the other 2 AZs; it will fall back to choosing nodes in 1 
AZ (rack) only when it fails to spread them across different racks 


was (Author: ayushtkn):
{quote}So it will take 1,230,000 ms (20.5 mins) to detect that the datanode is dead.
{quote}
Did you explore {{dfs.namenode.avoid.read.stale.datanode}} and 
{{dfs.namenode.stale.datanode.interval}}? I believe those can bring your value 
down to a few seconds: 30 seconds by default, but you can get that down to 
3 * heartbeat interval IIRC.

 

You should find the reason why the nodes were chosen in 1 AZ only and why no 
node was chosen in the other 2 AZs; the policy falls back to choosing nodes in 
1 AZ (rack) only when it fails to spread them across different racks. 

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Priority: Critical
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1,230,000 ms (20.5 mins) to detect that the datanode is dead 
> (worked out below).
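> For reference, the 20.5-minute figure follows from the NameNode's dead-node 
> expiry formula (2 x recheck interval + 10 x heartbeat interval, as computed 
> in DatanodeManager), sketched here with the values above:
> {code:java}
> long recheckMs    = 600000L; // dfs.namenode.heartbeat.recheck-interval
> long heartbeatSec = 3L;      // dfs.heartbeat.interval
> // 2 * 600000 + 10 * 1000 * 3 = 1,230,000 ms, i.e. ~20.5 minutes
> long expiryMs = 2 * recheckMs + 10 * 1000 * heartbeatSec;
> {code}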
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are still 
> in the same AZ. See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] 

[jira] [Commented] (HDFS-17299) HDFS is not rack failure tolerant while creating a new file.

2023-12-22 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17799702#comment-17799702
 ] 

Ayush Saxena commented on HDFS-17299:
-

{quote}So it will take 1,230,000 ms (20.5 mins) to detect that the datanode is dead.
{quote}
Did you explore {{dfs.namenode.avoid.read.stale.datanode}} and 
{{dfs.namenode.stale.datanode.interval}}? I believe those can bring your value 
down to a few seconds: 30 seconds by default, but you can get that down to 
3 * heartbeat interval IIRC.

 

You should find the reason why the nodes were chosen in 1 AZ only and why no 
node was chosen in the other 2 AZs; the policy falls back to choosing nodes in 
1 AZ (rack) only when it fails to spread them across different racks. 

> HDFS is not rack failure tolerant while creating a new file.
> 
>
> Key: HDFS-17299
> URL: https://issues.apache.org/jira/browse/HDFS-17299
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 2.10.1
>Reporter: Rushabh Shah
>Priority: Critical
>
> Recently we saw an HBase cluster outage when we mistakenly brought down 1 AZ.
> Our configuration:
> 1. We use 3 Availability Zones (AZs) for fault tolerance.
> 2. We use BlockPlacementPolicyRackFaultTolerant as the block placement policy.
> 3. We use the following configuration parameters: 
> dfs.namenode.heartbeat.recheck-interval: 600000 
> dfs.heartbeat.interval: 3 
> So it will take 1,230,000 ms (20.5 mins) to detect that the datanode is dead.
>  
> Steps to reproduce:
>  # Bring down 1 AZ.
>  # HBase (HDFS client) tries to create a file (WAL file) and then calls 
> hflush on the newly created file.
>  # DataStreamer is not able to find block locations that satisfy the rack 
> placement policy (one copy in each rack, which essentially means one copy in 
> each AZ).
>  # Since all the datanodes in that AZ are down but still considered alive by 
> the namenode, the client gets different datanodes, but all of them are still 
> in the same AZ. See logs below.
>  # HBase is not able to create a WAL file and it aborts the region server.
>  
> Relevant logs from hdfs client and namenode
>  
> {noformat}
> 2023-12-16 17:17:43,818 INFO  [on default port 9000] FSNamesystem.audit - 
> allowed=trueugi=hbase/ (auth:KERBEROS) ip=  
> cmd=create  src=/hbase/WALs/  dst=null
> 2023-12-16 17:17:43,978 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652565_140946716, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,061 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,061 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874--1594838129323:blk_1214652565_140946716
> 2023-12-16 17:17:44,179 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[:50010,DS-a493abdb-3ac3-49b1-9bfb-848baf5c1c2c,DISK]
> 2023-12-16 17:17:44,339 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652580_140946764, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,369 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack with 
> firstBadLink as :50010
> at 
> org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:113)
> at 
> org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1747)
> at 
> org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1651)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:715)
> 2023-12-16 17:17:44,369 WARN  [Thread-39087] hdfs.DataStreamer - Abandoning 
> BP-179318874-NN-IP-1594838129323:blk_1214652580_140946764
> 2023-12-16 17:17:44,454 WARN  [Thread-39087] hdfs.DataStreamer - Excluding 
> datanode 
> DatanodeInfoWithStorage[AZ-2-dn-2:50010,DS-46bb45cc-af89-46f3-9f9d-24e4fdc35b6d,DISK]
> 2023-12-16 17:17:44,522 INFO  [on default port 9000] hdfs.StateChange - 
> BLOCK* allocate blk_1214652594_140946796, replicas=:50010, 
> :50010, :50010 for /hbase/WALs/
> 2023-12-16 17:17:44,712 INFO  [Thread-39087] hdfs.DataStreamer - Exception in 
> createBlockOutputStream
> java.io.IOException: Got error, status=ERROR, status message , ack 

[jira] [Resolved] (HDFS-16904) Close webhdfs during the teardown

2023-12-21 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-16904.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> Close webhdfs during the teardown
> -
>
> Key: HDFS-16904
> URL: https://issues.apache.org/jira/browse/HDFS-16904
> Project: Hadoop HDFS
>  Issue Type: Test
>  Components: hdfs
>Affects Versions: 3.4.0, 3.3.5, 3.3.9
> Environment: Tested using the Hadoop development environment Docker 
> image.
>Reporter: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0, 3.3.9
>
>
> The teardown for the tests shuts down the cluster but leaves the webhdfs 
> filesystem open.
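> For illustration, a minimal sketch of the kind of teardown implied (field 
> names are hypothetical):
> {code:java}
> @AfterClass
> public static void tearDown() throws IOException {
>   if (webhdfs != null) {
>     webhdfs.close();    // close the webhdfs filesystem opened for the tests
>   }
>   if (cluster != null) {
>     cluster.shutdown(); // the cluster shutdown already happened in teardown
>   }
> }
> {code}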



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17034) java.io.FileNotFoundException: File does not exist

2023-12-21 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17034.
-
Resolution: Cannot Reproduce

This is some cluster issue; reach out to the user ML with all the details. Jira 
is for reporting bugs, not for end-user questions!!!

> java.io.FileNotFoundException: File does not exist
> --
>
> Key: HDFS-17034
> URL: https://issues.apache.org/jira/browse/HDFS-17034
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: dfs, dfsclient, hdfs
>Affects Versions: 2.9.2
>Reporter: Jepson
>Priority: Major
>
> *HBase2.2.2 Log:*
> 2023-06-02 08:07:57,423 INFO  [Close-WAL-Writer-177] util.FSHDFSUtils: 
> Recover lease on dfs file 
> /hbase/WALs/bdpprd07,16020,1685646099569/bdpprd07%2C16020%2C1685646099569.1685664417370
> 2023-06-02 08:07:57,425 INFO  [Close-WAL-Writer-177] util.FSHDFSUtils: Failed 
> to recover lease, attempt=0 on 
> file=/hbase/WALs/bdpprd07,16020,1685646099569/bdpprd07%2C16020%2C1685646099569.1685664417370
>  after 2ms
> 2023-06-02 08:08:01,427 WARN  [Close-WAL-Writer-177] wal.AsyncFSWAL: close 
> old writer failed
> java.io.FileNotFoundException: File does not exist: 
> /hbase/WALs/bdpprd07,16020,1685646099569/bdpprd07%2C16020%2C1685646099569.1685664417370
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:62)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLease(FSNamesystem.java:2358)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.recoverLease(NameNodeRpcServer.java:790)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.recoverLease(ClientNamenodeProtocolServerSideTranslatorPB.java:693)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:503)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:871)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:817)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2606)
>         at sun.reflect.GeneratedConstructorAccessor33.newInstance(Unknown 
> Source)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at 
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
>         at 
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
>         at org.apache.hadoop.hdfs.DFSClient.recoverLease(DFSClient.java:867)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:304)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:301)
>         at 
> org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
>         at 
> org.apache.hadoop.hdfs.DistributedFileSystem.recoverLease(DistributedFileSystem.java:301)
>         at 
> org.apache.hadoop.hbase.util.FSHDFSUtils.recoverLease(FSHDFSUtils.java:283)
>         at 
> org.apache.hadoop.hbase.util.FSHDFSUtils.recoverDFSFileLease(FSHDFSUtils.java:216)
>         at 
> org.apache.hadoop.hbase.util.FSHDFSUtils.recoverFileLease(FSHDFSUtils.java:163)
>         at 
> org.apache.hadoop.hbase.io.asyncfs.FanOutOneBlockAsyncDFSOutput.recoverAndClose(FanOutOneBlockAsyncDFSOutput.java:559)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncProtobufLogWriter.close(AsyncProtobufLogWriter.java:157)
>         at 
> org.apache.hadoop.hbase.regionserver.wal.AsyncFSWAL.lambda$closeWriter$6(AsyncFSWAL.java:646)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: 
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File 
> does not exist: 
> /hbase/WALs/bdpprd07,16020,1685646099569/bdpprd07%2C16020%2C1685646099569.1685664417370
>         at 
> org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:72)
>         at 
> 

[jira] [Updated] (HDFS-17056) EC: Fix verifyClusterSetup output in case of an invalid param

2023-12-21 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17056:

Labels: newbie  (was: )

> EC: Fix verifyClusterSetup output in case of an invalid param
> -
>
> Key: HDFS-17056
> URL: https://issues.apache.org/jira/browse/HDFS-17056
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: ec
>Reporter: Ayush Saxena
>Priority: Major
>  Labels: newbie
>
> {code:java}
> bin/hdfs ec  -verifyClusterSetup XOR-2-1-1024k        
> 9 DataNodes are required for the erasure coding policies: RS-6-3-1024k, 
> XOR-2-1-1024k. The number of DataNodes is only 3. {code}
> verifyClusterSetup requires -policy followed by the policy names; otherwise it 
> defaults to all enabled policies.
> In case there are additional invalid options, it silently ignores them, unlike 
> other EC commands, which throw a Too Many Arguments exception (correct usage 
> sketched below).
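> For comparison, a sketch of the invocation that verifyClusterSetup expects:
> {code:java}
> bin/hdfs ec -verifyClusterSetup -policy XOR-2-1-1024k {code}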



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17238) Setting the value of "dfs.blocksize" too large will cause HDFS to be unable to write to files

2023-12-21 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17238.
-
Resolution: Won't Fix

This is a misconfiguration; we can't help it, nor can we handle all such issues.
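
For reference, a well-formed blocksize setting looks like this (sketch; 
dfs.blocksize also accepts size suffixes such as 128m):
{code:xml}
<property>
    <name>dfs.blocksize</name>
    <value>134217728</value> <!-- 128 MB -->
</property>
{code}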

> Setting the value of "dfs.blocksize" too large will cause HDFS to be unable 
> to write to files
> -
>
> Key: HDFS-17238
> URL: https://issues.apache.org/jira/browse/HDFS-17238
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.3.6
>Reporter: ECFuzz
>Priority: Major
>
> My hadoop version is 3.3.6, and I use the Pseudo-Distributed Operation.
> core-site.xml like below.
> {code:java}
> <configuration>
>     <property>
>         <name>fs.defaultFS</name>
>         <value>hdfs://localhost:9000</value>
>     </property>
>     <property>
>         <name>hadoop.tmp.dir</name>
>         <value>/home/hadoop/Mutil_Component/tmp</value>
>     </property>
> </configuration>
> {code}
> hdfs-site.xml like below.
> {code:java}
> <configuration>
>     <property>
>         <name>dfs.replication</name>
>         <value>1</value>
>     </property>
>     <property>
>         <name>dfs.blocksize</name>
>         <value>134217728</value>
>     </property>
> </configuration>
> {code}
> Then format the namenode and start HDFS. HDFS is running normally.
> {code:java}
> hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
> bin/hdfs namenode -format
> x(many info)
> hadoop@hadoop-Standard-PC-i440FX-PIIX-1996:~/Mutil_Component/hadoop-3.3.6$ 
> sbin/start-dfs.sh
> Starting namenodes on [localhost]
> Starting datanodes
> Starting secondary namenodes [hadoop-Standard-PC-i440FX-PIIX-1996] {code}
> Finally, use dfs to place a file. 
> {code:java}
> bin/hdfs dfs -mkdir -p /user/hadoop
> bin/hdfs dfs -mkdir input
> bin/hdfs dfs -put etc/hadoop/*.xml input {code}
> The following exception is thrown:
> {code:java}
> 2023-10-19 14:56:34,603 WARN hdfs.DataStreamer: DataStreamer Exception
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
> /user/hadoop/input/capacity-scheduler.xml._COPYING_ could only be written to 
> 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 
> node(s) are excluded in this operation.
>         at 
> org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
>         at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989)
>         at 
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
>         at 
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
>         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:422)
>         at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)        
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1513)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1410)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
>         at 
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
>         at com.sun.proxy.$Proxy9.addBlock(Unknown Source)
>         at 
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:531)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
>         at 
> 

[jira] [Resolved] (HDFS-17240) Fix a typo in DataStorage.java

2023-12-21 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17240.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fix a typo in DataStorage.java
> --
>
> Key: HDFS-17240
> URL: https://issues.apache.org/jira/browse/HDFS-17240
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Yu Wang
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Fix a typo in DataStorage.java
>  
> {code:java}
>    /**
> -   * Analize which and whether a transition of the fs state is required
> +   * Analyze which and whether a transition of the fs state is required
>     * and perform it if necessary.
>     * {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17282) Reconfig 'SlowIoWarningThreshold' parameters for datanode.

2023-12-13 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17796569#comment-17796569
 ] 

Ayush Saxena commented on HDFS-17282:
-

Committed to trunk.

Thanx [~huangzhaobo99] for the contribution!!!

> Reconfig 'SlowIoWarningThreshold' parameters for datanode.
> --
>
> Key: HDFS-17282
> URL: https://issues.apache.org/jira/browse/HDFS-17282
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Minor
>  Labels: pull-request-available
>
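> For reference, datanode reconfiguration is typically applied at runtime via 
> dfsadmin (host and IPC port below are placeholders):
> {noformat}
> hdfs dfsadmin -reconfig datanode dn-host:9867 start
> hdfs dfsadmin -reconfig datanode dn-host:9867 status
> {noformat}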




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17282) Reconfig 'SlowIoWarningThreshold' parameters for datanode.

2023-12-13 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17282.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Reconfig 'SlowIoWarningThreshold' parameters for datanode.
> --
>
> Key: HDFS-17282
> URL: https://issues.apache.org/jira/browse/HDFS-17282
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17278) Detect order dependent flakiness in TestViewfsWithNfs3.java under hadoop-hdfs-nfs module

2023-12-11 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17278.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Detect order dependent flakiness in TestViewfsWithNfs3.java under 
> hadoop-hdfs-nfs module
> 
>
> Key: HDFS-17278
> URL: https://issues.apache.org/jira/browse/HDFS-17278
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: openjdk version "17.0.9"
> Apache Maven 3.9.5
>Reporter: Ruby
>Assignee: Ruby
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: failed-1.png, failed-2.png, success.png
>
>
> The order dependent flakiness was detected if the test class 
> TestDFSClientCache.java runs before TestRpcProgramNfs3.java.
> The error message looks like below:
> {code:java}
> [ERROR] Failures: 
> [ERROR]   TestRpcProgramNfs3.testAccess:279 Incorrect return code 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testCommit:764 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testCreate:493 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   
> TestRpcProgramNfs3.testEncryptedReadWrite:359->createFileUsingNfs:393 
> Incorrect response:  expected: but 
> was:
> [ERROR]   TestRpcProgramNfs3.testFsinfo:714 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testFsstat:696 Incorrect return code: 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testGetattr:205 Incorrect return code 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testLookup:249 Incorrect return code 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testMkdir:517 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testPathconf:738 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRead:341 Incorrect return code: expected:<0> 
> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testReaddir:642 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testReaddirplus:666 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testReadlink:297 Incorrect return code: 
> expected:<0> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRemove:570 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRename:618 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRmdir:594 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testSetattr:225 Incorrect return code 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testSymlink:546 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testWrite:468 Incorrect return code: 
> expected:<13> but was:<5>
> [INFO] 
> [ERROR] Tests run: 25, Failures: 20, Errors: 0, Skipped: 0
> [INFO] 
> [ERROR] There are test failures. {code}
> The polluter that led to this flakiness was the test method
> testGetUserGroupInformationSecure() in TestDFSClientCache.java. There was a 
> line 
> {code:java}
> UserGroupInformation.setLoginUser(currentUserUgi);{code}
> which modifies shared state and resources, effectively pre-configuring the 
> login user. To fix this issue, I added cleanup methods in 
> TestDFSClientCache.java to reset the UserGroupInformation and ensure 
> isolation between test classes.
> {code:java}
> @AfterClass
> public static void cleanup() {
> UserGroupInformation.reset();
> }{code}
> Including setting
> {code:java}
> authenticationMethod = null;
> conf = null; // set configuration to null
> setLoginUser(null); // reset login user to default null{code}
> ..., and so on. The reset() method can be found in 
> hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java.
> After the fix, the error no longer occurred and the run succeeded:
> {code:java}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 18.457 s - in org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO] 
> [INFO] Results:
> [INFO] 
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0
> [INFO] 
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
>  
> {code}
> Here is the CustomTest.java file that I used to run these two tests in order, 
> the 

[jira] [Commented] (HDFS-17278) Detect order dependent flakiness in TestViewfsWithNfs3.java under hadoop-hdfs-nfs module

2023-12-11 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795620#comment-17795620
 ] 

Ayush Saxena commented on HDFS-17278:
-

Committed to trunk.

Thanx [~yijujt2] for the contribution!!!

> Detect order dependent flakiness in TestViewfsWithNfs3.java under 
> hadoop-hdfs-nfs module
> 
>
> Key: HDFS-17278
> URL: https://issues.apache.org/jira/browse/HDFS-17278
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: openjdk version "17.0.9"
> Apache Maven 3.9.5
>Reporter: Ruby
>Assignee: Ruby
>Priority: Minor
>  Labels: pull-request-available
> Attachments: failed-1.png, failed-2.png, success.png
>
>
> The order dependent flakiness was detected if the test class 
> TestDFSClientCache.java runs before TestRpcProgramNfs3.java.
> The error message looks like below:
> {code:java}
> [ERROR] Failures: 
> [ERROR]   TestRpcProgramNfs3.testAccess:279 Incorrect return code 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testCommit:764 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testCreate:493 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   
> TestRpcProgramNfs3.testEncryptedReadWrite:359->createFileUsingNfs:393 
> Incorrect response:  expected: but 
> was:
> [ERROR]   TestRpcProgramNfs3.testFsinfo:714 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testFsstat:696 Incorrect return code: 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testGetattr:205 Incorrect return code 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testLookup:249 Incorrect return code 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testMkdir:517 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testPathconf:738 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRead:341 Incorrect return code: expected:<0> 
> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testReaddir:642 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testReaddirplus:666 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testReadlink:297 Incorrect return code: 
> expected:<0> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRemove:570 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRename:618 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRmdir:594 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testSetattr:225 Incorrect return code 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testSymlink:546 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testWrite:468 Incorrect return code: 
> expected:<13> but was:<5>
> [INFO] 
> [ERROR] Tests run: 25, Failures: 20, Errors: 0, Skipped: 0
> [INFO] 
> [ERROR] There are test failures. {code}
> The polluter that led to this flakiness was the test method
> testGetUserGroupInformationSecure() in TestDFSClientCache.java. There was a 
> line 
> {code:java}
> UserGroupInformation.setLoginUser(currentUserUgi);{code}
> which modifies shared state and resources, effectively pre-configuring the 
> login user. To fix this issue, I added cleanup methods in 
> TestDFSClientCache.java to reset the UserGroupInformation and ensure 
> isolation between test classes.
> {code:java}
> @AfterClass
> public static void cleanup() {
> UserGroupInformation.reset();
> }{code}
> Including setting
> {code:java}
> authenticationMethod = null;
> conf = null; // set configuration to null
> setLoginUser(null); // reset login user to default null{code}
> ..., and so on. The reset() method can be found in 
> hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java.
> After the fix, the error no longer occurred and the run succeeded:
> {code:java}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 18.457 s - in org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO] 
> [INFO] Results:
> [INFO] 
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0
> [INFO] 
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
>  
> {code}
> Here is the CustomTest.java file that I used to run these two tests in order, 
> the error can 

[jira] [Assigned] (HDFS-17278) Detect order dependent flakiness in TestViewfsWithNfs3.java under hadoop-hdfs-nfs module

2023-12-11 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-17278:
---

Assignee: Ruby

> Detect order dependent flakiness in TestViewfsWithNfs3.java under 
> hadoop-hdfs-nfs module
> 
>
> Key: HDFS-17278
> URL: https://issues.apache.org/jira/browse/HDFS-17278
> Project: Hadoop HDFS
>  Issue Type: Bug
> Environment: openjdk version "17.0.9"
> Apache Maven 3.9.5
>Reporter: Ruby
>Assignee: Ruby
>Priority: Minor
>  Labels: pull-request-available
> Attachments: failed-1.png, failed-2.png, success.png
>
>
> The order dependent flakiness was detected if the test class 
> TestDFSClientCache.java runs before TestRpcProgramNfs3.java.
> The error message looks like below:
> {code:java}
> [ERROR] Failures: 
> [ERROR]   TestRpcProgramNfs3.testAccess:279 Incorrect return code 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testCommit:764 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testCreate:493 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   
> TestRpcProgramNfs3.testEncryptedReadWrite:359->createFileUsingNfs:393 
> Incorrect response:  expected: but 
> was:
> [ERROR]   TestRpcProgramNfs3.testFsinfo:714 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testFsstat:696 Incorrect return code: 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testGetattr:205 Incorrect return code 
> expected:<0> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testLookup:249 Incorrect return code 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testMkdir:517 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testPathconf:738 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRead:341 Incorrect return code: expected:<0> 
> but was:<13>
> [ERROR]   TestRpcProgramNfs3.testReaddir:642 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testReaddirplus:666 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testReadlink:297 Incorrect return code: 
> expected:<0> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRemove:570 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRename:618 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testRmdir:594 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testSetattr:225 Incorrect return code 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testSymlink:546 Incorrect return code: 
> expected:<13> but was:<5>
> [ERROR]   TestRpcProgramNfs3.testWrite:468 Incorrect return code: 
> expected:<13> but was:<5>
> [INFO] 
> [ERROR] Tests run: 25, Failures: 20, Errors: 0, Skipped: 0
> [INFO] 
> [ERROR] There are test failures. {code}
> The polluter that led to this flakiness was the test method
> testGetUserGroupInformationSecure() in TestDFSClientCache.java. There was a 
> line 
> {code:java}
> UserGroupInformation.setLoginUser(currentUserUgi);{code}
> which modifies shared state and resources, effectively pre-configuring the 
> login user. To fix this issue, I added cleanup methods in 
> TestDFSClientCache.java to reset the UserGroupInformation and ensure 
> isolation between test classes.
> {code:java}
> @AfterClass
> public static void cleanup() {
> UserGroupInformation.reset();
> }{code}
> Including setting
> {code:java}
> authenticationMethod = null;
> conf = null; // set configuration to null
> setLoginUser(null); // reset login user to default null{code}
> ..., and so on. The reset() method can be found in 
> hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/UserGroupInformation.java.
> After the fix, the error no longer occurred and the run succeeded:
> {code:java}
> [INFO] ---
> [INFO]  T E S T S
> [INFO] ---
> [INFO] Running org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 18.457 s - in org.apache.hadoop.hdfs.nfs.nfs3.CustomTest
> [INFO] 
> [INFO] Results:
> [INFO] 
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0
> [INFO] 
> [INFO] 
> 
> [INFO] BUILD SUCCESS
> [INFO] 
>  
> {code}
> Here is the CustomTest.java file that I used to run these two tests in order, 
> the error can be reproduce by running this CustomTest.java. 
> {code:java}
> package 

[jira] [Commented] (HDFS-17272) NNThroughputBenchmark should support specifying the base directory for multi-client test

2023-12-10 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795037#comment-17795037
 ] 

Ayush Saxena commented on HDFS-17272:
-

Committed to trunk.

Thanx [~caozhiqiang] for the contribution & [~tomscut] for the review!!!

> NNThroughputBenchmark should support specifying the base directory for 
> multi-client test
> 
>
> Key: HDFS-17272
> URL: https://issues.apache.org/jira/browse/HDFS-17272
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
>  Labels: pull-request-available
>
> Currently, NNThroughputBenchmark does not support specifying the base 
> directory and therefore does not support multiple clients performing stress 
> testing at the same time. However, for a high-performance namenode machine, 
> a single client submitting the stress test cannot push namenode RPC access to 
> its bottleneck. Therefore, multiple clients are required for parallel testing 
> to bring the namenode pressure up to the level of a large-scale production 
> cluster.
> So I specify the base directory through the -baseDirName parameter to support 
> multiple clients submitting stress tests at the same time, as shown below.
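> For illustration, each client could then run something like (host, operation, 
> and counts are placeholder assumptions):
> {noformat}
> hadoop org.apache.hadoop.hdfs.server.namenode.NNThroughputBenchmark \
>   -fs hdfs://namenode:9000 -op create -threads 100 -files 100000 \
>   -baseDirName /client1
> {noformat}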



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17272) NNThroughputBenchmark should support specifying the base directory for multi-client test

2023-12-10 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17272:

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

> NNThroughputBenchmark should support specifying the base directory for 
> multi-client test
> 
>
> Key: HDFS-17272
> URL: https://issues.apache.org/jira/browse/HDFS-17272
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: namenode
>Affects Versions: 3.4.0
>Reporter: caozhiqiang
>Assignee: caozhiqiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Currently, NNThroughputBenchmark does not support specifying the base 
> directory and therefore does not support multiple clients performing stress 
> testing at the same time. However, for a high-performance namenode machine, 
> a single client submitting the stress test cannot push namenode RPC access to 
> its bottleneck. Therefore, multiple clients are required for parallel testing 
> to bring the namenode pressure up to the level of a large-scale production 
> cluster.
> So I specify the base directory through the -baseDirName parameter to support 
> multiple clients submitting stress tests at the same time.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17279) RBF: Fix link to Fedbalance document

2023-12-08 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17279.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> RBF: Fix link to Fedbalance document 
> -
>
> Key: HDFS-17279
> URL: https://issues.apache.org/jira/browse/HDFS-17279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 
> Fix the link to the FedBalance document, which cannot be displayed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17279) RBF: Fix link to Fedbalance document

2023-12-08 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794857#comment-17794857
 ] 

Ayush Saxena commented on HDFS-17279:
-

Committed to trunk.

Thanx [~haiyang Hu] for the contribution & [~elgoiri] for the review!!!

> RBF: Fix link to Fedbalance document 
> -
>
> Key: HDFS-17279
> URL: https://issues.apache.org/jira/browse/HDFS-17279
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Attachments: screenshot-1.png
>
>
>  !screenshot-1.png! 
> Fix the link to the FedBalance document, which cannot be displayed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17226) Building native libraries fails on Fedora 38

2023-12-03 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17226:

Target Version/s: 3.4.0, 3.3.9, 3.3.7

> Building native libraries fails on Fedora 38
> 
>
> Key: HDFS-17226
> URL: https://issues.apache.org/jira/browse/HDFS-17226
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: libhdfs++, native
>Reporter: Kengo Seki
>Priority: Major
>  Labels: pull-request-available
>
> I tried to build native libraries on Fedora 38, in which gcc-c++ 13.2.1 is 
> installed by default, and I came across the following error.
> {code}
> $ cat /etc/redhat-release 
> Fedora release 38 (Thirty Eight)
> $ g++ --version
> g++ (GCC) 13.2.1 20230728 (Red Hat 13.2.1-1)
> Copyright (C) 2023 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> $ mvn clean package -DskipTests -Pnative
> ...
> [WARNING] 
> /home/vagrant/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/include/hdfspp/uri.h:60:3:
>  error: ‘uint16_t’ does not name a type
> [WARNING]60 |   uint16_t get_port() const;
> [WARNING]   |   ^~~~
> [WARNING] 
> /home/vagrant/hadoop/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfspp/include/hdfspp/uri.h:25:1:
>  note: ‘uint16_t’ is defined in header ‘<cstdint>’; did you forget to 
> ‘#include <cstdint>’?
> [WARNING]24 | #include 
> [WARNING]   +++ |+#include <cstdint>
> [WARNING]25 | #include 
> ...
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time:  07:00 min
> [INFO] Finished at: 2023-10-14T07:14:57Z
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.hadoop:hadoop-maven-plugins:3.4.0-SNAPSHOT:cmake-compile 
> (cmake-compile) on project hadoop-hdfs-native-client: make failed with error 
> code 2 -> [Help 1]
> {code}
> As described in the warning messages, adding {{#include <cstdint>}} will 
> solve the problem.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-17260) Fix the logic for reconfigure slow peer enable for Namenode.

2023-12-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792501#comment-17792501
 ] 

Ayush Saxena edited comment on HDFS-17260 at 12/3/23 9:41 AM:
--

Committed to trunk.

Thanx [~huangzhaobo99] for the contribution & [~haiyang Hu] for the review!!!


was (Author: ayushtkn):
Committed to trunk.

Thanx [~huangzhaobo99] for the contribution!!!

> Fix the logic for reconfigure slow peer enable for Namenode.
> 
>
> Key: HDFS-17260
> URL: https://issues.apache.org/jira/browse/HDFS-17260
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17260) Fix the logic for reconfigure slow peer enable for Namenode.

2023-12-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792501#comment-17792501
 ] 

Ayush Saxena commented on HDFS-17260:
-

Committed to trunk.

Thanx [~huangzhaobo99] for the contribution!!!

> Fix the logic for reconfigure slow peer enable for Namenode.
> 
>
> Key: HDFS-17260
> URL: https://issues.apache.org/jira/browse/HDFS-17260
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17260) Fix the logic for reconfigure slow peer enable for Namenode.

2023-12-03 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17260.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fix the logic for reconfigure slow peer enable for Namenode.
> 
>
> Key: HDFS-17260
> URL: https://issues.apache.org/jira/browse/HDFS-17260
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17233) The conf dfs.datanode.lifeline.interval.seconds is not considering time unit seconds

2023-12-02 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17233.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> The conf dfs.datanode.lifeline.interval.seconds is not considering time unit 
> seconds
> 
>
> Key: HDFS-17233
> URL: https://issues.apache.org/jira/browse/HDFS-17233
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Hemanth Boyina
>Assignee: Palakur Eshwitha Sai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> {code:java}
> long confLifelineIntervalMs =
> getConf().getLong(DFS_DATANODE_LIFELINE_INTERVAL_SECONDS_KEY,
> 3 * getConf().getTimeDuration(DFS_HEARTBEAT_INTERVAL_KEY,
> DFS_HEARTBEAT_INTERVAL_DEFAULT, TimeUnit.SECONDS,
> TimeUnit.MILLISECONDS)); {code}
> If we configure DFS_DATANODE_LIFELINE_INTERVAL_SECONDS_KEY, the value is not 
> converted to milliseconds (a conversion sketch follows). 
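> A minimal sketch of the intended unit handling (illustrative only, not the 
> committed patch):
> {code:java}
> long defaultLifelineMs = 3 * getConf().getTimeDuration(DFS_HEARTBEAT_INTERVAL_KEY,
>     DFS_HEARTBEAT_INTERVAL_DEFAULT, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
> long confLifelineIntervalMs =
>     getConf().get(DFS_DATANODE_LIFELINE_INTERVAL_SECONDS_KEY) == null
>         ? defaultLifelineMs
>         // the key is in seconds, so convert the configured value to ms
>         : TimeUnit.SECONDS.toMillis(
>             getConf().getLong(DFS_DATANODE_LIFELINE_INTERVAL_SECONDS_KEY, 0L));
> {code}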



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17233) The conf dfs.datanode.lifeline.interval.seconds is not considering time unit seconds

2023-12-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792346#comment-17792346
 ] 

Ayush Saxena commented on HDFS-17233:
-

Committed to trunk.

Thanx [~palsai] for the contribution & [~haiyang Hu] for the review!!!

> The conf dfs.datanode.lifeline.interval.seconds is not considering time unit 
> seconds
> 
>
> Key: HDFS-17233
> URL: https://issues.apache.org/jira/browse/HDFS-17233
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Reporter: Hemanth Boyina
>Assignee: Palakur Eshwitha Sai
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> long confLifelineIntervalMs =
> getConf().getLong(DFS_DATANODE_LIFELINE_INTERVAL_SECONDS_KEY,
> 3 * getConf().getTimeDuration(DFS_HEARTBEAT_INTERVAL_KEY,
> DFS_HEARTBEAT_INTERVAL_DEFAULT, TimeUnit.SECONDS,
> TimeUnit.MILLISECONDS)); {code}
> If we configure DFS_DATANODE_LIFELINE_INTERVAL_SECONDS_KEY, the value is not 
> converted to milliseconds. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17271) Web UI DN report shows random order when sorting with dead DNs

2023-12-02 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17271.
-
Resolution: Fixed

> Web UI DN report shows random order when sorting with dead DNs
> --
>
> Key: HDFS-17271
> URL: https://issues.apache.org/jira/browse/HDFS-17271
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, rbf, ui
>Affects Versions: 3.4.0
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-12-01-15-04-11-047.png
>
>
> When sorted by "last contact" in ascending order, dead nodes come up on top 
> in a random order
> !image-2023-12-01-15-04-11-047.png|width=337,height=263!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17271) Web UI DN report shows random order when sorting with dead DNs

2023-12-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792344#comment-17792344
 ] 

Ayush Saxena commented on HDFS-17271:
-

Committed to trunk.

Thanx [~coconut_icecream] for the contribution & [~slfan1989] for the review!!!

> Web UI DN report shows random order when sorting with dead DNs
> --
>
> Key: HDFS-17271
> URL: https://issues.apache.org/jira/browse/HDFS-17271
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode, rbf, ui
>Affects Versions: 3.4.0
>Reporter: Felix N
>Assignee: Felix N
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-12-01-15-04-11-047.png
>
>
> When sorted by "last contact" in ascending order, dead nodes come up on top 
> in a random order
> !image-2023-12-01-15-04-11-047.png|width=337,height=263!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17260) Fix the logic for reconfigure slow peer enable for Namenode.

2023-12-02 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-17260:
---

Assignee: huangzhaobo99

> Fix the logic for reconfigure slow peer enable for Namenode.
> 
>
> Key: HDFS-17260
> URL: https://issues.apache.org/jira/browse/HDFS-17260
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17261) RBF: Fix getFileInfo return wrong path when get mountTable path which multi-level

2023-12-01 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17792012#comment-17792012
 ] 

Ayush Saxena commented on HDFS-17261:
-

Committed to trunk.

Thanx [~liuguanghua] for the contribution & [~elgoiri] for the review!!!

> RBF: Fix getFileInfo return wrong path when get mountTable path which 
> multi-level
> -
>
> Key: HDFS-17261
> URL: https://issues.apache.org/jira/browse/HDFS-17261
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Minor
>  Labels: pull-request-available
>
> With DFSRouter, suppose there are two nameservices: ns0, ns1.
>  # Add a mount table entry      /testgetfileinfo/ns1/dir  -> (ns1 -> 
> /testgetfileinfo/ns1/dir) 
>  # An hdfs client via DFSRouter accesses the directory:   hdfs dfs -ls -d 
> /testgetfileinfo
>  # It returns the wrong path:    /testgetfileinfo/testgetfileinfo (repro 
> sketch below)
>  
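> A repro sketch with dfsrouteradmin (router address and port are placeholders):
> {noformat}
> hdfs dfsrouteradmin -add /testgetfileinfo/ns1/dir ns1 /testgetfileinfo/ns1/dir
> hdfs dfs -ls -d hdfs://router-host:8888/testgetfileinfo
> {noformat}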



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17261) RBF: Fix getFileInfo return wrong path when get mountTable path which multi-level

2023-12-01 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17261.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> RBF: Fix getFileInfo return wrong path when get mountTable path which 
> multi-level
> -
>
> Key: HDFS-17261
> URL: https://issues.apache.org/jira/browse/HDFS-17261
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> With DFSRouter, suppose there are two nameservices: ns0, ns1.
>  # Add a mount table entry      /testgetfileinfo/ns1/dir  -> (ns1 -> 
> /testgetfileinfo/ns1/dir) 
>  # An hdfs client via DFSRouter accesses the directory:   hdfs dfs -ls -d 
> /testgetfileinfo
>  # It returns the wrong path:    /testgetfileinfo/testgetfileinfo
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Assigned] (HDFS-17261) RBF: Fix getFileInfo return wrong path when get mountTable path which multi-level

2023-12-01 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena reassigned HDFS-17261:
---

Assignee: liuguanghua

> RBF: Fix getFileInfo return wrong path when get mountTable path which 
> multi-level
> -
>
> Key: HDFS-17261
> URL: https://issues.apache.org/jira/browse/HDFS-17261
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Minor
>  Labels: pull-request-available
>
> With DFSRouter, suppose there are two nameservices: ns0, ns1.
>  # Add a mount table entry      /testgetfileinfo/ns1/dir  -> (ns1 -> 
> /testgetfileinfo/ns1/dir) 
>  # An hdfs client via DFSRouter accesses the directory:   hdfs dfs -ls -d 
> /testgetfileinfo
>  # It returns the wrong path:    /testgetfileinfo/testgetfileinfo
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17259) Fix typo in TestFsDatasetImpl Class.

2023-11-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17259.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fix typo in TestFsDatasetImpl Class.
> 
>
> Key: HDFS-17259
> URL: https://issues.apache.org/jira/browse/HDFS-17259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17259) Fix typo in TestFsDatasetImpl Class.

2023-11-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17259:

Priority: Trivial  (was: Minor)

> Fix typo in TestFsDatasetImpl Class.
> 
>
> Key: HDFS-17259
> URL: https://issues.apache.org/jira/browse/HDFS-17259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17259) Fix typo in TestFsDatasetImpl Class.

2023-11-30 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17791905#comment-17791905
 ] 

Ayush Saxena commented on HDFS-17259:
-

Committed to trunk.

Thanx [~huangzhaobo99] for the contribution & [~xinglin] for the review!!!

> Fix typo in TestFsDatasetImpl Class.
> 
>
> Key: HDFS-17259
> URL: https://issues.apache.org/jira/browse/HDFS-17259
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: huangzhaobo99
>Assignee: huangzhaobo99
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17129) mis-order of ibr and fbr on datanode

2023-11-30 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17129:

Affects Version/s: 3.3.6
   3.3.9
 Priority: Blocker  (was: Major)

> mis-order of ibr and fbr on datanode 
> -
>
> Key: HDFS-17129
> URL: https://issues.apache.org/jira/browse/HDFS-17129
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0, 3.3.9, 3.3.6
> Environment: hdfs3.4.0
>Reporter: liuguanghua
>Assignee: liuguanghua
>Priority: Blocker
>  Labels: pull-request-available
>
> HDFS-16016 provides a new thread to handle IBRs. That is a great improvement. 
> But it may cause a mis-order of IBR and FBR.
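
A minimal, self-contained sketch of the reordering (illustrative names, not the 
actual DataNode classes): an IBR enqueued before an FBR can still reach the 
NameNode after it once IBRs are sent from a dedicated thread.
{code:java}
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative only -- the heartbeat thread sends the FBR directly while
// IBRs wait in a queue for a separate sender thread, so the send order
// can invert the enqueue order.
public class IbrFbrRace {
  public static void main(String[] args) throws Exception {
    BlockingQueue<String> ibrQueue = new LinkedBlockingQueue<>();

    Thread ibrSender = new Thread(() -> {
      try {
        Thread.sleep(100);                      // the sender thread lags behind
        System.out.println("send " + ibrQueue.take());
      } catch (InterruptedException ignored) {
      }
    });
    ibrSender.start();

    ibrQueue.put("IBR(block B finalized)");     // queued first ...
    System.out.println("send FBR(contains B)"); // ... but the FBR goes out first
    ibrSender.join();
  }
}
{code}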



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17251) RBF: Optimize MountTableResolver#TRASH_PATTERN

2023-11-14 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17251:

Summary: RBF: Optimize MountTableResolver#TRASH_PATTERN  (was: Optimize 
MountTableResolver#TRASH_PATTERN)

> RBF: Optimize MountTableResolver#TRASH_PATTERN
> --
>
> Key: HDFS-17251
> URL: https://issues.apache.org/jira/browse/HDFS-17251
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.6
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
>
> We should make the date string of MountTableResolver#TRASH_PATTERN 
> have a fixed length,
> because the trash dirs look like the pattern below:
> /user/hdfs/.Trash/231113002000
> The date string has a fixed length of 12.
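
For illustration, a fixed-width pattern along these lines (not the exact 
TRASH_PATTERN from MountTableResolver) matches only a 12-digit checkpoint 
timestamp (yyMMddHHmmss):
{code:java}
import java.util.regex.Pattern;

public class TrashPatternSketch {
  // Illustrative only -- a fixed-width \d{12} is stricter than an
  // open-ended \d+ for the trash checkpoint directory name.
  private static final Pattern TRASH =
      Pattern.compile("/\\.Trash/(\\d{12})(/|$)");

  public static void main(String[] args) {
    System.out.println(
        TRASH.matcher("/user/hdfs/.Trash/231113002000").find()); // true
    System.out.println(
        TRASH.matcher("/user/hdfs/.Trash/1234").find());         // false
  }
}
{code}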



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17246) Fix DFSUtilClient.ValidName ERROR

2023-11-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782602#comment-17782602
 ] 

Ayush Saxena commented on HDFS-17246:
-

[~gaurava] If the path to be referenced is on {{LocalFileSystem}}, then these 
things will surface and behaviour differences wrt Unix will kick in.

If the path is on HDFS, we can have any path; it need not be on LFS. If you 
check HDFS-13296, it also solved a similar problem, as its description explains.

For WebHDFS, there was a path like 
{{"webhdfs://127.0.0.1:18334/D:/target/test/data/vUqZkOrBZa/test"}}; it had 
D:/, which wasn't required, and it got fixed by that util.

If you check this 
method (https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeAttributeTestUtils.java#L45-L50),
 it is used by several other tests, and if those tests are passing on Windows, 
then this change should fix the issue.
Unless that {{getCanonicalPath()}} screws something up; in that case we may try 

{code:java}
conf.set(YarnConfiguration.FS_NODE_ATTRIBUTE_STORE_ROOT_DIR,GenericTestUtils.getRandomizedTestDir().getAbsolutePath());
{code}

Let me know if it still fails & which test you ran; if there's no luck, we can 
go ahead and just change the DFSUtils as you initially proposed.


> Fix DFSUtilClient.ValidName ERROR
> -
>
> Key: HDFS-17246
> URL: https://issues.apache.org/jira/browse/HDFS-17246
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.4.0
> Environment: Windows 10
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-11-03-17-31-14-990.png
>
>
> Currently, the *shaded client* Yetus personality in Hadoop fails to build on 
> Windows - 
> https://github.com/apache/hadoop/blob/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/dev-support/bin/hadoop.sh#L541-L615.
> This happens due to the integration test failures in Hadoop client modules - 
> https://github.com/apache/hadoop/tree/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/hadoop-client-modules/hadoop-client-integration-tests.
> There are several issues that need to be addressed in order to get the 
> integration tests working -
> # Set the HADOOP_HOME, needed by the Mini DFS and YARN clusters spawned by 
> the integration tests.
> # Add Hadoop binaries to PATH, so that winutils.exe can be located.
> # Create a new user with Symlink privilege in the Docker image. This is 
> needed for the proper working of Mini YARN cluster, spawned by the 
> integration tests.
> # Fix a bug in DFSUtilClient.java that prevents colon ( *:* ) in the path. 
> The colon is used as a delimiter for the PATH variable while specifying multiple 
> paths. However, this isn't a delimiter in the case of Windows and must be 
> handled appropriately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-17246) Fix shaded client for building Hadoop on Windows

2023-11-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782567#comment-17782567
 ] 

Ayush Saxena edited comment on HDFS-17246 at 11/3/23 1:22 PM:
--

Thanx [~gaurava] for the pointers. I think the code which is creating trouble 
on Windows was added as part of YARN-9568, which seems to have chosen a path 
that is not HDFS-compatible on Windows.

I think there was a similar problem solved as part of HDFS-13296.

Can you use the same util and see if the test works?
{noformat}
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
index 6472a21f961..49370d801bb 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
@@ -103,6 +103,7 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import static org.apache.hadoop.test.GenericTestUtils.getTempPath;
 import static 
org.apache.hadoop.yarn.server.resourcemanager.resource.TestResourceProfiles.TEST_CONF_RESET_RESOURCE_TYPES;
 
 /**
@@ -336,7 +337,7 @@ public void serviceInit(Configuration conf) throws 
Exception {
 // to ensure that any FileSystemNodeAttributeStore started by the RM always
 // uses a unique path, if unset, force it under the test dir.
 if (conf.get(YarnConfiguration.FS_NODE_ATTRIBUTE_STORE_ROOT_DIR) == null) {
-  File nodeAttrDir = new File(getTestWorkDir(), "nodeattributes");
+  File nodeAttrDir = new File(getTempPath("nodeattributes"));
   conf.set(YarnConfiguration.FS_NODE_ATTRIBUTE_STORE_ROOT_DIR,
   nodeAttrDir.getCanonicalPath());
 }

{noformat}
I think in the current fix you are making HDFS allow {{:}} at the first index in 
the case of Windows; I think we can change the path itself rather than changing 
the HDFS logic.

 

{{NodeAttributeTestUtils}} also ultimately lands at that new method, which 
handles the Windows path for that conf here:

[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeAttributeTestUtils.java#L45-L50]

 

We can also try {{GenericTestUtils.getRandomizedTestDir()}}, if that 
"nodeattributes" isn't mandatory in the path name.


was (Author: ayushtkn):
Thanx [~gaurava] for the pointers. I think the code which is creating trouble 
on Windows was added as part of YARN-9568, which seems to have chosen a path 
that is not HDFS-compatible on Windows.

I think there was a similar problem solved as part of HDFS-13296.

Can you use the same util and see if the test works?
{noformat}
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
index 6472a21f961..49370d801bb 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
@@ -103,6 +103,7 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import static org.apache.hadoop.test.GenericTestUtils.getTempPath;
 import static 
org.apache.hadoop.yarn.server.resourcemanager.resource.TestResourceProfiles.TEST_CONF_RESET_RESOURCE_TYPES;
 
 /**
@@ -336,7 +337,7 @@ public void serviceInit(Configuration conf) throws 
Exception {
 // to ensure that any FileSystemNodeAttributeStore started by the RM always
 // uses a unique path, if unset, force it under the test dir.
 if (conf.get(YarnConfiguration.FS_NODE_ATTRIBUTE_STORE_ROOT_DIR) == null) {
-  File nodeAttrDir = new File(getTestWorkDir(), "nodeattributes");
+  File nodeAttrDir = new File(getTempPath("nodeattributes"));
   conf.set(YarnConfiguration.FS_NODE_ATTRIBUTE_STORE_ROOT_DIR,
   nodeAttrDir.getCanonicalPath());
 }

{noformat}
I think in the current fix you are making HDFS allow {{:}} at the first index in 
the case of Windows; I think we can change the path itself rather than changing 
the HDFS logic.

 

{{NodeAttributeTestUtils}} also ultimately lands at that new method, which 
handles the Windows path for that conf here:


[jira] [Commented] (HDFS-17246) Fix shaded client for building Hadoop on Windows

2023-11-03 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782567#comment-17782567
 ] 

Ayush Saxena commented on HDFS-17246:
-

Thanx [~gaurava] for the pointers. I think the code which is creating trouble 
on Windows was added as part of YARN-9568, which seems to have chosen a path 
that is not HDFS-compatible on Windows.

I think there was a similar problem solved as part of HDFS-13296.

Can you use the same util and see if the test works?
{noformat}
diff --git 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
index 6472a21f961..49370d801bb 100644
--- 
a/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
+++ 
b/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-tests/src/test/java/org/apache/hadoop/yarn/server/MiniYARNCluster.java
@@ -103,6 +103,7 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import static org.apache.hadoop.test.GenericTestUtils.getTempPath;
 import static 
org.apache.hadoop.yarn.server.resourcemanager.resource.TestResourceProfiles.TEST_CONF_RESET_RESOURCE_TYPES;
 
 /**
@@ -336,7 +337,7 @@ public void serviceInit(Configuration conf) throws 
Exception {
 // to ensure that any FileSystemNodeAttributeStore started by the RM always
 // uses a unique path, if unset, force it under the test dir.
 if (conf.get(YarnConfiguration.FS_NODE_ATTRIBUTE_STORE_ROOT_DIR) == null) {
-  File nodeAttrDir = new File(getTestWorkDir(), "nodeattributes");
+  File nodeAttrDir = new File(getTempPath("nodeattributes"));
   conf.set(YarnConfiguration.FS_NODE_ATTRIBUTE_STORE_ROOT_DIR,
   nodeAttrDir.getCanonicalPath());
 }

{noformat}
I think in the current fix you are making HDFS allow {{:}} at the first index in 
the case of Windows; I think we can change the path itself rather than changing 
the HDFS logic.

 

{{NodeAttributeTestUtils}} also ultimately lands at that new method, which 
handles the Windows path for that conf here:

[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/java/org/apache/hadoop/yarn/server/resourcemanager/NodeAttributeTestUtils.java#L45-L50]

> Fix shaded client for building Hadoop on Windows
> 
>
> Key: HDFS-17246
> URL: https://issues.apache.org/jira/browse/HDFS-17246
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.4.0
> Environment: Windows 10
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-11-03-17-31-14-990.png
>
>
> Currently, the *shaded client* Yetus personality in Hadoop fails to build on 
> Windows - 
> https://github.com/apache/hadoop/blob/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/dev-support/bin/hadoop.sh#L541-L615.
> This happens due to the integration test failures in Hadoop client modules - 
> https://github.com/apache/hadoop/tree/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/hadoop-client-modules/hadoop-client-integration-tests.
> There are several issues that need to be addressed in order to get the 
> integration tests working -
> # Set the HADOOP_HOME, needed by the Mini DFS and YARN clusters spawned by 
> the integration tests.
> # Add Hadoop binaries to PATH, so that winutils.exe can be located.
> # Create a new user with Symlink privilege in the Docker image. This is 
> needed for the proper working of Mini YARN cluster, spawned by the 
> integration tests.
> # Fix a bug in DFSUtilClient.java that prevents colon ( *:* ) in the path. 
> The colon is used as a delimiter for the PATH variable while specifying multiple 
> paths. However, this isn't a delimiter in the case of Windows and must be 
> handled appropriately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17246) Fix shaded client for building Hadoop on Windows

2023-11-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782357#comment-17782357
 ] 

Ayush Saxena commented on HDFS-17246:
-

Yes, if the modifications are only in hdfs-client, it just runs the hdfs-client 
tests. We just run the tests of the modified modules.

> Fix shaded client for building Hadoop on Windows
> 
>
> Key: HDFS-17246
> URL: https://issues.apache.org/jira/browse/HDFS-17246
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.4.0
> Environment: Windows 10
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Currently, the *shaded client* Yetus personality in Hadoop fails to build on 
> Windows - 
> https://github.com/apache/hadoop/blob/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/dev-support/bin/hadoop.sh#L541-L615.
> This happens due to the integration test failures in Hadoop client modules - 
> https://github.com/apache/hadoop/tree/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/hadoop-client-modules/hadoop-client-integration-tests.
> There are several issues that need to be addressed in order to get the 
> integration tests working -
> # Set the HADOOP_HOME, needed by the Mini DFS and YARN clusters spawned by 
> the integration tests.
> # Add Hadoop binaries to PATH, so that winutils.exe can be located.
> # Create a new user with Symlink privilege in the Docker image. This is 
> needed for the proper working of Mini YARN cluster, spawned by the 
> integration tests.
> # Fix a bug in DFSUtilClient.java that prevents colon ( *:* ) in the path. 
> The colon is used as a delimiter for the PATH variable while specifying multiple 
> paths. However, this isn't a delimiter in the case of Windows and must be 
> handled appropriately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17246) Fix shaded client for building Hadoop on Windows

2023-11-02 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17782337#comment-17782337
 ] 

Ayush Saxena commented on HDFS-17246:
-

This is breaking TestDFSUtil#testIsValidName
[https://ci-hadoop.apache.org/view/Hadoop/job/hadoop-qbt-trunk-java8-linux-x86_64/1398/testReport/junit/org.apache.hadoop.hdfs/TestDFSUtil/testIsValidName/]

I think the if check is wrong; there is a missing bracket, I believe.
{noformat}
diff --git 
a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java
 
b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java
index 71cff2e3915..a3d14cac03f 100644
--- 
a/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java
+++ 
b/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSUtilClient.java
@@ -663,7 +663,7 @@ public static boolean isValidName(String src) {
   String element = components[i];
   if (element.equals(".")  ||
   // For Windows, we must allow the : in the drive letter.
-  (!Shell.WINDOWS && i == 1 && element.contains(":"))  ||
+  (!(Shell.WINDOWS && i == 1) && element.contains(":"))  ||
   (element.contains("/"))) {
 return false;
   }

{noformat}
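
To spell out why the bracket matters, here is a small standalone check of the 
two predicates (illustrative, not the Hadoop code; {{true}} means the component 
is rejected):
{code:java}
// Illustrative only -- mirrors the two predicates, not DFSUtilClient itself.
public class ValidNameCheck {
  static boolean broken(boolean windows, int i, String element) {
    return !windows && i == 1 && element.contains(":");
  }
  static boolean fixed(boolean windows, int i, String element) {
    return !(windows && i == 1) && element.contains(":");
  }
  public static void main(String[] args) {
    // On Linux a colon in a non-drive component must be rejected:
    System.out.println(broken(false, 2, "a:b")); // false -> bug, ":" accepted
    System.out.println(fixed(false, 2, "a:b"));  // true  -> ":" rejected
    // On Windows only the drive-letter component (i == 1) keeps its ":":
    System.out.println(fixed(true, 1, "D:"));    // false -> drive letter allowed
  }
}
{code}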
btw, why is this creating trouble? This util should be kicking in for DFS paths, 
right? In HDFS we would be dealing with the HDFS namespace, so how did Windows kick in?


Second, a curious question: when talking to the Namenode, if my path has a colon 
at index 1 and I am talking to the Namenode via a Windows shell, it will allow me 
to create the path, while doing so via a Linux shell won't? Is that expected, or 
is there any catch here?

cc. [~gaurava]/ [~elgoiri]

> Fix shaded client for building Hadoop on Windows
> 
>
> Key: HDFS-17246
> URL: https://issues.apache.org/jira/browse/HDFS-17246
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs-client
>Affects Versions: 3.4.0
> Environment: Windows 10
>Reporter: Gautham Banasandra
>Assignee: Gautham Banasandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Currently, the *shaded client* Yetus personality in Hadoop fails to build on 
> Windows - 
> https://github.com/apache/hadoop/blob/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/dev-support/bin/hadoop.sh#L541-L615.
> This happens due to the integration test failures in Hadoop client modules - 
> https://github.com/apache/hadoop/tree/4c04a6768c0cb3d5081cfa5d84ffb389d92f5805/hadoop-client-modules/hadoop-client-integration-tests.
> There are several issues that need to be addressed in order to get the 
> integration tests working -
> # Set the HADOOP_HOME, needed by the Mini DFS and YARN clusters spawned by 
> the integration tests.
> # Add Hadoop binaries to PATH, so that winutils.exe can be located.
> # Create a new user with Symlink privilege in the Docker image. This is 
> needed for the proper working of Mini YARN cluster, spawned by the 
> integration tests.
> # Fix a bug in DFSUtilClient.java that prevents colon ( *:* ) in the path. 
> The colon is used as a delimiter for the PATH variable while specifying multiple 
> paths. However, this isn't a delimiter in the case of Windows and must be 
> handled appropriately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17235) Fix javadoc errors in BlockManager

2023-10-23 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17235.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fix javadoc errors in BlockManager
> --
>
> Key: HDFS-17235
> URL: https://issues.apache.org/jira/browse/HDFS-17235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> There are 2 errors in BlockManager.java
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6194/4/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04.txt
> {code:java}
> [ERROR] 
> /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-6194/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:153:
>  error: reference not found
> [ERROR]  * by {@link DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY}. This 
> number has to =
> [ERROR]  ^
> [ERROR] 
> /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-6194/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:154:
>  error: reference not found
> [ERROR]  * {@link DFS_NAMENODE_REPLICATION_MIN_KEY}.
> {code}
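
The usual remedy for such errors (assuming the constants are declared in 
{{DFSConfigKeys}}) is to qualify the {@link} target with its declaring class, 
since javadoc does not resolve static imports:
{code:java}
// Illustrative javadoc fix, assuming the keys live in DFSConfigKeys:
//
// Before: {@link DFS_NAMENODE_REPLICATION_MIN_KEY}               (unresolvable)
// After:  {@link DFSConfigKeys#DFS_NAMENODE_REPLICATION_MIN_KEY}
{code}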



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17235) Fix javadoc errors in BlockManager

2023-10-23 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778697#comment-17778697
 ] 

Ayush Saxena commented on HDFS-17235:
-

Committed to trunk.

Thanx [~haiyang Hu] for the contribution!!!

> Fix javadoc errors in BlockManager
> --
>
> Key: HDFS-17235
> URL: https://issues.apache.org/jira/browse/HDFS-17235
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> There are 2 errors in BlockManager.java
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6194/4/artifact/out/patch-javadoc-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.20.1+1-post-Ubuntu-0ubuntu120.04.txt
> {code:java}
> [ERROR] 
> /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-6194/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:153:
>  error: reference not found
> [ERROR]  * by {@link DFS_NAMENODE_MAINTENANCE_REPLICATION_MIN_KEY}. This 
> number has to =
> [ERROR]  ^
> [ERROR] 
> /home/jenkins/jenkins-agent/workspace/hadoop-multibranch_PR-6194/ubuntu-focal/src/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java:154:
>  error: reference not found
> [ERROR]  * {@link DFS_NAMENODE_REPLICATION_MIN_KEY}.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17228) Improve documentation related to BlockManager

2023-10-17 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HDFS-17228.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Improve documentation related to BlockManager
> -
>
> Key: HDFS-17228
> URL: https://issues.apache.org/jira/browse/HDFS-17228
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement, documentation
>Affects Versions: 3.3.3, 3.3.6
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
> Attachments: image-2023-10-17-17-25-27-363.png
>
>
> In the BlockManager file, some important comments are missing.
> This happens here:
>  !image-2023-10-17-17-25-27-363.png! 
> If this is improved, the robustness of the distributed system can be increased.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17228) Improve documentation related to BlockManager

2023-10-17 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776430#comment-17776430
 ] 

Ayush Saxena commented on HDFS-17228:
-

Committed to trunk.

Thanx [~jianghuazhu] for the contribution & [~elgoiri] for the review!!!

> Improve documentation related to BlockManager
> -
>
> Key: HDFS-17228
> URL: https://issues.apache.org/jira/browse/HDFS-17228
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement, documentation
>Affects Versions: 3.3.3, 3.3.6
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2023-10-17-17-25-27-363.png
>
>
> In the BlockManager file, some important comments are missing.
> This happens here:
>  !image-2023-10-17-17-25-27-363.png! 
> If this is improved, the robustness of the distributed system can be increased.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-17228) Improve documentation related to BlockManager

2023-10-17 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena updated HDFS-17228:

Summary: Improve documentation related to BlockManager  (was: Add 
documentation related to BlockManager)

> Improve documentation related to BlockManager
> -
>
> Key: HDFS-17228
> URL: https://issues.apache.org/jira/browse/HDFS-17228
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: block placement, documentation
>Affects Versions: 3.3.3, 3.3.6
>Reporter: JiangHua Zhu
>Assignee: JiangHua Zhu
>Priority: Minor
>  Labels: pull-request-available
> Attachments: image-2023-10-17-17-25-27-363.png
>
>
> In the BlockManager file, some important comments are missing.
> This happens here:
>  !image-2023-10-17-17-25-27-363.png! 
> If this is improved, the robustness of the distributed system can be increased.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-17225) Fix TestNameNodeMXBean#testDecommissioningNodes

2023-10-14 Thread Ayush Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17775172#comment-17775172
 ] 

Ayush Saxena commented on HDFS-17225:
-

Fails in assertion for {{decommissionDuration}}. Asserting the duration seems 
to be a bad idea: we fetch the value from the MBean and then from the Namenode 
directly, so if there is latency between both calls, this duration will always 
be different.
Easy to reproduce:
Put a sleep
{code:java}
   Map<String, Map<String, Object>> decomNodes =
       (Map<String, Map<String, Object>>) JSON.parse(decomNodesInfo);
+  Thread.sleep(2000);
   assertEquals(fsn.getDecomNodes(), decomNodesInfo);
   assertEquals(fsn.getNumDecommissioningDataNodes(), decomNodes.size());
{code}
I think we should remove {{decommissionDuration}} from the comparison. If there 
is some JSON util which removes it from the String, that can be used. Otherwise, 
we already have a map above.
Change the code like this:
{code:java}
  // Remove decommissionDuration to avoid flakiness
  decomNodes.values().forEach(x -> x.remove("decommissionDuration"));
  Map<String, Map<String, Object>> decomNodesFsn =
      (Map<String, Map<String, Object>>) JSON.parse(fsn.getDecomNodes());
  decomNodesFsn.values().forEach(x -> x.remove("decommissionDuration"));
  assertEquals(decomNodesFsn, decomNodes);
{code}
And remove this: {{assertEquals(fsn.getDecomNodes(), decomNodesInfo);}}

> Fix TestNameNodeMXBean#testDecommissioningNodes
> ---
>
> Key: HDFS-17225
> URL: https://issues.apache.org/jira/browse/HDFS-17225
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Ayush Saxena
>Priority: Major
>
> Fails in assertion
> {noformat}
> org.junit.ComparisonFailure: expected:<...commissionDuration":[2]}}> but 
> was:<...commissionDuration":[1]}}>
>   at org.junit.Assert.assertEquals(Assert.java:117)
>   at org.junit.Assert.assertEquals(Assert.java:146)
>   at 
> org.apache.hadoop.hdfs.server.namenode.TestNameNodeMXBean.testDecommissioningNodes(TestNameNodeMXBean.java:432){noformat}
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6185/1/testReport/org.apache.hadoop.hdfs.server.namenode/TestNameNodeMXBean/testDecommissioningNodes/]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



  1   2   3   4   5   6   7   8   9   10   >