[jira] [Commented] (HDFS-17523) Add fine-grained locks metrics in DataSetLockManager

2024-05-14 Thread Xiaoqiao He (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846482#comment-17846482
 ] 

Xiaoqiao He commented on HDFS-17523:


Good point. Would you mind submitting a PR for this proposal? Thanks.

> Add fine-grained locks metrics in DataSetLockManager
> -
>
> Key: HDFS-17523
> URL: https://issues.apache.org/jira/browse/HDFS-17523
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: lei w
>Priority: Major
>
> Currently we use fine-grained locks to manage FsDatasetImpl, but we have not
> added any lock-related metrics. In some cases we need lock-holding information
> to understand how long a particular operation holds a lock. With this
> information we could also identify and optimize long-held locks as early as
> possible.
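The metrics requested above come down to timing how long each operation holds a lock. Below is a minimal sketch that assumes a single `ReentrantReadWriteLock` rather than DataSetLockManager's actual fine-grained locks. `LockHoldMetricsSketch`, `withWriteLock`, and the operation name `exampleOp` are hypothetical and are not part of the DataSetLockManager or FsDatasetImpl API. The clock starts only after the lock is acquired, so the sketch records hold time rather than wait time. It aggregates count, total, and maximum per operation name.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAccumulator;
import java.util.concurrent.atomic.LongAdder;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch of per-operation lock hold-time metrics; not the
// DataSetLockManager or FsDatasetImpl API.
class LockHoldMetricsSketch {
  // Aggregated hold-time statistics for one operation name.
  static final class Stat {
    final LongAdder count = new LongAdder();
    final LongAdder totalNanos = new LongAdder();
    final LongAccumulator maxNanos = new LongAccumulator(Math::max, 0L);
  }

  private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
  private final Map<String, Stat> stats = new ConcurrentHashMap<>();

  // Runs body under the write lock and records how long the lock was held.
  <T> T withWriteLock(String opName, Supplier<T> body) {
    lock.writeLock().lock();
    long start = System.nanoTime(); // taken after acquisition: hold time, not wait time
    try {
      return body.get();
    } finally {
      long heldNanos = System.nanoTime() - start;
      lock.writeLock().unlock();
      Stat s = stats.computeIfAbsent(opName, k -> new Stat());
      s.count.increment();
      s.totalNanos.add(heldNanos);
      s.maxNanos.accumulate(heldNanos);
    }
  }

  long count(String opName) {
    Stat s = stats.get(opName);
    return s == null ? 0L : s.count.sum();
  }

  long maxNanos(String opName) {
    Stat s = stats.get(opName);
    return s == null ? 0L : s.maxNanos.get();
  }

  public static void main(String[] args) {
    LockHoldMetricsSketch metrics = new LockHoldMetricsSketch();
    for (int i = 0; i < 3; i++) {
      metrics.withWriteLock("exampleOp", () -> {
        // Simulated work performed while the lock is held.
        long sum = 0;
        for (int j = 0; j < 1_000; j++) {
          sum += j;
        }
        return sum;
      });
    }
    System.out.println("exampleOp: count=" + metrics.count("exampleOp")
        + ", maxHeldNanos=" + metrics.maxNanos("exampleOp"));
  }
}
```

A real implementation would probably also need a read-lock variant and a way to publish these counters through the DataNode's metrics system. Both are left out of this sketch.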



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Resolved] (HDFS-17520) TestDFSAdmin.testAllDatanodesReconfig and TestDFSAdmin.testDecommissionDataNodesReconfig failed

2024-05-14 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved HDFS-17520.
---
   Fix Version/s: 3.4.1
  3.5.0
Hadoop Flags: Reviewed
Target Version/s: 3.4.1, 3.5.0
  Resolution: Fixed

> TestDFSAdmin.testAllDatanodesReconfig and 
> TestDFSAdmin.testDecommissionDataNodesReconfig failed
> ---
>
> Key: HDFS-17520
> URL: https://issues.apache.org/jira/browse/HDFS-17520
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0
>
>
> {code:java}
> [ERROR] Tests run: 21, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 44.521 s <<< FAILURE! - in org.apache.hadoop.hdfs.tools.TestDFSAdmin
> [ERROR] testAllDatanodesReconfig(org.apache.hadoop.hdfs.tools.TestDFSAdmin)  Time elapsed: 2.086 s  <<< FAILURE!
> java.lang.AssertionError: 
> Expecting:
>  <["Reconfiguring status for node [127.0.0.1:43731]: SUCCESS: Changed property dfs.datanode.peer.stats.enabled",
> " From: "false"",
> " To: "true"",
> "started at Fri May 10 13:02:51 UTC 2024 and finished at Fri May 10 13:02:51 UTC 2024."]>
> to contain subsequence:
>  <["SUCCESS: Changed property dfs.datanode.peer.stats.enabled",
> " From: "false"",
> " To: "true""]>
>   at org.apache.hadoop.hdfs.tools.TestDFSAdmin.testAllDatanodesReconfig(TestDFSAdmin.java:1286)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>   at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>   at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>   at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>   at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
>   at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
>   at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:273)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:238)
>   at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:159)
>   at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:384)
>   at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:345)
>   at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:126)
>   at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:418)
> {code}
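The assertion message above appears to be AssertJ's `containsSubsequence` output, which compares list elements with `equals()`. The first actual line carries a `Reconfiguring status for node [127.0.0.1:43731]:` prefix. As a result, it never equals the expected `SUCCESS: Changed property dfs.datanode.peer.stats.enabled` line, and the check fails. The sketch below contrasts an exact-match subsequence scan with a substring-based one. The class `SubsequenceCheckSketch` and both helpers are hypothetical and are not the change merged in PR #6812. The leading whitespace in the `From`/`To` lines is copied as printed in the failure message.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helpers contrasting an exact-equality subsequence check with a
// substring-based one; not the change merged in PR #6812.
class SubsequenceCheckSketch {
  // True if every expected line appears in order in actual (gaps allowed),
  // comparing whole lines with equals().
  static boolean containsSubsequenceExact(List<String> actual, List<String> expected) {
    int next = 0; // index of the next expected line still to be matched
    for (String line : actual) {
      if (next < expected.size() && line.equals(expected.get(next))) {
        next++;
      }
    }
    return next == expected.size();
  }

  // Same ordered scan, but an actual line matches if it contains the expected text.
  static boolean containsSubsequenceBySubstring(List<String> actual, List<String> expected) {
    int next = 0;
    for (String line : actual) {
      if (next < expected.size() && line.contains(expected.get(next))) {
        next++;
      }
    }
    return next == expected.size();
  }

  public static void main(String[] args) {
    // Lines taken from the failure message.
    List<String> actual = Arrays.asList(
        "Reconfiguring status for node [127.0.0.1:43731]: SUCCESS: Changed property dfs.datanode.peer.stats.enabled",
        " From: \"false\"",
        " To: \"true\"",
        "started at Fri May 10 13:02:51 UTC 2024 and finished at Fri May 10 13:02:51 UTC 2024.");
    List<String> expected = Arrays.asList(
        "SUCCESS: Changed property dfs.datanode.peer.stats.enabled",
        " From: \"false\"",
        " To: \"true\"");
    System.out.println("exact match:     " + containsSubsequenceExact(actual, expected));       // false
    System.out.println("substring match: " + containsSubsequenceBySubstring(actual, expected)); // true
  }
}
```

A greedy left-to-right scan is sufficient for subsequence matching: taking the earliest match for each expected line never rules out a later valid match.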






[jira] [Updated] (HDFS-17520) TestDFSAdmin.testAllDatanodesReconfig and TestDFSAdmin.testDecommissionDataNodesReconfig failed

2024-05-14 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17520:
--
Affects Version/s: 3.4.0

> TestDFSAdmin.testAllDatanodesReconfig and 
> TestDFSAdmin.testDecommissionDataNodesReconfig failed
> ---
>
> Key: HDFS-17520
> URL: https://issues.apache.org/jira/browse/HDFS-17520
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0






[jira] [Updated] (HDFS-17520) TestDFSAdmin.testAllDatanodesReconfig and TestDFSAdmin.testDecommissionDataNodesReconfig failed

2024-05-14 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan updated HDFS-17520:
--
Component/s: hdfs

> TestDFSAdmin.testAllDatanodesReconfig and 
> TestDFSAdmin.testDecommissionDataNodesReconfig failed
> ---
>
> Key: HDFS-17520
> URL: https://issues.apache.org/jira/browse/HDFS-17520
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Affects Versions: 3.4.0
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, 3.5.0






[jira] [Commented] (HDFS-17520) TestDFSAdmin.testAllDatanodesReconfig and TestDFSAdmin.testDecommissionDataNodesReconfig failed

2024-05-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846464#comment-17846464
 ] 

ASF GitHub Bot commented on HDFS-17520:
---

slfan1989 merged PR #6812:
URL: https://github.com/apache/hadoop/pull/6812




> TestDFSAdmin.testAllDatanodesReconfig and 
> TestDFSAdmin.testDecommissionDataNodesReconfig failed
> ---
>
> Key: HDFS-17520
> URL: https://issues.apache.org/jira/browse/HDFS-17520
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available






[jira] [Updated] (HDFS-17528) FsImageValidation: set txid when saving a new image

2024-05-14 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-17528:
--
Labels: pull-request-available  (was: )

> FsImageValidation: set txid when saving a new image
> ---
>
> Key: HDFS-17528
> URL: https://issues.apache.org/jira/browse/HDFS-17528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
>
> - When the fsimage is specified as a file and the FsImageValidation tool
> saves a new image (to remove inaccessible inodes), the txid is not set, so
> the resulting image has 0 as its txid.
> - When the fsimage is specified as a directory, the txid is set. However, the
> tool then fails with an NPE because the NameNode metrics are uninitialized
> (even though FsImageValidation does not use the metrics).
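The first bullet matters because the txid ends up in the checkpoint file name. Below is a minimal sketch of deriving that name. The `fsimage.ckpt` prefix matches the file names in the PR logs. The 19-digit zero padding is our understanding of how NNStorage names image files and should be verified against the Hadoop source. `CheckpointNameSketch` and `checkpointFileName` are hypothetical.

```java
// Hypothetical sketch: derives an fsimage checkpoint file name from a txid.
// The "fsimage.ckpt" prefix matches the PR logs; the 19-digit zero padding is
// an assumption about NNStorage's naming and should be verified.
class CheckpointNameSketch {
  static final String IMAGE_NEW = "fsimage.ckpt";

  static String checkpointFileName(long txid) {
    return String.format("%s_%019d", IMAGE_NEW, txid);
  }

  public static void main(String[] args) {
    long unsetTxid = 0L; // models a txid that was never set (Java's default long value is 0)
    System.out.println(checkpointFileName(unsetTxid));    // fsimage.ckpt_0000000000000000000
    System.out.println(checkpointFileName(23945925442L)); // fsimage.ckpt_0000000023945925442
  }
}
```

Under this naming scheme, an image saved without its txid is indistinguishable by name from an image at transaction 0.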






[jira] [Commented] (HDFS-17528) FsImageValidation: set txid when saving a new image

2024-05-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846440#comment-17846440
 ] 

ASF GitHub Bot commented on HDFS-17528:
---

szetszwo opened a new pull request, #6828:
URL: https://github.com/apache/hadoop/pull/6828

   ### Description of PR
   
   - When the fsimage is specified as a file and the FsImageValidation tool saves a new image (to remove inaccessible inodes), the txid is not set, so the resulting image has 0 as its txid.
   - When the fsimage is specified as a directory, the txid is set. However, the tool then fails with an NPE because the NameNode metrics are uninitialized (even though FsImageValidation does not use the metrics).
   
   ### How was this patch tested?
   
   Tested manually
   - before: the output file is `fsimage.ckpt_000` (i.e. txid is 0)
   > 2024-05-14 13:37:27,531 [main] INFO  namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:save(732)) - Saving image file /Users/szetszwo/hadoop/fsimage/current/newFsImage5968764763996132609/current/fsimage.ckpt_000 using no compression
   > 2024-05-14 13:37:30,522 [main] INFO  namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:save(736)) - Image file /Users/szetszwo/hadoop/fsimage/current/newFsImage5968764763996132609/current/fsimage.ckpt_000 of size 200392059 bytes saved in 2 seconds .
   
   - after: the output file is `fsimage.ckpt_23945925442` with the correct txid
   > 2024-05-14 13:38:32,414 [main] INFO  namenode.FSImage (FSImage.java:save(1223)) - save fsimage with txid=23945925442 to /Users/szetszwo/hadoop/fsimage/current/newFsImage4409944859316006440
   > 2024-05-14 13:38:32,436 [main] INFO  namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:save(732)) - Saving image file /Users/szetszwo/hadoop/fsimage/current/newFsImage4409944859316006440/current/fsimage.ckpt_23945925442 using no compression
   > 2024-05-14 13:38:35,437 [main] INFO  namenode.FSImageFormatProtobuf (FSImageFormatProtobuf.java:save(736)) - Image file /Users/szetszwo/hadoop/fsimage/current/newFsImage4409944859316006440/current/fsimage.ckpt_23945925442 of size 200392062 bytes saved in 3 seconds .
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
   - [NA] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
   - [NA] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [NA] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?
   
   




> FsImageValidation: set txid when saving a new image
> ---
>
> Key: HDFS-17528
> URL: https://issues.apache.org/jira/browse/HDFS-17528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major






[jira] [Updated] (HDFS-17528) FsImageValidation: set txid when saving a new image

2024-05-14 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDFS-17528:
--
Issue Type: Bug  (was: Improvement)

> FsImageValidation: set txid when saving a new image
> ---
>
> Key: HDFS-17528
> URL: https://issues.apache.org/jira/browse/HDFS-17528
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: tools
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major






[jira] [Assigned] (HDFS-17528) FsImageValidation: set txid when saving a new image

2024-05-14 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze reassigned HDFS-17528:
-

Component/s: tools
   Assignee: Tsz-wo Sze
Description: 
- When the fsimage is specified as a file and the FsImageValidation tool saves
a new image (to remove inaccessible inodes), the txid is not set, so the
resulting image has 0 as its txid.

- When the fsimage is specified as a directory, the txid is set. However, the
tool then fails with an NPE because the NameNode metrics are uninitialized
(even though FsImageValidation does not use the metrics).

> FsImageValidation: set txid when saving a new image
> ---
>
> Key: HDFS-17528
> URL: https://issues.apache.org/jira/browse/HDFS-17528
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: tools
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major






[jira] [Created] (HDFS-17528) FsImageValidation: set txid when saving a new image

2024-05-14 Thread Tsz-wo Sze (Jira)
Tsz-wo Sze created HDFS-17528:
-

 Summary: FsImageValidation: set txid when saving a new image
 Key: HDFS-17528
 URL: https://issues.apache.org/jira/browse/HDFS-17528
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Tsz-wo Sze









[jira] [Assigned] (HDFS-17527) RBF: Routers should not allow observer reads when namenode stateId context is disabled

2024-05-14 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira reassigned HDFS-17527:
-

Assignee: Jian Zhang  (was: Simbarashe Dzinamarira)

> RBF: Routers should not allow observer reads when namenode stateId context is 
> disabled
> --
>
> Key: HDFS-17527
> URL: https://issues.apache.org/jira/browse/HDFS-17527
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Simbarashe Dzinamarira
>Assignee: Jian Zhang
>Priority: Major
>
> HDFS-17514 addressed the case where the state ID context is first enabled and
> then disabled. However, if the state ID context is never enabled at all, there
> should be no observer reads.
> Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
> context, yet observer reads still occur.
> The solution is to not advance sharedGlobalStateId in PoolAlignmentContext
> when the namenode returns a value of zero in the RpcResponseHeader. Zero
> indicates that stateIdContext is disabled, so it should not be treated as a
> valid state ID. Note that fixing this will also require adjusting the unit
> tests.
> A further optimization related to HDFS-17514: when sharedGlobalStateId and
> poolLocalStateId have been reset, clients should not be allowed to advance
> poolLocalStateId until sharedGlobalStateId has been advanced. This protects
> existing clients from using a stale ID.
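The two rules proposed above can be sketched as a small single-class model. This is not PoolAlignmentContext itself. Only the field names `sharedGlobalStateId` and `poolLocalStateId` come from the description. The class `PoolStateIdSketch`, its methods, the post-reset flag, and the simplified `observerReadsAllowed()` check are hypothetical. Each ID is a `LongAccumulator` with `Math::max`, so it can only move forward until it is explicitly reset.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.LongAccumulator;

// Hypothetical model of the two HDFS-17527 rules; not PoolAlignmentContext.
class PoolStateIdSketch {
  // Highest state ID reported by the namenode (max-accumulator, identity MIN_VALUE).
  private final LongAccumulator sharedGlobalStateId = new LongAccumulator(Math::max, Long.MIN_VALUE);
  // Highest state ID reported by clients of this pool.
  private final LongAccumulator poolLocalStateId = new LongAccumulator(Math::max, Long.MIN_VALUE);
  // True after a reset until the namenode reports a real (non-zero) state ID.
  private final AtomicBoolean awaitingGlobalAdvance = new AtomicBoolean(false);

  // Rule 1: a state ID of 0 means the namenode's state ID context is disabled.
  void receiveResponseState(long namenodeStateId) {
    if (namenodeStateId == 0L) {
      return; // do not treat 0 as a valid state ID
    }
    sharedGlobalStateId.accumulate(namenodeStateId);
    awaitingGlobalAdvance.set(false);
  }

  // Rule 2: after a reset, ignore client-supplied IDs until the global ID advances.
  void advanceClientStateId(long clientStateId) {
    if (awaitingGlobalAdvance.get()) {
      return;
    }
    poolLocalStateId.accumulate(clientStateId);
  }

  void reset() {
    awaitingGlobalAdvance.set(true);
    sharedGlobalStateId.reset();
    poolLocalStateId.reset();
  }

  // Simplified gate: observer reads only once a real state ID has been seen.
  boolean observerReadsAllowed() {
    return sharedGlobalStateId.get() > 0L;
  }

  long getSharedGlobalStateId() { return sharedGlobalStateId.get(); }

  long getPoolLocalStateId() { return poolLocalStateId.get(); }

  public static void main(String[] args) {
    PoolStateIdSketch ctx = new PoolStateIdSketch();
    ctx.receiveResponseState(0L); // namenode has the state ID context disabled
    System.out.println("observer reads after 0:  " + ctx.observerReadsAllowed()); // false
    ctx.receiveResponseState(42L);
    System.out.println("observer reads after 42: " + ctx.observerReadsAllowed()); // true
  }
}
```

The sketch ignores races between `reset()` and concurrent updates, which real router code would have to handle.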






[jira] [Assigned] (HDFS-17527) RBF: Routers should not allow observer reads when namenode stateId context is disabled

2024-05-14 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira reassigned HDFS-17527:
-

Assignee: Simbarashe Dzinamarira

> RBF: Routers should not allow observer reads when namenode stateId context is 
> disabled
> --
>
> Key: HDFS-17527
> URL: https://issues.apache.org/jira/browse/HDFS-17527
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major






[jira] [Commented] (HDFS-17527) RBF: Routers should not allow observer reads when namenode stateId context is disabled

2024-05-14 Thread Simbarashe Dzinamarira (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846381#comment-17846381
 ] 

Simbarashe Dzinamarira commented on HDFS-17527:
---

Great, I'll assign it to you. Thanks for taking it up.

> RBF: Routers should not allow observer reads when namenode stateId context is 
> disabled
> --
>
> Key: HDFS-17527
> URL: https://issues.apache.org/jira/browse/HDFS-17527
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Simbarashe Dzinamarira
>Priority: Major






[jira] [Commented] (HDFS-17527) RBF: Routers should not allow observer reads when namenode stateId context is disabled

2024-05-14 Thread Jian Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846378#comment-17846378
 ] 

Jian Zhang commented on HDFS-17527:
---

[~simbadzina] Hi, I can work on this issue. If no one is currently working on
it, you can assign it to me.

> RBF: Routers should not allow observer reads when namenode stateId context is 
> disabled
> --
>
> Key: HDFS-17527
> URL: https://issues.apache.org/jira/browse/HDFS-17527
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Simbarashe Dzinamarira
>Priority: Major
>
> HDFS-17514 addressed the case when state ID context is first enabled and then
> disabled. However, if state ID is never enabled at all, there should be no
> observer reads.
> Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
> context, but there are still observer reads.
> The solution is to not advance the sharedGlobalStateId in
> PoolAlignmentContext when the namenode returns a value of zero in the
> RpcResponseHeader. Zero indicates that stateIdContext is disabled and should
> not be treated as a valid state ID value. Note that fixing this will require
> adjusting the unit tests as well.
> A further optimization related to HDFS-17514 is that when sharedGlobalStateId 
> and poolLocalStateId have been reset, we also should not allow 
> poolLocalStateId to be advanced by clients until the sharedGlobalStateId has 
> been advanced. This will protect existing clients from using a stale ID.



[jira] [Updated] (HDFS-17527) RBF: Routers should not allow observer reads when namenode stateId context is disabled

2024-05-14 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira updated HDFS-17527:
--
Description: 
HDFS-17514 addressed the case when state ID context is first enabled and then
disabled. However, if state ID is never enabled at all, there should be no
observer reads.

Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
context, but there are still observer reads.

The solution is to not advance the sharedGlobalStateId in
PoolAlignmentContext when the namenode returns a value of zero in the
RpcResponseHeader. Zero indicates that stateIdContext is disabled and should
not be treated as a valid state ID value. Note that fixing this will require
adjusting the unit tests as well.

A further optimization related to HDFS-17514 is that when sharedGlobalStateId 
and poolLocalStateId have been reset, we also should not allow poolLocalStateId 
to be advanced by clients until the sharedGlobalStateId has been advanced. This 
will protect existing clients from using a stale ID.

  was:
HDFS-17514 addressed the case when state ID context is first enabled and then
disabled. However, if state ID is never enabled at all, there should be no
observer reads.

Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
context, but there are still observer reads.

The solution is to not advance the sharedGlobalStateId in
PoolAlignmentContext when the namenode returns a value of zero in the
RpcResponseHeader. Zero indicates that stateIdContext is disabled and should
not be treated as a valid state ID value. Note that fixing this will require
adjusting the unit tests as well.


> RBF: Routers should not allow observer reads when namenode stateId context is 
> disabled
> --
>
> Key: HDFS-17527
> URL: https://issues.apache.org/jira/browse/HDFS-17527
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Simbarashe Dzinamarira
>Priority: Major
>
> HDFS-17514 addressed the case when state ID context is first enabled and then
> disabled. However, if state ID is never enabled at all, there should be no
> observer reads.
> Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
> context, but there are still observer reads.
> The solution is to not advance the sharedGlobalStateId in
> PoolAlignmentContext when the namenode returns a value of zero in the
> RpcResponseHeader. Zero indicates that stateIdContext is disabled and should
> not be treated as a valid state ID value. Note that fixing this will require
> adjusting the unit tests as well.
> A further optimization related to HDFS-17514 is that when sharedGlobalStateId 
> and poolLocalStateId have been reset, we also should not allow 
> poolLocalStateId to be advanced by clients until the sharedGlobalStateId has 
> been advanced. This will protect existing clients from using a stale ID.



[jira] [Updated] (HDFS-17527) RBF: Routers should not allow observer reads when namenode stateId context is disabled

2024-05-14 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira updated HDFS-17527:
--
Description: 
HDFS-17514 addressed the case when state ID context is first enabled and then
disabled. However, if state ID is never enabled at all, there should be no
observer reads.

Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
context, but there are still observer reads.

The solution is to not advance the sharedGlobalStateId in
PoolAlignmentContext when the namenode returns a value of zero in the
RpcResponseHeader. Zero indicates that stateIdContext is disabled and should
not be treated as a valid state ID value. Note that fixing this will require
adjusting the unit tests as well.

  was:
HDFS-17514 addressed the case when state ID context is first enabled and then
disabled. However, if state ID is never enabled at all, there should be no
observer reads.

Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
context, but there are still observer reads.

The solution is to not advance the sharedGlobalStateId in
PoolAlignmentContext when the namenode returns a value of zero in the
RpcHeader. Zero indicates that stateIdContext is disabled and should not be
treated as a valid state ID value. Note that fixing this will require fixing the
unit tests as well.


> RBF: Routers should not allow observer reads when namenode stateId context is 
> disabled
> --
>
> Key: HDFS-17527
> URL: https://issues.apache.org/jira/browse/HDFS-17527
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Simbarashe Dzinamarira
>Priority: Major
>
> HDFS-17514 addressed the case when state ID context is first enabled and then
> disabled. However, if state ID is never enabled at all, there should be no
> observer reads.
> Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
> context, but there are still observer reads.
> The solution is to not advance the sharedGlobalStateId in
> PoolAlignmentContext when the namenode returns a value of zero in the
> RpcResponseHeader. Zero indicates that stateIdContext is disabled and should
> not be treated as a valid state ID value. Note that fixing this will require
> adjusting the unit tests as well.



[jira] [Assigned] (HDFS-17527) RBF: Routers should not allow observer reads when namenode stateId context is disabled

2024-05-14 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira reassigned HDFS-17527:
-

Assignee: (was: Simbarashe Dzinamarira)

> RBF: Routers should not allow observer reads when namenode stateId context is 
> disabled
> --
>
> Key: HDFS-17527
> URL: https://issues.apache.org/jira/browse/HDFS-17527
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Simbarashe Dzinamarira
>Priority: Major
>
> HDFS-17514 addressed the case when state ID context is first enabled and then
> disabled. However, if state ID is never enabled at all, there should be no
> observer reads.
> Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
> context, but there are still observer reads.
> The solution is to not advance the sharedGlobalStateId in
> PoolAlignmentContext when the namenode returns a value of zero in the
> RpcHeader. Zero indicates that stateIdContext is disabled and should not be
> treated as a valid state ID value. Note that fixing this will require fixing the
> unit tests as well.



[jira] [Updated] (HDFS-17527) RBF: Routers should not allow observer reads when namenode stateId context is disabled

2024-05-14 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira updated HDFS-17527:
--
Description: 
HDFS-17514 addressed the case when state ID context is first enabled and then
disabled. However, if state ID is never enabled at all, there should be no
observer reads.

Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
context, but there are still observer reads.

The solution is to not advance the sharedGlobalStateId in
PoolAlignmentContext when the namenode returns a value of zero in the
RpcHeader. Zero indicates that stateIdContext is disabled and should not be
treated as a valid state ID value. Note that fixing this will require fixing the
unit tests as well.

> RBF: Routers should not allow observer reads when namenode stateId context is 
> disabled
> --
>
> Key: HDFS-17527
> URL: https://issues.apache.org/jira/browse/HDFS-17527
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Simbarashe Dzinamarira
>Priority: Major
>
> HDFS-17514 addressed the case when state ID context is first enabled and then
> disabled. However, if state ID is never enabled at all, there should be no
> observer reads.
> Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
> context, but there are still observer reads.
> The solution is to not advance the sharedGlobalStateId in
> PoolAlignmentContext when the namenode returns a value of zero in the
> RpcHeader. Zero indicates that stateIdContext is disabled and should not be
> treated as a valid state ID value. Note that fixing this will require fixing the
> unit tests as well.



[jira] [Assigned] (HDFS-17527) RBF: Routers should not allow observer reads when namenode stateId context is disabled

2024-05-14 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira reassigned HDFS-17527:
-

Assignee: Simbarashe Dzinamarira

> RBF: Routers should not allow observer reads when namenode stateId context is 
> disabled
> --
>
> Key: HDFS-17527
> URL: https://issues.apache.org/jira/browse/HDFS-17527
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>
> HDFS-17514 addressed the case when state ID context is first enabled and then
> disabled. However, if state ID is never enabled at all, there should be no
> observer reads.
> Tests in TestNoNamenodesAvailableLongTime do not enable the namenode state ID
> context, but there are still observer reads.
> The solution is to not advance the sharedGlobalStateId in
> PoolAlignmentContext when the namenode returns a value of zero in the
> RpcHeader. Zero indicates that stateIdContext is disabled and should not be
> treated as a valid state ID value. Note that fixing this will require fixing the
> unit tests as well.



[jira] [Created] (HDFS-17527) RBF: Routers should not allow observer reads when namenode stateId context is disabled

2024-05-14 Thread Simbarashe Dzinamarira (Jira)
Simbarashe Dzinamarira created HDFS-17527:
-

 Summary: RBF: Routers should not allow observer reads when namenode stateId context is disabled
 Key: HDFS-17527
 URL: https://issues.apache.org/jira/browse/HDFS-17527
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Simbarashe Dzinamarira






[jira] [Resolved] (HDFS-17514) RBF: Routers keep using cached stateID even when active NN returns unset header

2024-05-14 Thread Simbarashe Dzinamarira (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-17514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simbarashe Dzinamarira resolved HDFS-17514.
---
Resolution: Fixed

> RBF: Routers keep using cached stateID even when active NN returns unset 
> header
> ---
>
> Key: HDFS-17514
> URL: https://issues.apache.org/jira/browse/HDFS-17514
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Minor
>  Labels: pull-request-available
>
> When a namenode that had "dfs.namenode.state.context.enabled" set to true is 
> restarted with the configuration set to false, routers will keep using a 
> previously cached state ID.
> Without RBF:
> * Clients that fetched the old stateID could have stale reads even after
> msyncing.
> * New clients will go to the active namenode.
> With RBF:
> * Clients that fetched the old stateID could have stale reads, as above.
> * New clients will also fetch the stale stateID and could potentially have
> stale reads.
> New clients that are created after the restart should not fetch the stale 
> state ID.



[jira] [Commented] (HDFS-17518) In the lease monitor, if a file is closed, we should sync the editslog

2024-05-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846350#comment-17846350
 ] 

ASF GitHub Bot commented on HDFS-17518:
---

ThinkerLei commented on code in PR #6809:
URL: https://github.com/apache/hadoop/pull/6809#discussion_r1600246934


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java:
##
@@ -626,7 +626,8 @@ private synchronized boolean checkLeases(Collection leasesToCheck) {
 }
   }
   // If a lease recovery happened, we need to sync later.

Review Comment:
   @Hexiaoqiao @vinayakumarb Thank you very much for your comments. On the one
hand, we may indeed not need to invoke logSync() right away. The purpose of this
modification is to ensure that the `editlog` is synced in a timely manner, like
other write operations, so as to prevent losing `editlog` entries in some
extreme cases. On the other hand, @vinayakumarb, I'm still a little confused by
what you're saying. The current modification
   ```java
   boolean isClosed = !lastINode.isUnderConstruction();
   if (!needSync && (!completed || isClosed)) {
     needSync = true;
   }
   ```
already ensures that the lease monitor invokes `logSync()` both when the file
gets closed and when the lease is reassigned. If the file gets closed, `isClosed`
is true; if the lease is reassigned, `completed` is false; and the initial value
of `needSync` is false.
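To make the condition easier to check case by case, here is a restatement as a small Java helper. It is a hypothetical sketch: `LeaseSyncDecision` and `needSyncAfter` are invented names, not LeaseManager code, and the helper only encodes the boolean logic of the quoted snippet together with the meanings of `completed` and `isClosed` given in the comment.

```java
/**
 * Hypothetical helper restating the needSync condition from the review
 * comment. Per the comment, completed is false when the lease was
 * reassigned, and isClosed is true when the file is no longer under
 * construction.
 */
final class LeaseSyncDecision {
  private LeaseSyncDecision() {}

  static boolean needSyncAfter(boolean needSync, boolean completed, boolean isClosed) {
    // Request a sync if the lease was reassigned or the file got closed;
    // an already-requested sync is never cleared here.
    if (!needSync && (!completed || isClosed)) {
      needSync = true;
    }
    return needSync;
  }
}
```

Under this condition, the only case that leaves `needSync` false is `completed = true` with `isClosed = false`.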





> In the lease monitor, if a file is closed, we should sync the editslog
> --
>
> Key: HDFS-17518
> URL: https://issues.apache.org/jira/browse/HDFS-17518
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: lei w
>Assignee: lei w
>Priority: Minor
>  Labels: pull-request-available
>
> In the lease monitor, if a file is closed, the checklease method returns
> true, and the edit log is then not synced. In my opinion, we should sync the
> edit log so that the state is not left unsynchronized with the standby
> NameNode for a long time.



[jira] [Commented] (HDFS-17514) RBF: Routers keep using cached stateID even when active NN returns unset header

2024-05-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846343#comment-17846343
 ] 

ASF GitHub Bot commented on HDFS-17514:
---

simbadzina merged PR #6804:
URL: https://github.com/apache/hadoop/pull/6804




> RBF: Routers keep using cached stateID even when active NN returns unset 
> header
> ---
>
> Key: HDFS-17514
> URL: https://issues.apache.org/jira/browse/HDFS-17514
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Minor
>  Labels: pull-request-available
>
> When a namenode that had "dfs.namenode.state.context.enabled" set to true is 
> restarted with the configuration set to false, routers will keep using a 
> previously cached state ID.
> Without RBF:
> * Clients that fetched the old stateID could have stale reads even after
> msyncing.
> * New clients will go to the active namenode.
> With RBF:
> * Clients that fetched the old stateID could have stale reads, as above.
> * New clients will also fetch the stale stateID and could potentially have
> stale reads.
> New clients that are created after the restart should not fetch the stale 
> state ID.



[jira] [Commented] (HDFS-17514) RBF: Routers keep using cached stateID even when active NN returns unset header

2024-05-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846340#comment-17846340
 ] 

ASF GitHub Bot commented on HDFS-17514:
---

simbadzina commented on PR #6804:
URL: https://github.com/apache/hadoop/pull/6804#issuecomment-2110488809

   Failing tests in continuous integration are unrelated to my changes:
   ```
   Failed junit tests  |  hadoop.hdfs.server.federation.router.security.token.TestSQLDelegationTokenSecretManagerImpl
                       |  hadoop.hdfs.server.federation.store.driver.TestStateStoreMySQL
   ```




> RBF: Routers keep using cached stateID even when active NN returns unset 
> header
> ---
>
> Key: HDFS-17514
> URL: https://issues.apache.org/jira/browse/HDFS-17514
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Minor
>  Labels: pull-request-available
>
> When a namenode that had "dfs.namenode.state.context.enabled" set to true is 
> restarted with the configuration set to false, routers will keep using a 
> previously cached state ID.
> Without RBF:
> * Clients that fetched the old stateID could have stale reads even after
> msyncing.
> * New clients will go to the active namenode.
> With RBF:
> * Clients that fetched the old stateID could have stale reads, as above.
> * New clients will also fetch the stale stateID and could potentially have
> stale reads.
> New clients that are created after the restart should not fetch the stale 
> state ID.



[jira] [Commented] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.

2024-05-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846316#comment-17846316
 ] 

ASF GitHub Bot commented on HDFS-17476:
---

KeeProMise commented on PR #6747:
URL: https://github.com/apache/hadoop/pull/6747#issuecomment-2110243175

   > Thanks for involving me.
   > 
   > @KeeProMise I'm just wondering what scenarios could cause a negative 
`clientStateId`?
   
   @ZanderXu Thanks for your review. Under normal circumstances this problem 
does not occur; I hit it because I forgot to pass the transaction ID when 
writing a unit test.
   I think the code logic here does not account for long overflow (in our 
environment, if the transaction ID is not passed, it defaults to 
Long.MIN_VALUE), which may cause problems in future iterations.




> fix: False positive "Observer Node is too far behind" due to long overflow.
> ---
>
> Key: HDFS-17476
> URL: https://issues.apache.org/jira/browse/HDFS-17476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HDFS-17476.patch, image-2024-04-18-10-57-10-481.png
>
>
> In GlobalStateIdContext#receiveRequestState(), if clientStateId is a large
> negative number (for example Long.MIN_VALUE), clientStateId - serverStateId
> can overflow and become greater than
> (ESTIMATED_TRANSACTIONS_PER_SECOND * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime) * ESTIMATED_SERVER_TIME_MULTIPLIER),
> resulting in false positives that the Observer Node is too far behind.
> !image-2024-04-18-10-57-10-481.png|width=742,height=110!
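The overflow is easy to reproduce in isolation. The sketch below is hypothetical: `threshold` stands in for the product of the three constants above, the class and method names are invented, and the overflow-safe variant (an early return plus `Math.subtractExact`) is one possible guard, not necessarily the one adopted in PR #6747.

```java
/**
 * Minimal sketch of the long overflow described in HDFS-17476.
 * threshold stands in for ESTIMATED_TRANSACTIONS_PER_SECOND *
 * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime) *
 * ESTIMATED_SERVER_TIME_MULTIPLIER.
 */
final class StateIdLagCheck {
  private StateIdLagCheck() {}

  /** Naive check: the subtraction wraps around for very negative client IDs. */
  static boolean tooFarBehindNaive(long clientStateId, long serverStateId, long threshold) {
    return clientStateId - serverStateId > threshold;
  }

  /** Overflow-safe variant of the same comparison. */
  static boolean tooFarBehindSafe(long clientStateId, long serverStateId, long threshold) {
    if (clientStateId <= serverStateId) {
      // The server is already at or ahead of the client's state.
      return false;
    }
    try {
      return Math.subtractExact(clientStateId, serverStateId) > threshold;
    } catch (ArithmeticException overflow) {
      // A positive gap too large for a long is certainly past the threshold.
      return true;
    }
  }
}
```

With `clientStateId = Long.MIN_VALUE` and `serverStateId = 100`, the naive subtraction wraps to `Long.MAX_VALUE - 99` and reports the observer as too far behind, while the guarded version returns false.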



[jira] [Commented] (HDFS-17521) EC: Fix calculation errors caused by special index order

2024-05-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846310#comment-17846310
 ] 

ASF GitHub Bot commented on HDFS-17521:
---

zhengchenyu commented on PR #6813:
URL: https://github.com/apache/hadoop/pull/6813#issuecomment-2110181645

   @zhangshuyan0 
   Thanks for your review! 
   I don't mean that the parity index is smaller. Parity indexes and data indexes 
are fixed numbers; we can't change them.
   To reproduce: call RawErasureDecoder::decode with the parameter 
`erasedIndexes` in a special order, where a parity index precedes a data 
index. For example, erasedIndexes [8,0] reproduces this problem.
   You can also reproduce it easily by running the unit tests directly.
   I printed erasedIndexes for every failure in the unit test to 
[wrongindex.txt](https://github.com/apache/hadoop/files/15308714/wrongindex.txt); 
all of them have this characteristic.
   
   
   




> EC: Fix calculation errors caused by special index order
> 
>
> Key: HDFS-17521
> URL: https://issues.apache.org/jira/browse/HDFS-17521
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Critical
>  Labels: pull-request-available
>
> I found that if a parity index comes before a data index in erasedIndexes,
> EC produces wrong results when decoding.
> HDFS-15186 described this problem but did not fundamentally solve it.
> The root cause is that the code assumes erasedIndexes lists data indexes
> first, followed by parity indexes. If a parity index is placed in front of
> a data index, a calculation error occurs.
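The ordering problem can be illustrated with a small Java sketch. It shows only how to detect the special order and compute a data-first permutation; it is not the fix in PR #6813. It assumes an RS(6,3)-style layout where indexes `0..numDataUnits-1` are data units and the rest are parity units, so `[8,0]` puts a parity index first. The class and method names are hypothetical, and whether reordering alone would be sufficient for RawErasureDecoder is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helpers for the "special order" in HDFS-17521: a parity
 * index appearing before a data index in erasedIndexes. Indexes below
 * numDataUnits are data units; the remaining indexes are parity units.
 */
final class ErasedIndexOrder {
  private ErasedIndexOrder() {}

  /** Returns true if some parity index precedes a data index. */
  static boolean hasParityBeforeData(int[] erasedIndexes, int numDataUnits) {
    boolean seenParity = false;
    for (int index : erasedIndexes) {
      if (index >= numDataUnits) {
        seenParity = true;
      } else if (seenParity) {
        return true;
      }
    }
    return false;
  }

  /**
   * Returns positions into erasedIndexes, data indexes first, keeping the
   * original relative order within each group. A caller would have to apply
   * the same permutation to its output buffers to keep them aligned.
   */
  static int[] dataFirstOrder(int[] erasedIndexes, int numDataUnits) {
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < erasedIndexes.length; i++) {
      if (erasedIndexes[i] < numDataUnits) {
        order.add(i);
      }
    }
    for (int i = 0; i < erasedIndexes.length; i++) {
      if (erasedIndexes[i] >= numDataUnits) {
        order.add(i);
      }
    }
    return order.stream().mapToInt(Integer::intValue).toArray();
  }
}
```

For `[8,0]` with six data units, `hasParityBeforeData` returns true and `dataFirstOrder` returns `[1, 0]`.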



[jira] [Commented] (HDFS-17514) RBF: Routers keep using cached stateID even when active NN returns unset header

2024-05-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846239#comment-17846239
 ] 

ASF GitHub Bot commented on HDFS-17514:
---

hadoop-yetus commented on PR #6804:
URL: https://github.com/apache/hadoop/pull/6804#issuecomment-2109700134

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 00s |  |  No case conflicting files found.  |
   | +0 :ok: |  spotbugs  |   0m 01s |  |  spotbugs executables are not available.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m 00s |  |  The patch appears to include 2 new or modified test files.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  95m 50s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   5m 42s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   4m 53s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   5m 33s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   5m 08s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 157m 21s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   3m 02s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   2m 30s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   2m 30s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 00s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   2m 13s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   2m 44s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   2m 17s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 163m 47s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   5m 47s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 441m 52s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6804 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 3998dbde031f 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 14ab9f7e6a8d3d3680c496a44c1ec5635596770b |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6804/7/testReport/ |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6804/7/console |
   | versions | git=2.45.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> RBF: Routers keep using cached stateID even when active NN returns unset 
> header
> ---
>
> Key: HDFS-17514
> URL: https://issues.apache.org/jira/browse/HDFS-17514
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Minor
>  Labels: pull-request-available
>
> When a namenode that had "dfs.namenode.state.context.enabled" set to true is 
> restarted with the configuration set to false, routers will keep using a 
> previously cached state ID.
> Without RBF:
> * Clients that fetched the old stateID could have stale reads even after
> msyncing.
> * New clients will go to the active namenode.
> With RBF:
> * Clients that fetched the old stateID could have stale reads, as above.
> * New clients will also fetch the stale stateID and could potentially have
> stale reads.
> New clients that are created after the restart should not fetch the stale 
> state ID.



[jira] [Commented] (HDFS-17526) getMetadataInputStream should use getShareDeleteFileInputStream for windows

2024-05-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846234#comment-17846234
 ] 

ASF GitHub Bot commented on HDFS-17526:
---

hadoop-yetus commented on PR #6826:
URL: https://github.com/apache/hadoop/pull/6826#issuecomment-2109685586

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |:----:|----------:|--------:|:--------:|:-------:|
   |||| _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m 00s |  |  No case conflicting files found.  |
   | +0 :ok: |  spotbugs  |   0m 00s |  |  spotbugs executables are not available.  |
   | +0 :ok: |  codespell  |   0m 01s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m 01s |  |  detect-secrets was not available.  |
   | +1 :green_heart: |  @author  |   0m 00s |  |  The patch does not contain any @author tags.  |
   | -1 :x: |  test4tests  |   0m 00s |  |  The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.  |
   |||| _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  91m 52s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   6m 52s |  |  trunk passed  |
   | +1 :green_heart: |  checkstyle  |   5m 03s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   7m 02s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   6m 05s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  | 155m 06s |  |  branch has no errors when building and testing our client artifacts.  |
   |||| _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   4m 43s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   3m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javac  |   3m 35s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m 00s |  |  The patch has no blanks issues.  |
   | +1 :green_heart: |  checkstyle  |   2m 21s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   4m 15s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   3m 38s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  | 159m 24s |  |  patch has no errors when building and testing our client artifacts.  |
   |||| _ Other Tests _ |
   | +1 :green_heart: |  asflicense  |   5m 31s |  |  The patch does not generate ASF License warnings.  |
   |  |   | 434m 15s |  |  |
   
   
   | Subsystem | Report/Notes |
   |----------:|:-------------|
   | GITHUB PR | https://github.com/apache/hadoop/pull/6826 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | MINGW64_NT-10.0-17763 42a9d20631d9 3.4.10-87d57229.x86_64 2024-02-14 20:17 UTC x86_64 Msys |
   | Build tool | maven |
   | Personality | /c/hadoop/dev-support/bin/hadoop.sh |
   | git revision | trunk / 5f042ded4b7bb9ce826cd9e8b65c08a16e79aea9 |
   | Default Java | Azul Systems, Inc.-1.8.0_332-b09 |
   |  Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6826/1/testReport/ |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs |
   | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch-windows-10/job/PR-6826/1/console |
   | versions | git=2.45.0.windows.1 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> getMetadataInputStream should use getShareDeleteFileInputStream for windows
> ---
>
> Key: HDFS-17526
> URL: https://issues.apache.org/jira/browse/HDFS-17526
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode
>Affects Versions: 3.3.4
>Reporter: Danny Becker
>Priority: Major
>  Labels: pull-request-available
>
> In HDFS-10636, the getDataInputStream method uses 
> getShareDeleteFileInputStream on Windows, but getMetadataInputStream 
> does not. As a result, the following error can occur when a DataNode tries 
> to update the genstamp of a block on Windows.
> DataNode Logs:
> {{Caused by: java.io.IOException: Failed to rename 
> G:\data\hdfs\data\current\BP-1\current\finalized\subdir5\subdir16\blk_1_1.meta
>  to 
> G:\data\hdfs\data\current\BP-1\current\finalized\subdir5\subdir16\blk_1_2.meta
>  due to failure in native rename. 32: The process cannot access the file 
> because it is being used by another process.}}
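The failure mode can be illustrated with a JDK-only sketch. It does not use Hadoop's own helper; instead it relies on our understanding that on Windows the JDK's NIO `Files.newInputStream` opens files with `FILE_SHARE_DELETE` by default, whereas `java.io.FileInputStream` does not, so renaming a file that is held open through NIO should not fail with error 32. The class and method names are hypothetical, and the file names only mimic the ones in the log above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not Hadoop code: mimics a genstamp bump that renames
// a block's .meta file while a reader still holds that file open.
class MetaRenameSketch {

  // Opens the file through NIO. On Windows, NIO requests FILE_SHARE_DELETE by
  // default (java.io.FileInputStream does not), which is what allows a
  // concurrent rename to succeed while this stream is open.
  static InputStream openMetaForRead(Path meta) throws IOException {
    return Files.newInputStream(meta);
  }

  // Creates blk_1_1.meta, opens it, and renames it to blk_1_2.meta while open.
  // Returns true if the rename succeeded and only the new name exists.
  static boolean renameWhileOpen(Path dir) throws IOException {
    Path oldMeta = dir.resolve("blk_1_1.meta");
    Path newMeta = dir.resolve("blk_1_2.meta");
    Files.write(oldMeta, new byte[] {0, 1, 2, 3});
    try (InputStream in = openMetaForRead(oldMeta)) {
      in.read(); // the reader is actively using the file
      Files.move(oldMeta, newMeta); // analogous to the rename that failed in the log
    }
    return Files.exists(newMeta) && !Files.exists(oldMeta);
  }

  public static void main(String[] args) throws IOException {
    Path dir = Files.createTempDirectory("meta-rename-sketch");
    System.out.println("rename while open succeeded: " + renameWhileOpen(dir));
  }
}
```

On Linux the rename succeeds regardless of how the file was opened; the difference only shows up on Windows, where swapping `Files.newInputStream` for `new FileInputStream(...)` would be expected to reproduce the sharing violation.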




[jira] [Commented] (HDFS-17476) fix: False positive "Observer Node is too far behind" due to long overflow.

2024-05-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846212#comment-17846212
 ] 

ASF GitHub Bot commented on HDFS-17476:
---

ZanderXu commented on PR #6747:
URL: https://github.com/apache/hadoop/pull/6747#issuecomment-2109505953

   Thanks for involving me.
   
   @KeeProMise I'm just wondering what scenarios could cause a negative 
`clientStateId`?
   




> fix: False positive "Observer Node is too far behind" due to long overflow.
> ---
>
> Key: HDFS-17476
> URL: https://issues.apache.org/jira/browse/HDFS-17476
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Jian Zhang
>Assignee: Jian Zhang
>Priority: Critical
>  Labels: pull-request-available
> Attachments: HDFS-17476.patch, image-2024-04-18-10-57-10-481.png
>
>
> In GlobalStateIdContext#receiveRequestState(), if clientStateId is a 
> sufficiently small (large-magnitude) negative number, clientStateId - serverStateId 
> can overflow and wrap around to a value greater than
>                   (ESTIMATED_TRANSACTIONS_PER_SECOND
>                   * TimeUnit.MILLISECONDS.toSeconds(clientWaitTime)
>                   * ESTIMATED_SERVER_TIME_MULTIPLIER),
> resulting in a false-positive report that the Observer Node is too far behind.
> !image-2024-04-18-10-57-10-481.png|width=742,height=110!
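A minimal sketch of the overflow, assuming the comparison has the shape described above. The two constant values are illustrative assumptions, not the ones defined in GlobalStateIdContext, and the guarded variant is one possible way to avoid the wrap-around, not necessarily what PR #6747 does.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the "too far behind" comparison.
class ObserverLagSketch {
  static final long ESTIMATED_TRANSACTIONS_PER_SECOND = 10_000L; // assumed value
  static final long ESTIMATED_SERVER_TIME_MULTIPLIER = 2L;       // assumed value

  static long threshold(long clientWaitTimeMs) {
    return ESTIMATED_TRANSACTIONS_PER_SECOND
        * TimeUnit.MILLISECONDS.toSeconds(clientWaitTimeMs)
        * ESTIMATED_SERVER_TIME_MULTIPLIER;
  }

  // Shape of the check described in the issue: the subtraction wraps around
  // when clientStateId is close to Long.MIN_VALUE.
  static boolean naiveTooFarBehind(long clientStateId, long serverStateId, long waitMs) {
    return clientStateId - serverStateId > threshold(waitMs);
  }

  // Guarded variant (an assumption): a client that is not ahead of the server
  // cannot make it "too far behind"; for a client that IS ahead, a wrapped
  // (negative) difference means the true gap exceeds Long.MAX_VALUE.
  static boolean guardedTooFarBehind(long clientStateId, long serverStateId, long waitMs) {
    if (clientStateId <= serverStateId) {
      return false;
    }
    long diff = clientStateId - serverStateId;
    return diff < 0 || diff > threshold(waitMs);
  }

  public static void main(String[] args) {
    long client = Long.MIN_VALUE + 5; // large-magnitude negative state id
    long server = 100L;
    System.out.println(client - server);                             // 9223372036854775713
    System.out.println(naiveTooFarBehind(client, server, 5_000L));   // true (false positive)
    System.out.println(guardedTooFarBehind(client, server, 5_000L)); // false
  }
}
```

With a 5-second wait the assumed threshold is 100,000 transactions, yet the wrapped difference is close to `Long.MAX_VALUE`, so the naive form reports the Observer as far behind even though the client is actually behind the server.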






[jira] [Commented] (HDFS-15186) Erasure Coding: Decommission may generate the parity block's content with all 0 in some case

2024-05-14 Thread Chenyu Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-15186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846211#comment-17846211
 ] 

Chenyu Zheng commented on HDFS-15186:
-

Hi all. I have reproduced the EC algorithm problem described in 
HDFS-17521. Would you mind taking a look at HDFS-17521?

> Erasure Coding: Decommission may generate the parity block's content with all 
> 0 in some case
> 
>
> Key: HDFS-15186
> URL: https://issues.apache.org/jira/browse/HDFS-15186
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: datanode, erasure-coding
>Affects Versions: 3.0.3, 3.2.1, 3.1.3
>Reporter: Yao Guangdong
>Assignee: Yao Guangdong
>Priority: Critical
> Fix For: 3.3.0
>
> Attachments: HDFS-15186.001.patch, HDFS-15186.002.patch, 
> HDFS-15186.003.patch, HDFS-15186.004.patch, HDFS-15186.005.patch
>
>
> # When I decommission more than one DataNode from a cluster, I find that the 
> content of some parity blocks is all zeros. The probability is high (parts 
> per thousand), so this is a serious problem: if we read data from such an 
> all-zero parity block, or use it to reconstruct another block, we end up 
> using corrupt data without knowing it.
> Some example cases:
> B: Busy DataNode,
> D: Decommissioning DataNode,
> Others are normal.
> 1. Group indices are [0, 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
> 2. Group indices are [0(B,D), 1, 2, 3, 4, 5, 6(B,D), 7, 8(D)].
> 
> In the first case, when the block group indices are [0, 1, 2, 3, 4, 5, 6(B,D), 
> 7, 8(D)], the DN may receive a reconstruct block command with 
> liveIndices=[0, 1, 2, 3, 4, 5, 7, 8] and a targets length of 2 (targets is a 
> field in the class StripedReconstructionInfo).
> A targets length of 2 means that, in the current code, the DataNode needs to 
> recover 2 internal blocks. But from liveIndices we can find only 1 missing 
> block, so the method StripedWriter#initTargetIndices uses 0 as the default 
> block to recover, regardless of whether index 0 is already among the source 
> indices.
> The EC algorithm is then asked to use source indices [0, 1, 2, 3, 4, 5] to 
> recover indices [6, 0], so index 0 is in both the source indices and the 
> target indices. In this case, the target buffer returned for index 6 is 
> always 0. So I think this is partly the EC algorithm's problem, because it 
> should be more fault tolerant. I tried to fix it there, but that was too hard 
> because there are too many cases; the second case above is another one 
> (using source indices [1, 2, 3, 4, 5, 7] to recover indices [0, 6, 0]). So I 
> changed my approach: invoke the EC algorithm with correct parameters, which 
> means removing the duplicate target index 0 in this case. In the end, I fixed 
> it this way.
>  
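The first case can be sketched in a few lines. This is a hypothetical illustration, not Hadoop's StripedWriter#initTargetIndices: it assumes an RS-6-3 group (9 internal blocks) and shows how zero-initialised target slots produce the duplicate index 0, versus selecting only the genuinely missing indices before calling the decoder.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical sketch, not Hadoop's StripedWriter code.
class TargetIndicesSketch {

  // Mimics the behaviour described above: when fewer blocks are missing than
  // target slots, the unfilled slots keep Java's default int value, 0.
  static int[] targetsPaddedWithZero(int[] liveIndices, int totalBlocks, int targetCount) {
    boolean[] live = new boolean[totalBlocks];
    for (int i : liveIndices) {
      live[i] = true;
    }
    int[] targets = new int[targetCount]; // zero-initialised
    int n = 0;
    for (int i = 0; i < totalBlocks && n < targetCount; i++) {
      if (!live[i]) {
        targets[n++] = i;
      }
    }
    return targets;
  }

  // Keeps only genuinely missing indices, so no target is also a source and
  // no index appears twice; the decoder would be called with this array.
  static int[] missingTargetsOnly(int[] liveIndices, int totalBlocks) {
    boolean[] live = new boolean[totalBlocks];
    for (int i : liveIndices) {
      live[i] = true;
    }
    return IntStream.range(0, totalBlocks).filter(i -> !live[i]).toArray();
  }

  public static void main(String[] args) {
    int[] live = {0, 1, 2, 3, 4, 5, 7, 8}; // first case above: index 6 missing
    System.out.println(Arrays.toString(targetsPaddedWithZero(live, 9, 2))); // [6, 0]
    System.out.println(Arrays.toString(missingTargetsOnly(live, 9)));        // [6]
  }
}
```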






[jira] [Commented] (HDFS-17521) EC: Fix calculation errors caused by special index order

2024-05-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-17521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846195#comment-17846195
 ] 

ASF GitHub Bot commented on HDFS-17521:
---

slfan1989 commented on PR #6813:
URL: https://github.com/apache/hadoop/pull/6813#issuecomment-2109448492

   @zhangshuyan0 @haiyang1987 Could you help review this PR? I'm not very 
familiar with EC, but I've noticed that you have submitted quite a few 
improvements related to EC. Thank you very much!




> EC: Fix calculation errors caused by special index order
> 
>
> Key: HDFS-17521
> URL: https://issues.apache.org/jira/browse/HDFS-17521
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Critical
>  Labels: pull-request-available
>
> I found that if erasedIndexes is ordered so that a parity index comes 
> before a data index, EC produces wrong results when decoding.
> In fact, HDFS-15186 described this problem but did not fundamentally 
> solve it.
> The reason is that the code assumes erasedIndexes lists the data indices 
> first, followed by the parity indices. If a parity index is placed before a 
> data index, a calculation error occurs.
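The ordering assumption can be made explicit with two small helpers. This is a hypothetical sketch, not the Hadoop RS decoder or the change in PR #6813: one helper detects a parity index appearing before a data index, and the other computes a stable permutation that puts data indices first, which a caller could apply to both erasedIndexes and the parallel output buffers. Whether reordering alone is sufficient depends on decoder internals not shown here. The example assumes an RS-6-3 layout (indices 0 to 5 are data units).

```java
import java.util.Arrays;

// Hypothetical sketch: detect and normalise the ordering of erasedIndexes.
class ErasedOrderSketch {

  // True if no data index (< numDataUnits) appears after a parity index,
  // i.e. the ordering the decoding code is described as assuming.
  static boolean isDataBeforeParity(int[] erasedIndexes, int numDataUnits) {
    boolean seenParity = false;
    for (int idx : erasedIndexes) {
      if (idx >= numDataUnits) {
        seenParity = true;
      } else if (seenParity) {
        return false;
      }
    }
    return true;
  }

  // Stable permutation of positions that puts data indices first. Applying the
  // same permutation to the output buffers keeps each buffer paired with its index.
  static int[] dataFirstPermutation(int[] erasedIndexes, int numDataUnits) {
    int[] order = new int[erasedIndexes.length];
    int n = 0;
    for (int i = 0; i < erasedIndexes.length; i++) {
      if (erasedIndexes[i] < numDataUnits) {
        order[n++] = i;
      }
    }
    for (int i = 0; i < erasedIndexes.length; i++) {
      if (erasedIndexes[i] >= numDataUnits) {
        order[n++] = i;
      }
    }
    return order;
  }

  public static void main(String[] args) {
    int[] erased = {6, 0}; // parity index 6 listed before data index 0
    System.out.println(isDataBeforeParity(erased, 6));                    // false
    System.out.println(Arrays.toString(dataFirstPermutation(erased, 6))); // [1, 0]
  }
}
```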


