[jira] [Created] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.
ZhangHB created HDFS-16880:
--
Summary: modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.
Key: HDFS-16880
URL: https://issues.apache.org/jira/browse/HDFS-16880
Project: Hadoop HDFS
Issue Type: Improvement
Components: rbf
Affects Versions: 3.3.4
Reporter: ZhangHB

We found lots of INFO-level logs like the following:

{quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480{quote}

The real path of the completed file is lost. This is caused by *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String, org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*. This method instantiates a RemoteLocationContext object: *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");* and then executes *Object[] params = method.getParams(loc);*. The problem is right here: because we always use new RemoteParam(), context.getDest() always returns "/". That is why we see so many incorrect logs.

After diving into the invokeSingleXXX source code, I classified the RPCs into those that need the actual src path and those that do not.

*RPCs that need the src path:* addBlock, abandonBlock, getAdditionalDatanode, complete

*RPCs that do not need the src path:* updateBlockForPipeline, reportBadBlocks, getBlocks, updatePipeline, invokeAtAvailableNs (invoked by: getServerDefaults, getBlockKeys, getTransactionID, getMostRecentCheckpointTxId, versionRequest, getStoragePolicies)

After this change, the src is passed to the NameNode correctly.

-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
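The placeholder-destination behavior described above can be sketched with simplified stand-in classes. This is a hypothetical illustration, not the real org.apache.hadoop.hdfs.server.federation.router types (RouterRpcClient, RemoteLocation, RemoteParam):

```java
// Hypothetical, simplified stand-ins for the RBF classes named above;
// not the real org.apache.hadoop.hdfs.server.federation.router types.
public class PlaceholderDestDemo {

    // Mimics RemoteLocation: a namespace id plus source and destination paths.
    static class RemoteLocation {
        final String nsId;
        final String src;
        final String dest;

        RemoteLocation(String nsId, String src, String dest) {
            this.nsId = nsId;
            this.src = src;
            this.dest = dest;
        }

        String getDest() { return dest; }
    }

    // Mimics RemoteParam resolution: the src parameter sent to the NameNode
    // is whatever the location context reports as its destination.
    static String resolveSrcParam(RemoteLocation loc) {
        return loc.getDest();
    }

    public static void main(String[] args) {
        // Current behavior: invokeSingle builds the location with "/" placeholders,
        // so the src the NameNode logs for completeFile is always "/".
        RemoteLocation placeholder = new RemoteLocation("ns0", "/", "/");
        System.out.println("completeFile src = " + resolveSrcParam(placeholder)); // "/"

        // Proposed behavior: carry the actual file path into the location so
        // the NameNode log shows the real path.
        RemoteLocation withSrc = new RemoteLocation("ns0", "/user/a/file1", "/user/a/file1");
        System.out.println("completeFile src = " + resolveSrcParam(withSrc)); // "/user/a/file1"
    }
}
```

The sketch only shows why every resolved src collapses to "/" today; the actual fix would thread the real path through the invokeSingleXXX call sites that need it (addBlock, abandonBlock, getAdditionalDatanode, complete).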
[jira] [Updated] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks
[ https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haiyang Hu updated HDFS-16879:
--
Description:
For a block of an EC file, running hdfs fsck -blockId xxx could additionally show the number of redundant internal block replicas.
For example: the current block group has 10 live replicas, but fsck shows only 9 live replicas. One of the live replicas is actually in the redundant state, so fsck could add the line "No. of redundant Replica: 1".
{code:java}
hdfs fsck -blockId blk_-xxx
Block Id: blk_-xxx
Block belongs to: /ec/file1
No. of Expected Replica: 9
No. of live Replica: 9
No. of excess Replica: 0
No. of stale Replica: 0
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupted Replica: 0
Block replica on datanode/rack: ip-xxx1 is HEALTHY
Block replica on datanode/rack: ip-xxx2 is HEALTHY
Block replica on datanode/rack: ip-xxx3 is HEALTHY
Block replica on datanode/rack: ip-xxx4 is HEALTHY
Block replica on datanode/rack: ip-xxx5 is HEALTHY
Block replica on datanode/rack: ip-xxx6 is HEALTHY
Block replica on datanode/rack: ip-xxx7 is HEALTHY
Block replica on datanode/rack: ip-xxx8 is HEALTHY
Block replica on datanode/rack: ip-xxx9 is HEALTHY
Block replica on datanode/rack: ip-xxx10 is HEALTHY
{code}

was:
For a block of an EC file, running hdfs fsck -blockId xxx could additionally show the number of redundant internal block replicas.
For example: the current block group has 10 live replicas, but fsck shows only 9 live replicas. One of the live replicas is actually in the redundant state, so fsck could add the line "No. of redundant Replica: 1"
{code:java}
hdfs fsck -blockId blk_-xxx
Block Id: blk_-xxx
Block belongs to: /ec/file1
No. of Expected Replica: 9
No. of live Replica: 9
No. of excess Replica: 0
No. of stale Replica: 0
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupted Replica: 0
Block replica on datanode/rack: ip-xxx1 is HEALTHY
Block replica on datanode/rack: ip-xxx2 is HEALTHY
Block replica on datanode/rack: ip-xxx3 is HEALTHY
Block replica on datanode/rack: ip-xxx4 is HEALTHY
Block replica on datanode/rack: ip-xxx5 is HEALTHY
Block replica on datanode/rack: ip-xxx6 is HEALTHY
Block replica on datanode/rack: ip-xxx7 is HEALTHY
Block replica on datanode/rack: ip-xxx8 is HEALTHY
Block replica on datanode/rack: ip-xxx9 is HEALTHY
Block replica on datanode/rack: ip-xxx10 is HEALTHY
{code}

> EC : Fsck -blockId shows number of redundant internal block replicas for EC
> Blocks
> --
>
> Key: HDFS-16879
> URL: https://issues.apache.org/jira/browse/HDFS-16879
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
>
> For a block of an EC file, running hdfs fsck -blockId xxx could additionally
> show the number of redundant internal block replicas.
> For example: the current block group has 10 live replicas, but fsck shows
> only 9 live replicas. One of the live replicas is actually in the redundant
> state, so fsck could add the line "No. of redundant Replica: 1"
> {code:java}
> hdfs fsck -blockId blk_-xxx
> Block Id: blk_-xxx
> Block belongs to: /ec/file1
> No. of Expected Replica: 9
> No. of live Replica: 9
> No. of excess Replica: 0
> No. of stale Replica: 0
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: ip-xxx1 is HEALTHY
> Block replica on datanode/rack: ip-xxx2 is HEALTHY
> Block replica on datanode/rack: ip-xxx3 is HEALTHY
> Block replica on datanode/rack: ip-xxx4 is HEALTHY
> Block replica on datanode/rack: ip-xxx5 is HEALTHY
> Block replica on datanode/rack: ip-xxx6 is HEALTHY
> Block replica on datanode/rack: ip-xxx7 is HEALTHY
> Block replica on datanode/rack: ip-xxx8 is HEALTHY
> Block replica on datanode/rack: ip-xxx9 is HEALTHY
> Block replica on datanode/rack: ip-xxx10 is HEALTHY
> {code}
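The count requested above is just the number of live internal replicas beyond the block group's expected total (data + parity units). A minimal sketch of that arithmetic, with a hypothetical helper name (the real reporting would live in the fsck code path):

```java
public class RedundantReplicaDemo {

    // Redundant internal block replicas in an EC block group: any live
    // replicas beyond the expected count. (Hypothetical helper name; the
    // actual fsck implementation is not shown here.)
    static int redundantReplicas(int liveReplicas, int expectedReplicas) {
        return Math.max(0, liveReplicas - expectedReplicas);
    }

    public static void main(String[] args) {
        // RS-6-3 block group: 6 data + 3 parity = 9 expected internal blocks.
        // With 10 live replicas, one is redundant.
        System.out.println("No. of redundant Replica: " + redundantReplicas(10, 9)); // prints 1
    }
}
```

With this, the example in the description would report "No. of redundant Replica: 1" for a block group holding 10 live replicas against an expectation of 9.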
[jira] [Updated] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks
[ https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haiyang Hu updated HDFS-16879:
--
Description:
For a block of an EC file, running hdfs fsck -blockId xxx could additionally show the number of redundant internal block replicas.
For example: the current block group has 10 live replicas, but fsck shows only 9 live replicas. One of the live replicas is actually in the redundant state, so fsck could add the line "No. of redundant Replica: 1"
{code:java}
hdfs fsck -blockId blk_-xxx
Block Id: blk_-xxx
Block belongs to: /ec/file1
No. of Expected Replica: 9
No. of live Replica: 9
No. of excess Replica: 0
No. of stale Replica: 0
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupted Replica: 0
Block replica on datanode/rack: ip-xxx1 is HEALTHY
Block replica on datanode/rack: ip-xxx2 is HEALTHY
Block replica on datanode/rack: ip-xxx3 is HEALTHY
Block replica on datanode/rack: ip-xxx4 is HEALTHY
Block replica on datanode/rack: ip-xxx5 is HEALTHY
Block replica on datanode/rack: ip-xxx6 is HEALTHY
Block replica on datanode/rack: ip-xxx7 is HEALTHY
Block replica on datanode/rack: ip-xxx8 is HEALTHY
Block replica on datanode/rack: ip-xxx9 is HEALTHY
Block replica on datanode/rack: ip-xxx10 is HEALTHY
{code}

was:
For a block of an EC file, running hdfs fsck -blockId xxx could additionally show the number of redundant internal block replicas.
For example: the current block group has 10 live replicas, but fsck shows that there are only 9 live replicas. One of the live replicas is actually in the redundant state, so fsck could additionally show the number of redundant internal block replicas.
{code:java}
hdfs fsck -blockId blk_-xxx
Block Id: blk_-xxx
Block belongs to: /ec/file1
No. of Expected Replica: 9
No. of live Replica: 9
No. of excess Replica: 0
No. of stale Replica: 0
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupted Replica: 0
Block replica on datanode/rack: ip-xxx1 is HEALTHY
Block replica on datanode/rack: ip-xxx2 is HEALTHY
Block replica on datanode/rack: ip-xxx3 is HEALTHY
Block replica on datanode/rack: ip-xxx4 is HEALTHY
Block replica on datanode/rack: ip-xxx5 is HEALTHY
Block replica on datanode/rack: ip-xxx6 is HEALTHY
Block replica on datanode/rack: ip-xxx7 is HEALTHY
Block replica on datanode/rack: ip-xxx8 is HEALTHY
Block replica on datanode/rack: ip-xxx9 is HEALTHY
Block replica on datanode/rack: ip-xxx10 is HEALTHY
{code}

> EC : Fsck -blockId shows number of redundant internal block replicas for EC
> Blocks
> --
>
> Key: HDFS-16879
> URL: https://issues.apache.org/jira/browse/HDFS-16879
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
>
> For a block of an EC file, running hdfs fsck -blockId xxx could additionally
> show the number of redundant internal block replicas.
> For example: the current block group has 10 live replicas, but fsck shows
> only 9 live replicas. One of the live replicas is actually in the redundant
> state, so fsck could add the line "No. of redundant Replica: 1"
> {code:java}
> hdfs fsck -blockId blk_-xxx
> Block Id: blk_-xxx
> Block belongs to: /ec/file1
> No. of Expected Replica: 9
> No. of live Replica: 9
> No. of excess Replica: 0
> No. of stale Replica: 0
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: ip-xxx1 is HEALTHY
> Block replica on datanode/rack: ip-xxx2 is HEALTHY
> Block replica on datanode/rack: ip-xxx3 is HEALTHY
> Block replica on datanode/rack: ip-xxx4 is HEALTHY
> Block replica on datanode/rack: ip-xxx5 is HEALTHY
> Block replica on datanode/rack: ip-xxx6 is HEALTHY
> Block replica on datanode/rack: ip-xxx7 is HEALTHY
> Block replica on datanode/rack: ip-xxx8 is HEALTHY
> Block replica on datanode/rack: ip-xxx9 is HEALTHY
> Block replica on datanode/rack: ip-xxx10 is HEALTHY
> {code}
[jira] [Updated] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks
[ https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haiyang Hu updated HDFS-16879:
--
Description:
For a block of an EC file, running hdfs fsck -blockId xxx could additionally show the number of redundant internal block replicas.
For example: the current block group has 10 live replicas, but fsck shows that there are only 9 live replicas. One of the live replicas is actually in the redundant state, so fsck could additionally show the number of redundant internal block replicas.
{code:java}
hdfs fsck -blockId blk_-xxx
Block Id: blk_-xxx
Block belongs to: /ec/file1
No. of Expected Replica: 9
No. of live Replica: 9
No. of excess Replica: 0
No. of stale Replica: 0
No. of decommissioned Replica: 0
No. of decommissioning Replica: 0
No. of corrupted Replica: 0
Block replica on datanode/rack: ip-xxx1 is HEALTHY
Block replica on datanode/rack: ip-xxx2 is HEALTHY
Block replica on datanode/rack: ip-xxx3 is HEALTHY
Block replica on datanode/rack: ip-xxx4 is HEALTHY
Block replica on datanode/rack: ip-xxx5 is HEALTHY
Block replica on datanode/rack: ip-xxx6 is HEALTHY
Block replica on datanode/rack: ip-xxx7 is HEALTHY
Block replica on datanode/rack: ip-xxx8 is HEALTHY
Block replica on datanode/rack: ip-xxx9 is HEALTHY
Block replica on datanode/rack: ip-xxx10 is HEALTHY
{code}

> EC : Fsck -blockId shows number of redundant internal block replicas for EC
> Blocks
> --
>
> Key: HDFS-16879
> URL: https://issues.apache.org/jira/browse/HDFS-16879
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
>
> For a block of an EC file, running hdfs fsck -blockId xxx could additionally
> show the number of redundant internal block replicas.
> For example: the current block group has 10 live replicas, but fsck shows
> that there are only 9 live replicas. One of the live replicas is actually in
> the redundant state, so fsck could additionally show the number of redundant
> internal block replicas.
> {code:java}
> hdfs fsck -blockId blk_-xxx
> Block Id: blk_-xxx
> Block belongs to: /ec/file1
> No. of Expected Replica: 9
> No. of live Replica: 9
> No. of excess Replica: 0
> No. of stale Replica: 0
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: ip-xxx1 is HEALTHY
> Block replica on datanode/rack: ip-xxx2 is HEALTHY
> Block replica on datanode/rack: ip-xxx3 is HEALTHY
> Block replica on datanode/rack: ip-xxx4 is HEALTHY
> Block replica on datanode/rack: ip-xxx5 is HEALTHY
> Block replica on datanode/rack: ip-xxx6 is HEALTHY
> Block replica on datanode/rack: ip-xxx7 is HEALTHY
> Block replica on datanode/rack: ip-xxx8 is HEALTHY
> Block replica on datanode/rack: ip-xxx9 is HEALTHY
> Block replica on datanode/rack: ip-xxx10 is HEALTHY
> {code}
[jira] [Updated] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks
[ https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haiyang Hu updated HDFS-16879:
--
Summary: EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks (was: EC : Fsck -blockId shows number of redundant internal block replicas for EC files)

> EC : Fsck -blockId shows number of redundant internal block replicas for EC
> Blocks
> --
>
> Key: HDFS-16879
> URL: https://issues.apache.org/jira/browse/HDFS-16879
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
>
[jira] [Created] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC files
Haiyang Hu created HDFS-16879:
--
Summary: EC : Fsck -blockId shows number of redundant internal block replicas for EC files
Key: HDFS-16879
URL: https://issues.apache.org/jira/browse/HDFS-16879
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Haiyang Hu
[jira] [Assigned] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC files
[ https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haiyang Hu reassigned HDFS-16879:
--
Assignee: Haiyang Hu

> EC : Fsck -blockId shows number of redundant internal block replicas for EC
> files
> -
>
> Key: HDFS-16879
> URL: https://issues.apache.org/jira/browse/HDFS-16879
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
>
[jira] [Resolved] (HDFS-16878) TestLeaseRecovery2 timeouts
[ https://issues.apache.org/jira/browse/HDFS-16878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akira Ajisaka resolved HDFS-16878.
--
Resolution: Duplicate

Dup of HDFS-16853. Closing.

> TestLeaseRecovery2 timeouts
> ---
>
> Key: HDFS-16878
> URL: https://issues.apache.org/jira/browse/HDFS-16878
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: test
> Reporter: Akira Ajisaka
> Priority: Major
>
> The following tests in TestLeaseRecovery2 time out:
> * testHardLeaseRecoveryAfterNameNodeRestart
> * testHardLeaseRecoveryAfterNameNodeRestart2
> * testHardLeaseRecoveryWithRenameAfterNameNodeRestart
> {noformat}
> [ERROR] Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 139.044 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2
> [ERROR] testHardLeaseRecoveryAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2) Time elapsed: 30.47 s <<< ERROR!
> org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds
>     at java.lang.Thread.sleep(Native Method)
>     at org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:2831)
>     at org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:2880)
>     at org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:594)
>     at org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart(TestLeaseRecovery2.java:498)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.lang.Thread.run(Thread.java:750)
> {noformat}
[jira] [Created] (HDFS-16878) TestLeaseRecovery2 timeouts
Akira Ajisaka created HDFS-16878:
--
Summary: TestLeaseRecovery2 timeouts
Key: HDFS-16878
URL: https://issues.apache.org/jira/browse/HDFS-16878
Project: Hadoop HDFS
Issue Type: Sub-task
Components: test
Reporter: Akira Ajisaka

The following tests in TestLeaseRecovery2 time out:
* testHardLeaseRecoveryAfterNameNodeRestart
* testHardLeaseRecoveryAfterNameNodeRestart2
* testHardLeaseRecoveryWithRenameAfterNameNodeRestart

{noformat}
[ERROR] Tests run: 8, Failures: 0, Errors: 3, Skipped: 0, Time elapsed: 139.044 s <<< FAILURE! - in org.apache.hadoop.hdfs.TestLeaseRecovery2
[ERROR] testHardLeaseRecoveryAfterNameNodeRestart(org.apache.hadoop.hdfs.TestLeaseRecovery2) Time elapsed: 30.47 s <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 3 milliseconds
    at java.lang.Thread.sleep(Native Method)
    at org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:2831)
    at org.apache.hadoop.hdfs.MiniDFSCluster.waitActive(MiniDFSCluster.java:2880)
    at org.apache.hadoop.hdfs.TestLeaseRecovery2.hardLeaseRecoveryRestartHelper(TestLeaseRecovery2.java:594)
    at org.apache.hadoop.hdfs.TestLeaseRecovery2.testHardLeaseRecoveryAfterNameNodeRestart(TestLeaseRecovery2.java:498)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
    at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.lang.Thread.run(Thread.java:750)
{noformat}