[jira] [Commented] (HDFS-16987) NameNode should remove all invalid corrupted blocks when starting active service
[ https://issues.apache.org/jira/browse/HDFS-16987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719623#comment-17719623 ] ASF GitHub Bot commented on HDFS-16987: --- Hexiaoqiao commented on PR #5583: URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1535684657 @ZanderXu thanks for your detailed comments. IIUC, the replica's GS is monotonically increasing on the DataNode side, right? If so, I think we could improve this by relying on two basic rules on the NameNode side: a. ignore a replica with an older GS on the same DataNode; b. mark a replica as corrupt when its GS matches the one the NameNode maintains for that DataNode but is older than the GS of the other replicas. If this holds, improving `markBlockAsCorrupt` could fix this issue. Please correct me if I am missing some information. Thanks. > NameNode should remove all invalid corrupted blocks when starting active > service > > > Key: HDFS-16987 > URL: https://issues.apache.org/jira/browse/HDFS-16987 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Critical > Labels: pull-request-available > > In our prod environment, we encountered an incident where an HA failover caused > some new corrupted blocks, causing some jobs to fail. > > We traced it down and found a bug in the processing of all pending DN messages when > starting active services. > The steps to reproduce are as follows: > # Suppose NN1 is Active and NN2 is Standby; Active works well and Standby is > unstable. > # Timing 1, the client creates a file, writes some data, and closes it. > # Timing 2, the client appends to this file, writes some data, and closes it. > # Timing 3, Standby replays the second closing edits of this file. > # Timing 4, Standby processes the blockReceivedAndDeleted of the first > create operation. > # Timing 5, Standby processes the blockReceivedAndDeleted of the second > append operation. > # Timing 6, the admin switches the active namenode from NN1 to NN2. > # Timing 7, the client fails to append some data to this file. 
> {code:java} > org.apache.hadoop.ipc.RemoteException(java.io.IOException): append: > lastBlock=blk_1073741825_1002 of src=/testCorruptedBlockAfterHAFailover is > not sufficiently replicated yet. > at > org.apache.hadoop.hdfs.server.namenode.FSDirAppendOp.appendFile(FSDirAppendOp.java:138) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2992) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:858) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:527) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589) > at > org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1221) > at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1144) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1953) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3170) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
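The two replica-handling rules proposed in the comment above can be sketched in a few lines of Java. This is an illustrative sketch only: the class and method names are invented for this example and are not part of the actual `BlockManager` API; generation stamps are modeled as plain longs.

```java
// Hypothetical sketch of the generation-stamp (GS) rules from the comment
// above. All names are invented for illustration; this is not Hadoop code.
public class GsRuleSketch {

    // Rule (a): a report whose GS is older than the GS the NameNode already
    // records for the same DataNode is a stale (postponed) report and can
    // be ignored.
    public static boolean shouldIgnoreReplica(long reportedGs, long recordedGs) {
        return reportedGs < recordedGs;
    }

    // Rule (b): a replica whose GS matches what the NameNode maintains for
    // this DataNode but is older than the newest GS among all replicas is a
    // genuinely corrupt replica.
    public static boolean shouldMarkCorrupt(long reportedGs, long recordedGs,
                                            long newestGsAmongReplicas) {
        return reportedGs == recordedGs && reportedGs < newestGsAmongReplicas;
    }

    public static void main(String[] args) {
        // Stale postponed report: the DN actually holds GS 1002, the report says 1001.
        System.out.println(shouldIgnoreReplica(1001L, 1002L)); // true
        // Same GS as recorded, but older than the newest replica: corrupt.
        System.out.println(shouldMarkCorrupt(1001L, 1001L, 1002L)); // true
    }
}
```

The follow-up discussion in this thread argues that rule (b) alone cannot distinguish a postponed report from a truly corrupt on-disk replica, which is why attention later shifts to `processAllPendingDNMessages`.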
[jira] [Commented] (HDFS-16999) Fix wrong use of processFirstBlockReport()
[ https://issues.apache.org/jira/browse/HDFS-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719619#comment-17719619 ] ASF GitHub Bot commented on HDFS-16999: --- Hexiaoqiao commented on PR #5622: URL: https://github.com/apache/hadoop/pull/5622#issuecomment-1535670070 @zhangshuyan0 Thanks for your proposal and for trying to fix this issue. From a first glance at the PR, it proposes to switch `processFirstBlockReport` to `processReport` only when a DataNode restarts, right? I am concerned about the performance impact if we do that. a. Is it possible to improve `processFirstBlockReport` itself to solve this issue? b. Will it affect the processing logic or performance when adding a new DataNode to the cluster? Thanks again. > Fix wrong use of processFirstBlockReport() > -- > > Key: HDFS-16999 > URL: https://issues.apache.org/jira/browse/HDFS-16999 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available > > `processFirstBlockReport()` is used to process the first block report from a > datanode. It does not calculate the `toRemove` list because it assumes that > there is no metadata about the datanode in the namenode. However, if a > datanode is re-registered after restarting, its `blockReportCount` will be > reset to 0. That is to say, the first block report after a datanode > restarts will be processed by `processFirstBlockReport()`. This is > unreasonable because the metadata of the datanode already exists in the namenode > at this time, and if redundant replica metadata is not removed in time, > blocks with insufficient replicas cannot be reconstructed in time, which > increases the risk of missing blocks. In summary, `processFirstBlockReport()` > should only be used when the namenode restarts, not when the datanode > restarts. 
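The misuse described in HDFS-16999 comes down to a single guard. The sketch below models it with simplified stand-in classes — not the actual `DatanodeDescriptor`/`BlockManager` code — and the "fixed" condition is one possible direction, under the assumption that the NameNode can cheaply check whether it still tracks blocks for the DataNode.

```java
// Minimal model of the first-block-report dispatch discussed above.
// All names are simplified stand-ins, not the Hadoop API.
public class FirstReportDispatchSketch {

    static class DatanodeState {
        int blockReportCount;  // reset to 0 when the DN re-registers after a restart
        int storedBlockCount;  // blocks the NameNode still tracks for this DN

        DatanodeState(int blockReportCount, int storedBlockCount) {
            this.blockReportCount = blockReportCount;
            this.storedBlockCount = storedBlockCount;
        }
    }

    // Condition per the report: only blockReportCount is checked, so a
    // restarted DN takes the first-report path and no toRemove list is built.
    static boolean takesFirstReportPathToday(DatanodeState dn) {
        return dn.blockReportCount == 0;
    }

    // Possible fix direction: also require that the NameNode holds no block
    // metadata for the DN, so a restarted DN goes through full report processing.
    static boolean takesFirstReportPathFixed(DatanodeState dn) {
        return dn.blockReportCount == 0 && dn.storedBlockCount == 0;
    }
}
```

Under this model a freshly added DataNode (no tracked blocks) still gets the fast first-report path, which speaks to concern (b) in the comment above.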
[jira] [Commented] (HDFS-16987) NameNode should remove all invalid corrupted blocks when starting active service
[ https://issues.apache.org/jira/browse/HDFS-16987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719616#comment-17719616 ] ASF GitHub Bot commented on HDFS-16987: --- ZanderXu commented on PR #5583: URL: https://github.com/apache/hadoop/pull/5583#issuecomment-1535657249 @Hexiaoqiao @ayushtkn After thinking about this more deeply, maybe we can only fix this problem in processAllPendingDNMessages, because the namenode doesn't know whether a report is consistent with the actual replica storage information on the DataNode. **Case 1: the report with the smaller GS is a postponed report, which differs from the actual replica on the datanode.** For example: - The actual replica on the DN is blk_1024_1002 - The postponed report is blk_1024_1001 In this case, the namenode can ignore the postponed report and should not mark the replica as corrupted. **Case 2: the report with the smaller GS is the newest report, which matches the actual replica on the datanode.** For example: - The actual replica on the DN is blk_1024_1001 - The report is blk_1024_1001 - The storages of this block in the namenode already contain this DN In this case, the namenode should not ignore the report; it should mark this replica as corrupted. Manually modifying block storage files on a DataNode may cause this situation. At present, the namenode can only assume that each report is the newest report and then update the status of the block in its memory, because the datanode reports its state to the NN through block reports or blockReceivedAndDeleted. If we modify the logic of `markBlockAsCorrupt`, the namenode will not be able to mark the replica as corrupted in case 2. If we modify the logic of `processAllPendingDNMessages`, the postponed message will only be temporarily ignored in case 2, and the active namenode will mark the replica as corrupted on the next block report of the corresponding DN. 
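The direction ZanderXu favors above — dropping a postponed message with a stale GS while replaying pending DN messages, and letting the next full block report handle a genuinely corrupt replica — can be sketched as follows. All names are illustrative, not Hadoop's, and each block is modeled with a single known generation stamp for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of filtering pending DN messages at failover; not Hadoop code.
public class PendingMessageSketch {

    static class PendingMessage {
        final long blockId;
        final long genStamp;

        PendingMessage(long blockId, long genStamp) {
            this.blockId = blockId;
            this.genStamp = genStamp;
        }
    }

    // Keep only messages whose GS is at least the GS the NameNode already
    // knows for the block; older ones are treated as postponed duplicates
    // (Case 1) and dropped. A truly corrupt on-disk replica (Case 2) is
    // caught later by the DN's next full block report.
    static List<PendingMessage> dropStaleMessages(List<PendingMessage> queued,
                                                  Map<Long, Long> knownGenStamp) {
        List<PendingMessage> kept = new ArrayList<>();
        for (PendingMessage m : queued) {
            Long known = knownGenStamp.get(m.blockId);
            if (known == null || m.genStamp >= known) {
                kept.add(m);
            }
        }
        return kept;
    }
}
```

With this filtering, a Case 1 message disappears silently, while a Case 2 replica is still marked corrupt later, once the corresponding DN's next block report arrives.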
[jira] [Commented] (HDFS-16991) Fix testMkdirsRaceWithObserverRead
[ https://issues.apache.org/jira/browse/HDFS-16991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719614#comment-17719614 ] ASF GitHub Bot commented on HDFS-16991: --- hadoop-yetus commented on PR #5591: URL: https://github.com/apache/hadoop/pull/5591#issuecomment-1535655249 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 35m 47s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 33m 52s | | trunk passed | | +1 :green_heart: | compile | 1m 11s | | trunk passed | | +1 :green_heart: | checkstyle | 1m 10s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 18s | | trunk passed | | +1 :green_heart: | javadoc | 1m 43s | | trunk passed | | +1 :green_heart: | spotbugs | 3m 21s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 54s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 5s | | the patch passed | | +1 :green_heart: | compile | 1m 1s | | the patch passed | | +1 :green_heart: | javac | 1m 1s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. 
| | +1 :green_heart: | checkstyle | 0m 53s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 7s | | the patch passed | | +1 :green_heart: | javadoc | 1m 25s | | the patch passed | | +1 :green_heart: | spotbugs | 3m 4s | | the patch passed | | +1 :green_heart: | shadedclient | 21m 28s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 204m 39s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5591/4/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 52s | | The patch does not generate ASF License warnings. | | | | 334m 25s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.datanode.fsdataset.impl.TestFsDatasetImpl | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5591/4/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5591 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux fcb3894283a7 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 44945bf9dc678a43026dac92d350900f5a677136 | | Default Java | Red Hat, Inc.-1.8.0_362-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5591/4/testReport/ | | Max. process+thread count | 4181 (vs. 
ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5591/4/console | | versions | git=2.9.5 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > Fix testMkdirsRaceWithObserverRead > -- > > Key: HDFS-16991 > URL: https://issues.apache.org/jira/browse/HDFS-16991 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs >Affects Versions: 3.3.4 >Reporter: fanluo >Priority: Minor > Labels: pull-request-available > > The test case testMkdirsRaceWithObserverRead in TestObserverNode > sometimes fails like this: > {code:java} > java.lang.AssertionError: Client #1 lastSeenStateId=-9223372036854775808 > activStateId=5 > null at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.assertTrue(Assert.java:42) > at >
[jira] [Commented] (HDFS-16999) Fix wrong use of processFirstBlockReport()
[ https://issues.apache.org/jira/browse/HDFS-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719610#comment-17719610 ] ASF GitHub Bot commented on HDFS-16999: --- zhangshuyan0 opened a new pull request, #5622: URL: https://github.com/apache/hadoop/pull/5622 ### Description of PR `processFirstBlockReport()` is used to process the first block report from a datanode. It does not calculate the `toRemove` list because it assumes that there is no metadata about the datanode in the namenode. However, if a datanode is re-registered after restarting, its `blockReportCount` will be reset to 0. https://github.com/apache/hadoop/blob/c7699d3dcd4f8feaf2c5ae5943b8a4cec738e95d/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeDescriptor.java#L1007-L1012 That is to say, the first block report after a datanode restarts will be processed by `processFirstBlockReport()`. https://github.com/apache/hadoop/blob/c7699d3dcd4f8feaf2c5ae5943b8a4cec738e95d/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/BlockManager.java#L2916-L2925 This is unreasonable because the metadata of the datanode already exists in the namenode at this time, and if redundant replica metadata is not removed in time, blocks with insufficient replicas cannot be reconstructed in time, which increases the risk of missing blocks. In summary, `processFirstBlockReport()` should only be used when the namenode restarts, not when the datanode restarts. ### How was this patch tested? Added a new unit test. > Fix wrong use of processFirstBlockReport() > -- > > Key: HDFS-16999 > URL: https://issues.apache.org/jira/browse/HDFS-16999 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major >
[jira] [Updated] (HDFS-16999) Fix wrong use of processFirstBlockReport()
[ https://issues.apache.org/jira/browse/HDFS-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16999: -- Labels: pull-request-available (was: ) > Fix wrong use of processFirstBlockReport() > -- > > Key: HDFS-16999 > URL: https://issues.apache.org/jira/browse/HDFS-16999 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (HDFS-16999) Fix wrong use of processFirstBlockReport()
[ https://issues.apache.org/jira/browse/HDFS-16999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuyan Zhang reassigned HDFS-16999: --- Assignee: Shuyan Zhang > Fix wrong use of processFirstBlockReport() > -- > > Key: HDFS-16999 > URL: https://issues.apache.org/jira/browse/HDFS-16999 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: Shuyan Zhang >Assignee: Shuyan Zhang >Priority: Major >
[jira] [Created] (HDFS-16999) Fix wrong use of processFirstBlockReport()
Shuyan Zhang created HDFS-16999: --- Summary: Fix wrong use of processFirstBlockReport() Key: HDFS-16999 URL: https://issues.apache.org/jira/browse/HDFS-16999 Project: Hadoop HDFS Issue Type: Bug Reporter: Shuyan Zhang `processFirstBlockReport()` is used to process the first block report from a datanode. It does not calculate the `toRemove` list because it assumes that there is no metadata about the datanode in the namenode. However, if a datanode is re-registered after restarting, its `blockReportCount` will be reset to 0. That is to say, the first block report after a datanode restarts will be processed by `processFirstBlockReport()`. This is unreasonable because the metadata of the datanode already exists in the namenode at this time, and if redundant replica metadata is not removed in time, blocks with insufficient replicas cannot be reconstructed in time, which increases the risk of missing blocks. In summary, `processFirstBlockReport()` should only be used when the namenode restarts, not when the datanode restarts.
[jira] [Commented] (HDFS-16865) RBF: The source path is always / after RBF proxied the complete, addBlock and getAdditionalDatanode RPC.
[ https://issues.apache.org/jira/browse/HDFS-16865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719461#comment-17719461 ] ASF GitHub Bot commented on HDFS-16865: --- ayushtkn commented on PR #5200: URL: https://github.com/apache/hadoop/pull/5200#issuecomment-1535185336 @goiri does this make sense? I will be holding this for you. > RBF: The source path is always / after RBF proxied the complete, addBlock and > getAdditionalDatanode RPC. > > > Key: HDFS-16865 > URL: https://issues.apache.org/jira/browse/HDFS-16865 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > > The source path is always / after RBF proxies the complete, addBlock and > getAdditionalDatanode RPCs.
[jira] [Resolved] (HDFS-16998) RBF: Add ops metrics for getSlowDatanodeReport in RouterClientActivity
[ https://issues.apache.org/jira/browse/HDFS-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri resolved HDFS-16998. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > RBF: Add ops metrics for getSlowDatanodeReport in RouterClientActivity > -- > > Key: HDFS-16998 > URL: https://issues.apache.org/jira/browse/HDFS-16998 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > >
[jira] [Commented] (HDFS-16998) RBF: Add ops metrics for getSlowDatanodeReport in RouterClientActivity
[ https://issues.apache.org/jira/browse/HDFS-16998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719419#comment-17719419 ] ASF GitHub Bot commented on HDFS-16998: --- goiri merged PR #5615: URL: https://github.com/apache/hadoop/pull/5615 > RBF: Add ops metrics for getSlowDatanodeReport in RouterClientActivity > -- > > Key: HDFS-16998 > URL: https://issues.apache.org/jira/browse/HDFS-16998 > Project: Hadoop HDFS > Issue Type: Task >Reporter: Viraj Jasani >Assignee: Viraj Jasani >Priority: Major > Labels: pull-request-available >
[jira] [Commented] (HDFS-16979) RBF: Add dfsrouter port in hdfsauditlog
[ https://issues.apache.org/jira/browse/HDFS-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719381#comment-17719381 ] ASF GitHub Bot commented on HDFS-16979: --- hadoop-yetus commented on PR #5552: URL: https://github.com/apache/hadoop/pull/5552#issuecomment-1534947516 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 47s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +0 :ok: | mvndep | 16m 0s | | Maven dependency ordering for branch | | +1 :green_heart: | mvninstall | 23m 2s | | trunk passed | | +1 :green_heart: | compile | 17m 26s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | compile | 15m 34s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | checkstyle | 4m 1s | | trunk passed | | +1 :green_heart: | mvnsite | 3m 43s | | trunk passed | | +1 :green_heart: | javadoc | 3m 2s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 3m 26s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | spotbugs | 7m 18s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 44s | | branch has no errors when building and testing our client artifacts. 
| _ Patch Compile Tests _ | | +0 :ok: | mvndep | 0m 22s | | Maven dependency ordering for patch | | +1 :green_heart: | mvninstall | 2m 31s | | the patch passed | | +1 :green_heart: | compile | 16m 33s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javac | 16m 33s | | the patch passed | | +1 :green_heart: | compile | 15m 37s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | javac | 15m 37s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 3m 46s | | the patch passed | | +1 :green_heart: | mvnsite | 3m 41s | | the patch passed | | +1 :green_heart: | javadoc | 2m 56s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 | | +1 :green_heart: | javadoc | 3m 27s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 | | +1 :green_heart: | spotbugs | 7m 41s | | the patch passed | | +1 :green_heart: | shadedclient | 24m 56s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 19m 29s | | hadoop-common in the patch passed. | | -1 :x: | unit | 229m 50s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5552/11/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | -1 :x: | unit | 21m 56s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5552/11/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 1m 3s | | The patch does not generate ASF License warnings. 
| | | | 475m 41s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner | | | hadoop.hdfs.server.namenode.ha.TestObserverNode | | | hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5552/11/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5552 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux def343e76b56 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3 19:10:13 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | |
[jira] [Commented] (HDFS-16979) RBF: Add dfsrouter port in hdfsauditlog
[ https://issues.apache.org/jira/browse/HDFS-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719315#comment-17719315 ] ASF GitHub Bot commented on HDFS-16979: --- ayushtkn commented on code in PR #5552: URL: https://github.com/apache/hadoop/pull/5552#discussion_r1184989929 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestNamenodeAuditlogWithDFSRouterPort.java: ## @@ -0,0 +1,91 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.hadoop.hdfs.server.federation.router; + +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.hdfs.server.federation.MiniRouterDFSCluster; +import org.apache.hadoop.hdfs.server.federation.RouterConfigBuilder; +import org.apache.hadoop.hdfs.server.namenode.FSNamesystem; +import org.apache.hadoop.test.GenericTestUtils; +import org.junit.Test; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.regex.Pattern; + +import static org.apache.hadoop.fs.CommonConfigurationKeysPublic.HADOOP_CALLER_CONTEXT_ENABLED_KEY; +import static org.apache.hadoop.hdfs.DFSConfigKeys.DFS_NAMENODE_AUDIT_LOG_WITH_REMOTE_PORT_KEY; +import static org.junit.Assert.assertTrue; + +/** + * Test that the namenode audit log records the dfsrouter port when the request comes from dfsrouter. + */ +public class TestNamenodeAuditlogWithDFSRouterPort { + private static final Logger LOG = LoggerFactory.getLogger( + TestNamenodeAuditlogWithDFSRouterPort.class); + private static final Pattern AUDIT_WITH_PORT_PATTERN = Pattern.compile( Review Comment: TestRouterRpc is in hadoop-rbf only; why do you need to add the hadoop-hdfs-rbf jar to hadoop-hdfs? > RBF: Add dfsrouter port in hdfsauditlog > --- > > Key: HDFS-16979 > URL: https://issues.apache.org/jira/browse/HDFS-16979 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: liuguanghua >Priority: Major > Labels: pull-request-available > > When a remote client makes requests through dfsrouter to the namenode, the hdfs audit log > records the remote client ip and port and the dfsrouter IP, but lacks the dfsrouter port. > This patch addresses that scenario. >
[jira] [Commented] (HDFS-16979) RBF: Add dfsrouter port in hdfsauditlog
[ https://issues.apache.org/jira/browse/HDFS-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719304#comment-17719304 ] ASF GitHub Bot commented on HDFS-16979: --- LiuGuH commented on code in PR #5552: URL: https://github.com/apache/hadoop/pull/5552#discussion_r1184952734

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestNamenodeAuditlogWithDFSRouterPort.java:
## @@ -0,0 +1,91 @@
+  private static final Pattern AUDIT_WITH_PORT_PATTERN = Pattern.compile(

Review Comment: In my environment, there is a Maven conflict if I add the hadoop-hdfs-rbf test jar as a dependency in hadoop-hdfs. The error detail is: { java: Modules hadoop-yarn-server-tests and hadoop-yarn-server-resourcemanager must have the same 'additional command line parameters' specified because of cyclic dependencies between them }. I am confused.
[jira] [Commented] (HDFS-16979) RBF: Add dfsrouter port in hdfsauditlog
[ https://issues.apache.org/jira/browse/HDFS-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719266#comment-17719266 ] ASF GitHub Bot commented on HDFS-16979: --- hadoop-yetus commented on PR #5552: URL: https://github.com/apache/hadoop/pull/5552#issuecomment-1534552305

:broken_heart: **-1 overall**

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|:--------|:-------:|:-------:|
| +0 :ok: | reexec | 0m 52s | | Docker mode activated. |
| | _ Prechecks _ | | | |
| +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. |
| +0 :ok: | codespell | 0m 1s | | codespell was not available. |
| +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. |
| +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. |
| | _ trunk Compile Tests _ | | | |
| +0 :ok: | mvndep | 16m 3s | | Maven dependency ordering for branch |
| +1 :green_heart: | mvninstall | 22m 49s | | trunk passed |
| +1 :green_heart: | compile | 17m 17s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | compile | 15m 40s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
| +1 :green_heart: | checkstyle | 4m 4s | | trunk passed |
| +1 :green_heart: | mvnsite | 3m 44s | | trunk passed |
| +1 :green_heart: | javadoc | 3m 4s | | trunk passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 3m 23s | | trunk passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
| +1 :green_heart: | spotbugs | 7m 20s | | trunk passed |
| +1 :green_heart: | shadedclient | 24m 26s | | branch has no errors when building and testing our client artifacts. |
| | _ Patch Compile Tests _ | | | |
| +0 :ok: | mvndep | 0m 24s | | Maven dependency ordering for patch |
| +1 :green_heart: | mvninstall | 2m 33s | | the patch passed |
| +1 :green_heart: | compile | 16m 33s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javac | 16m 33s | | the patch passed |
| +1 :green_heart: | compile | 15m 40s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
| +1 :green_heart: | javac | 15m 40s | | the patch passed |
| +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. |
| +1 :green_heart: | checkstyle | 3m 54s | | the patch passed |
| +1 :green_heart: | mvnsite | 3m 41s | | the patch passed |
| +1 :green_heart: | javadoc | 2m 56s | | the patch passed with JDK Ubuntu-11.0.18+10-post-Ubuntu-0ubuntu120.04.1 |
| +1 :green_heart: | javadoc | 3m 26s | | the patch passed with JDK Private Build-1.8.0_362-8u362-ga-0ubuntu1~20.04.1-b09 |
| +1 :green_heart: | spotbugs | 7m 43s | | the patch passed |
| +1 :green_heart: | shadedclient | 24m 25s | | patch has no errors when building and testing our client artifacts. |
| | _ Other Tests _ | | | |
| +1 :green_heart: | unit | 18m 20s | | hadoop-common in the patch passed. |
| -1 :x: | unit | 227m 52s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5552/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. |
| -1 :x: | unit | 22m 24s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5552/10/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs-rbf.txt) | hadoop-hdfs-rbf in the patch passed. |
| +1 :green_heart: | asflicense | 1m 2s | | The patch does not generate ASF License warnings. |
| | | | 473m 21s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.hdfs.server.datanode.TestDirectoryScanner |
| | hadoop.hdfs.TestRollingUpgrade |
| | hadoop.hdfs.server.federation.router.TestRouterRPCMultipleDestinationMountTableResolver |
| | hadoop.hdfs.server.federation.router.TestRouterRpcMultiDestination |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.42 ServerAPI=1.42 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5552/10/artifact/out/Dockerfile |
| GITHUB PR | https://github.com/apache/hadoop/pull/5552 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
| uname | Linux 48816cbfe49a 4.15.0-206-generic #217-Ubuntu SMP Fri Feb 3
[jira] [Commented] (HDFS-16979) RBF: Add dfsrouter port in hdfsauditlog
[ https://issues.apache.org/jira/browse/HDFS-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719255#comment-17719255 ] ASF GitHub Bot commented on HDFS-16979: --- ayushtkn commented on code in PR #5552: URL: https://github.com/apache/hadoop/pull/5552#discussion_r1184829824

## hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/CallerContext.java:
## @@ -50,7 +50,8 @@ public final class CallerContext {
   public static final String CLIENT_ID_STR = "clientId";
   public static final String CLIENT_CALL_ID_STR = "clientCallId";
   public static final String REAL_USER_STR = "realUser";
-
+  public static final String IS_DFSROUTER = "isDfsRouter";
+  public static final String DFSROUTER_PORT_STR = "dfsRouterPort";

Review Comment: There is no point in having dfsRouterPort or isDfsRouter. The code is very generic, and adding Router-specific code should be the last thing to do in the Namenode; the Namenode doesn't care whether the Router or any other client or service is proxying. Change it to a generic proxy client port; there is nothing Router-specific in the Namenode.
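The review suggestion above is to replace the Router-specific keys with one generic proxy-client-port field. As a minimal sketch of what that might look like (the key name `proxyClientPort` and the `appendField` helper are assumptions for illustration, not the actual CallerContext API):

```java
// Sketch of a generic proxy-port field in a comma-separated caller-context
// string, per the review suggestion. Names here are illustrative assumptions.
public class GenericProxyPort {
    // Generic: any proxying service (Router or otherwise) can set this key,
    // so the Namenode needs no Router-specific constants.
    static final String PROXY_CLIENT_PORT_STR = "proxyClientPort";

    // Append a key:value field to a comma-separated caller-context string.
    static String appendField(String ctx, String key, String value) {
        String prefix = (ctx == null || ctx.isEmpty()) ? "" : ctx + ",";
        return prefix + key + ":" + value;
    }

    public static void main(String[] args) {
        String ctx = appendField("clientIp:10.0.0.5", PROXY_CLIENT_PORT_STR, "8888");
        System.out.println(ctx); // clientIp:10.0.0.5,proxyClientPort:8888
    }
}
```

With a generic key, the audit-log consumer only needs to know one field name regardless of which service sat between the client and the Namenode.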
[jira] [Commented] (HDFS-16979) RBF: Add dfsrouter port in hdfsauditlog
[ https://issues.apache.org/jira/browse/HDFS-16979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719192#comment-17719192 ] ASF GitHub Bot commented on HDFS-16979: --- LiuGuH commented on code in PR #5552: URL: https://github.com/apache/hadoop/pull/5552#discussion_r1184664573

## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java:
## @@ -453,15 +453,15 @@ private void logAuditEvent(boolean succeeded,
   private void appendClientPortToCallerContextIfAbsent() {
     final CallerContext ctx = CallerContext.getCurrent();
-    if (isClientPortInfoAbsent(ctx)) {
-      String origContext = ctx == null ? null : ctx.getContext();
-      byte[] origSignature = ctx == null ? null : ctx.getSignature();
-      CallerContext.setCurrent(
-          new CallerContext.Builder(origContext, contextFieldSeparator)
-              .append(CallerContext.CLIENT_PORT_STR, String.valueOf(Server.getRemotePort()))
-              .setSignature(origSignature)
-              .build());
-    }
+    String origContext = ctx == null ? null : ctx.getContext();
+    byte[] origSignature = ctx == null ? null : ctx.getSignature();
+    String clientPort = isClientPortInfoAbsent(ctx) ? CallerContext.CLIENT_PORT_STR :

Review Comment: I have now added it via isDfsRouter.
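The original code in the diff above appends the RPC remote port to the caller context only when no `clientPort` field is present yet. A simplified, self-contained sketch of that "append only if absent" idea (helper names and the context format are assumptions, not the real FSNamesystem code):

```java
// Sketch of "append the client port only if the caller context lacks one".
// Helper names and the context format are illustrative assumptions.
public class AppendPortIfAbsent {
    static final String CLIENT_PORT_STR = "clientPort";

    // True when the context string carries no clientPort field yet.
    static boolean isClientPortAbsent(String ctx) {
        return ctx == null || !ctx.contains(CLIENT_PORT_STR + ":");
    }

    static String appendClientPortIfAbsent(String ctx, int remotePort) {
        if (!isClientPortAbsent(ctx)) {
            return ctx; // already recorded upstream, e.g. by a proxying Router
        }
        String prefix = (ctx == null || ctx.isEmpty()) ? "" : ctx + ",";
        return prefix + CLIENT_PORT_STR + ":" + remotePort;
    }

    public static void main(String[] args) {
        // No port yet: the RPC remote port is appended.
        System.out.println(appendClientPortIfAbsent("clientIp:10.0.0.5", 50010));
        // Port already present: the context is left untouched.
        System.out.println(appendClientPortIfAbsent("clientPort:40000", 50010));
    }
}
```

The PR's change discussed in this thread extends this check so the port of the proxying Router itself is also recorded, rather than being dropped when the client's port is already in the context.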