[jira] [Commented] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks
[ https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654279#comment-17654279 ] ASF GitHub Bot commented on HDFS-16879: --- haiyang1987 commented on PR #5264: URL: https://github.com/apache/hadoop/pull/5264#issuecomment-1370454477 Thanks @ZanderXu @dineshchitlangia for the review and merge. > EC : Fsck -blockId shows number of redundant internal block replicas for EC > Blocks > -- > > Key: HDFS-16879 > URL: https://issues.apache.org/jira/browse/HDFS-16879 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > For a block of an EC file, running hdfs fsck -blockId xxx should also show the number of redundant internal block replicas. > For example: the current block group has 10 live replicas, but fsck reports only 9 live replicas. > Actually, one live replica is in the redundant state, so the output should additionally show "No. of redundant Replica: 1" > {code:java} > hdfs fsck -blockId blk_-xxx > Block Id: blk_-xxx > Block belongs to: /ec/file1 > No. of Expected Replica: 9 > No. of live Replica: 9 > No. of excess Replica: 0 > No. of stale Replica: 0 > No. of decommissioned Replica: 0 > No. of decommissioning Replica: 0 > No. of corrupted Replica: 0 > Block replica on datanode/rack: ip-xxx1 is HEALTHY > Block replica on datanode/rack: ip-xxx2 is HEALTHY > Block replica on datanode/rack: ip-xxx3 is HEALTHY > Block replica on datanode/rack: ip-xxx4 is HEALTHY > Block replica on datanode/rack: ip-xxx5 is HEALTHY > Block replica on datanode/rack: ip-xxx6 is HEALTHY > Block replica on datanode/rack: ip-xxx7 is HEALTHY > Block replica on datanode/rack: ip-xxx8 is HEALTHY > Block replica on datanode/rack: ip-xxx9 is HEALTHY > Block replica on datanode/rack: ip-xxx10 is HEALTHY > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks
[ https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654280#comment-17654280 ] ASF GitHub Bot commented on HDFS-16879: --- ZanderXu commented on PR #5264: URL: https://github.com/apache/hadoop/pull/5264#issuecomment-1370454618 The failed UT `hadoop.hdfs.TestLeaseRecovery2` is tracked by HDFS-16853. > EC : Fsck -blockId shows number of redundant internal block replicas for EC > Blocks > -- > > Key: HDFS-16879 > URL: https://issues.apache.org/jira/browse/HDFS-16879 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > For a block of an EC file, running hdfs fsck -blockId xxx should also show the number of redundant internal block replicas. > For example: the current block group has 10 live replicas, but fsck reports only 9 live replicas. > Actually, one live replica is in the redundant state, so the output should additionally show "No. of redundant Replica: 1" > {code:java} > hdfs fsck -blockId blk_-xxx > Block Id: blk_-xxx > Block belongs to: /ec/file1 > No. of Expected Replica: 9 > No. of live Replica: 9 > No. of excess Replica: 0 > No. of stale Replica: 0 > No. of decommissioned Replica: 0 > No. of decommissioning Replica: 0 > No. of corrupted Replica: 0 > Block replica on datanode/rack: ip-xxx1 is HEALTHY > Block replica on datanode/rack: ip-xxx2 is HEALTHY > Block replica on datanode/rack: ip-xxx3 is HEALTHY > Block replica on datanode/rack: ip-xxx4 is HEALTHY > Block replica on datanode/rack: ip-xxx5 is HEALTHY > Block replica on datanode/rack: ip-xxx6 is HEALTHY > Block replica on datanode/rack: ip-xxx7 is HEALTHY > Block replica on datanode/rack: ip-xxx8 is HEALTHY > Block replica on datanode/rack: ip-xxx9 is HEALTHY > Block replica on datanode/rack: ip-xxx10 is HEALTHY > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks
[ https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu resolved HDFS-16879. - Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > EC : Fsck -blockId shows number of redundant internal block replicas for EC > Blocks > -- > > Key: HDFS-16879 > URL: https://issues.apache.org/jira/browse/HDFS-16879 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > For a block of an EC file, running hdfs fsck -blockId xxx should also show the number of redundant internal block replicas. > For example: the current block group has 10 live replicas, but fsck reports only 9 live replicas. > Actually, one live replica is in the redundant state, so the output should additionally show "No. of redundant Replica: 1" > {code:java} > hdfs fsck -blockId blk_-xxx > Block Id: blk_-xxx > Block belongs to: /ec/file1 > No. of Expected Replica: 9 > No. of live Replica: 9 > No. of excess Replica: 0 > No. of stale Replica: 0 > No. of decommissioned Replica: 0 > No. of decommissioning Replica: 0 > No. of corrupted Replica: 0 > Block replica on datanode/rack: ip-xxx1 is HEALTHY > Block replica on datanode/rack: ip-xxx2 is HEALTHY > Block replica on datanode/rack: ip-xxx3 is HEALTHY > Block replica on datanode/rack: ip-xxx4 is HEALTHY > Block replica on datanode/rack: ip-xxx5 is HEALTHY > Block replica on datanode/rack: ip-xxx6 is HEALTHY > Block replica on datanode/rack: ip-xxx7 is HEALTHY > Block replica on datanode/rack: ip-xxx8 is HEALTHY > Block replica on datanode/rack: ip-xxx9 is HEALTHY > Block replica on datanode/rack: ip-xxx10 is HEALTHY > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks
[ https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654277#comment-17654277 ] ASF GitHub Bot commented on HDFS-16879: --- ZanderXu commented on PR #5264: URL: https://github.com/apache/hadoop/pull/5264#issuecomment-1370453598 Merged, thanks @haiyang1987 for your report. Thanks @dineshchitlangia for your review. > EC : Fsck -blockId shows number of redundant internal block replicas for EC > Blocks > -- > > Key: HDFS-16879 > URL: https://issues.apache.org/jira/browse/HDFS-16879 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > For a block of an EC file, running hdfs fsck -blockId xxx should also show the number of redundant internal block replicas. > For example: the current block group has 10 live replicas, but fsck reports only 9 live replicas. > Actually, one live replica is in the redundant state, so the output should additionally show "No. of redundant Replica: 1" > {code:java} > hdfs fsck -blockId blk_-xxx > Block Id: blk_-xxx > Block belongs to: /ec/file1 > No. of Expected Replica: 9 > No. of live Replica: 9 > No. of excess Replica: 0 > No. of stale Replica: 0 > No. of decommissioned Replica: 0 > No. of decommissioning Replica: 0 > No. of corrupted Replica: 0 > Block replica on datanode/rack: ip-xxx1 is HEALTHY > Block replica on datanode/rack: ip-xxx2 is HEALTHY > Block replica on datanode/rack: ip-xxx3 is HEALTHY > Block replica on datanode/rack: ip-xxx4 is HEALTHY > Block replica on datanode/rack: ip-xxx5 is HEALTHY > Block replica on datanode/rack: ip-xxx6 is HEALTHY > Block replica on datanode/rack: ip-xxx7 is HEALTHY > Block replica on datanode/rack: ip-xxx8 is HEALTHY > Block replica on datanode/rack: ip-xxx9 is HEALTHY > Block replica on datanode/rack: ip-xxx10 is HEALTHY > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks
[ https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654276#comment-17654276 ] ASF GitHub Bot commented on HDFS-16879: --- ZanderXu merged PR #5264: URL: https://github.com/apache/hadoop/pull/5264 > EC : Fsck -blockId shows number of redundant internal block replicas for EC > Blocks > -- > > Key: HDFS-16879 > URL: https://issues.apache.org/jira/browse/HDFS-16879 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: Haiyang Hu >Assignee: Haiyang Hu >Priority: Major > Labels: pull-request-available > > For a block of an EC file, running hdfs fsck -blockId xxx should also show the number of redundant internal block replicas. > For example: the current block group has 10 live replicas, but fsck reports only 9 live replicas. > Actually, one live replica is in the redundant state, so the output should additionally show "No. of redundant Replica: 1" > {code:java} > hdfs fsck -blockId blk_-xxx > Block Id: blk_-xxx > Block belongs to: /ec/file1 > No. of Expected Replica: 9 > No. of live Replica: 9 > No. of excess Replica: 0 > No. of stale Replica: 0 > No. of decommissioned Replica: 0 > No. of decommissioning Replica: 0 > No. of corrupted Replica: 0 > Block replica on datanode/rack: ip-xxx1 is HEALTHY > Block replica on datanode/rack: ip-xxx2 is HEALTHY > Block replica on datanode/rack: ip-xxx3 is HEALTHY > Block replica on datanode/rack: ip-xxx4 is HEALTHY > Block replica on datanode/rack: ip-xxx5 is HEALTHY > Block replica on datanode/rack: ip-xxx6 is HEALTHY > Block replica on datanode/rack: ip-xxx7 is HEALTHY > Block replica on datanode/rack: ip-xxx8 is HEALTHY > Block replica on datanode/rack: ip-xxx9 is HEALTHY > Block replica on datanode/rack: ip-xxx10 is HEALTHY > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
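To make the requested report line concrete, here is a minimal self-contained sketch, an illustration of the idea only and not the actual PR #5264 change; the class and method names below are invented. The redundant internal replica count of a striped block group is whatever exceeds the expected total implied by the erasure coding policy (data units plus parity units).

{code:java}
// Hypothetical sketch: the number of redundant internal replicas of an EC
// block group is whatever exceeds the expected total implied by the policy.
public class RedundantReplicaSketch {
  static int redundantReplicas(int liveReplicas, int dataUnits, int parityUnits) {
    int expected = dataUnits + parityUnits;      // e.g. RS-6-3 expects 9
    return Math.max(0, liveReplicas - expected); // 10 live => 1 redundant
  }

  public static void main(String[] args) {
    // Matches the example in the description: 10 live internal replicas, RS-6-3.
    System.out.println("No. of redundant Replica: " + redundantReplicas(10, 6, 3));
  }
}
{code}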
[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.
[ https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654257#comment-17654257 ] ASF GitHub Bot commented on HDFS-16880: --- hfutatzhanghb commented on code in PR #5262: URL: https://github.com/apache/hadoop/pull/5262#discussion_r1061099106 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java: ## @@ -687,7 +687,7 @@ T invokeAtAvailableNs(RemoteMethod method, Class clazz) // If default Ns is present return result from that namespace. if (!nsId.isEmpty()) { try { -return rpcClient.invokeSingle(nsId, method, clazz); +return rpcClient.invokeSingle(nsId, method, clazz, ""); Review Comment: Hi @ayushtkn @goiri, could you please take a look at https://issues.apache.org/jira/browse/HDFS-16882 and see whether it is feasible? Thanks a lot. > modify invokeSingleXXX interface in order to pass actual file src to namenode > for debug info. > - > > Key: HDFS-16880 > URL: https://issues.apache.org/jira/browse/HDFS-16880 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Major > Labels: pull-request-available > > We found lots of INFO-level logs like the ones below: > {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1 > 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480 > {quote} > They lose the real path of the completeFile call. Actually this is caused by: > *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String, org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)* > In this method, it instantiates a RemoteLocationContext object: > *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");* > and then executes: *Object[] params = method.getParams(loc);* > The problem is right here: because we always use new RemoteParam(), context.getDest() always returns "/". That's why we see lots of incorrect logs. > > After diving into the invokeSingleXXX source code, I classified the RPCs below into those that need the actual src and those that do not. > > *RPCs that need the src path:* addBlock, abandonBlock, getAdditionalDatanode, complete > *RPCs that do not need the src path:* updateBlockForPipeline, reportBadBlocks, getBlocks, updatePipeline, invokeAtAvailableNs (invoked by: getServerDefaults, getBlockKeys, getTransactionID, getMostRecentCheckpointTxId, versionRequest, getStoragePolicies) > > After the changes, the src is passed to the NN correctly. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
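A self-contained toy model of the mechanism described above may help; the types below are simplified stand-ins for the real RBF classes, not the PR #5262 diff.

{code:java}
// Toy model: when the RemoteLocation is built with "/" as the destination,
// every resolved src becomes "/", which is exactly what the namenode logs.
public class InvokeSingleSketch {
  static final class RemoteLocation {
    final String nsId, dest, src;
    RemoteLocation(String nsId, String dest, String src) {
      this.nsId = nsId;
      this.dest = dest;
      this.src = src;
    }
  }

  // Stand-in for method.getParams(loc): substitute the location's destination
  // wherever the call used a RemoteParam placeholder.
  static String resolveSrc(RemoteLocation loc) {
    return loc.dest;
  }

  public static void main(String[] args) {
    // Current behavior: "completeFile: / is closed by ..." in the NN log.
    System.out.println(resolveSrc(new RemoteLocation("ns0", "/", "/")));
    // Fix direction: pass the actual path down through invokeSingle.
    System.out.println(resolveSrc(new RemoteLocation("ns0", "/user/a/file1", "/user/a/file1")));
  }
}
{code}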
[jira] [Created] (HDFS-16882) Add cache hit rate metric in MountTableResolver#getDestinationForPath
ZhangHB created HDFS-16882: -- Summary: Add cache hit rate metric in MountTableResolver#getDestinationForPath Key: HDFS-16882 URL: https://issues.apache.org/jira/browse/HDFS-16882 Project: Hadoop HDFS Issue Type: Improvement Components: rbf Affects Versions: 3.3.4 Reporter: ZhangHB Currently, the default value of "dfs.federation.router.mount-table.cache.enable" is true, and the default value of "dfs.federation.router.mount-table.max-cache-size" is 10000. But there is no metric that displays the cache hit rate. I think we can add a hit-rate metric to watch the cache performance and tune the parameters better. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
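A minimal sketch of what such a metric could look like; all names below are assumptions, since no patch is attached to this thread yet. The idea: count lookups and cache hits around MountTableResolver#getDestinationForPath and report hits divided by lookups.

{code:java}
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of a cache hit-rate metric for the mount table cache.
public class MountTableCacheMetricsSketch {
  private final LongAdder lookups = new LongAdder();
  private final LongAdder hits = new LongAdder();

  // Call once per getDestinationForPath invocation.
  public void recordLookup(boolean cacheHit) {
    lookups.increment();
    if (cacheHit) {
      hits.increment();
    }
  }

  // Hit rate in [0, 1]; 0.0 before any lookup has been recorded.
  public double getCacheHitRate() {
    long total = lookups.sum();
    return total == 0 ? 0.0 : (double) hits.sum() / total;
  }
}
{code}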
[jira] [Commented] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission
[ https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654231#comment-17654231 ] ASF GitHub Bot commented on HDFS-16881: --- szetszwo commented on code in PR #5268: URL: https://github.com/apache/hadoop/pull/5268#discussion_r1061074364 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java: ## @@ -388,6 +392,12 @@ public enum DirOp { DFS_PROTECTED_SUBDIRECTORIES_ENABLE, DFS_PROTECTED_SUBDIRECTORIES_ENABLE_DEFAULT); +final long readLockThresholdMs = conf.getLong( +DFS_NAMENODE_READ_LOCK_REPORTING_THRESHOLD_MS_KEY, +DFS_NAMENODE_READ_LOCK_REPORTING_THRESHOLD_MS_DEFAULT); +// use half of read lock threshold Review Comment: Indeed, let's make it configurable. ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSPermissionChecker.java: ## @@ -446,4 +447,20 @@ private static INodeFile createINodeFile(INodeDirectory parent, String name, parent.addChild(inodeFile); return inodeFile; } + + @Test + public void testCheckAccessControlEnforcerSlowness() throws Exception { +final long thresholdMs = 10; +final String message = FSPermissionChecker.runCheckPermission(() -> { + try { +Thread.sleep(20); + } catch (InterruptedException e) { +throw new RuntimeException(e); Review Comment: Sure. Thanks for pointing out the JUnit behavior. > Warn if AccessControlEnforcer runs for a long time to check permission > -- > > Key: HDFS-16881 > URL: https://issues.apache.org/jira/browse/HDFS-16881 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Tsz-wo Sze >Assignee: Tsz-wo Sze >Priority: Major > Labels: pull-request-available > > AccessControlEnforcer is configurable. If an external AccessControlEnforcer > runs for a long time to check permission with the FSNamesystem lock, it will > significantly slow down the entire Namenode. In this JIRA, we will print a > WARN message when it happens. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
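For readers following the review, a hedged sketch of the timing wrapper under discussion; the actual FSPermissionChecker.runCheckPermission in PR #5268 may differ in signature and return type.

{code:java}
// Illustrative sketch: time the configured AccessControlEnforcer check and
// return a message for the caller to log at WARN when it runs too long.
public class SlowPermissionCheckSketch {
  static String runCheckPermission(Runnable check, long thresholdMs) {
    final long start = System.nanoTime();
    check.run();
    final long elapsedMs = (System.nanoTime() - start) / 1_000_000;
    if (elapsedMs > thresholdMs) {
      return "Took " + elapsedMs + "ms (threshold=" + thresholdMs
          + "ms) to check permission";
    }
    return null; // fast enough, nothing to report
  }
}
{code}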
[jira] [Commented] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members
[ https://issues.apache.org/jira/browse/HDFS-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654210#comment-17654210 ] ASF GitHub Bot commented on HDFS-16872: --- hadoop-yetus commented on PR #5269: URL: https://github.com/apache/hadoop/pull/5269#issuecomment-1370338282 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 38s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 41m 51s | | trunk passed | | +1 :green_heart: | compile | 24m 35s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 22m 11s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 5s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 39s | | trunk passed | | -1 :x: | javadoc | 1m 11s | [/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5269/1/artifact/out/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in trunk failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. | | +1 :green_heart: | javadoc | 0m 42s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 45s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 40s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 0s | | the patch passed | | +1 :green_heart: | compile | 24m 32s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 24m 32s | | the patch passed | | +1 :green_heart: | compile | 20m 58s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 20m 58s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 0m 58s | [/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5269/1/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt) | hadoop-common-project/hadoop-common: The patch generated 1 new + 0 unchanged - 0 fixed = 1 total (was 0) | | +1 :green_heart: | mvnsite | 1m 39s | | the patch passed | | -1 :x: | javadoc | 1m 5s | [/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5269/1/artifact/out/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-common in the patch failed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04. 
| | +1 :green_heart: | javadoc | 0m 47s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 2m 40s | | the patch passed | | +1 :green_heart: | shadedclient | 23m 30s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 18m 14s | | hadoop-common in the patch passed. | | +1 :green_heart: | asflicense | 1m 3s | | The patch does not generate ASF License warnings. | | | | 216m 29s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5269/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5269 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux e4d5f5b01f11 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build
[jira] [Commented] (HDFS-16875) Erasure Coding: data access proxy to allow old clients to read EC data
[ https://issues.apache.org/jira/browse/HDFS-16875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654209#comment-17654209 ] Jing Zhao commented on HDFS-16875: -- Posted the design doc for the EC access proxy. > Erasure Coding: data access proxy to allow old clients to read EC data > -- > > Key: HDFS-16875 > URL: https://issues.apache.org/jira/browse/HDFS-16875 > Project: Hadoop HDFS > Issue Type: New Feature > Components: ec, erasure-coding >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Major > Attachments: Erasure Coding Access Proxy.pdf > > > Erasure Coding is only supported by Hadoop 3, while many production > deployments still depend on Hadoop 2. Upgrading the whole data tech stack to > the Hadoop 3 release may involve big migration efforts and even reliability > risks, considering the incompatibilities between these two Hadoop major > releases as well as the potential uncovered issues and risks hidden in newer > releases. Therefore, we need to find a solution, with the least amount of > migration effort and risk, to adopt Erasure Coding for cost efficiency but > still allow HDFS clients with old versions (Hadoop 2.x) to access EC data in > a transparent manner. > Internally we have developed an EC access proxy which translates the EC data > for old clients. We also extend the NameNode RPC so it can recognize HDFS > clients with/without the EC support, and redirect the old clients to the > proxy. With the proxy we set up separate Erasure Coding clusters storing > hundreds of PB of data, while leaving other production clusters and all the > upper layer applications untouched. > Considering some changes are made at fundamental components of HDFS (e.g., > client-NN RPC header), we do not aim to merge the change to trunk. We will > use this ticket to share the design and implementation details (including the > code) and collect feedback. We may use a separate github repo to open source > the implementation later. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16875) Erasure Coding: data access proxy to allow old clients to read EC data
[ https://issues.apache.org/jira/browse/HDFS-16875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Zhao updated HDFS-16875: - Attachment: Erasure Coding Access Proxy.pdf > Erasure Coding: data access proxy to allow old clients to read EC data > -- > > Key: HDFS-16875 > URL: https://issues.apache.org/jira/browse/HDFS-16875 > Project: Hadoop HDFS > Issue Type: New Feature > Components: ec, erasure-coding >Reporter: Jing Zhao >Assignee: Jing Zhao >Priority: Major > Attachments: Erasure Coding Access Proxy.pdf > > > Erasure Coding is only supported by Hadoop 3, while many production > deployments still depend on Hadoop 2. Upgrading the whole data tech stack to > the Hadoop 3 release may involve big migration efforts and even reliability > risks, considering the incompatibilities between these two Hadoop major > releases as well as the potential uncovered issues and risks hidden in newer > releases. Therefore, we need to find a solution, with the least amount of > migration effort and risk, to adopt Erasure Coding for cost efficiency but > still allow HDFS clients with old versions (Hadoop 2.x) to access EC data in > a transparent manner. > Internally we have developed an EC access proxy which translates the EC data > for old clients. We also extend the NameNode RPC so it can recognize HDFS > clients with/without the EC support, and redirect the old clients to the > proxy. With the proxy we set up separate Erasure Coding clusters storing > hundreds of PB of data, while leaving other production clusters and all the > upper layer applications untouched. > Considering some changes are made at fundamental components of HDFS (e.g., > client-NN RPC header), we do not aim to merge the change to trunk. We will > use this ticket to share the design and implementation details (including the > code) and collect feedback. We may use a separate github repo to open source > the implementation later. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members
[ https://issues.apache.org/jira/browse/HDFS-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654177#comment-17654177 ] ASF GitHub Bot commented on HDFS-16872: --- xkrogen commented on PR #5246: URL: https://github.com/apache/hadoop/pull/5246#issuecomment-1370183767 I spent a little time poking at `LogThrottlingHelper` to see what it would take to make it concurrent. I have a demonstration PR here: #5269. It's a bit complicated as it relies on functionality from `ConcurrentHashMap` as well as some atomic operations (mostly CAS). I suspect it's likely overkill from an implementation complexity perspective since we don't _expect_ high contention on this class. I would say it probably makes sense to just use `synchronized`. WDYT? > Fix log throttling by declaring LogThrottlingHelper as static members > - > > Key: HDFS-16872 > URL: https://issues.apache.org/jira/browse/HDFS-16872 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.4 >Reporter: Chengbing Liu >Priority: Major > Labels: pull-request-available > > In our production cluster with Observer NameNode enabled, we have plenty of > logs printed by {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. The > {{LogThrottlingHelper}} doesn't seem to work. > {noformat} > 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Start loading edits file ByteStringEditLog[17686250688, 17686250688], > ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, > 17686250688] maxTxnsToRead = 9223372036854775807 > 2022-10-25 09:26:50,380 INFO > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: > Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688], > ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, > 17686250688]' to transaction ID 17686250688 > 2022-10-25 09:26:50,380 INFO > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: > Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688]' to > transaction ID 17686250688 > 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250688, > 17686250688], ByteStringEditLog[17686250688, 17686250688], > ByteStringEditLog[17686250688, 17686250688]) of total size 527.0, total edits > 1.0, total load time 0.0 ms > 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Start loading edits file ByteStringEditLog[17686250689, 17686250693], > ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, > 17686250693] maxTxnsToRead = 9223372036854775807 > 2022-10-25 09:26:50,387 INFO > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: > Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693], > ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, > 17686250693]' to transaction ID 17686250689 > 2022-10-25 09:26:50,387 INFO > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: > Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693]' to > transaction ID 17686250689 > 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250689, > 17686250693], ByteStringEditLog[17686250689, 17686250693], > ByteStringEditLog[17686250689, 17686250693]) of total size 890.0, total edits > 5.0, total load time 1.0 ms > {noformat} > After some digging, I found the cause is that the {{LogThrottlingHelper}}s are > declared as instance variables of all the enclosing classes, including > {{FSImage}}, {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. > Therefore the logging frequency will not be limited across different > instances. For classes with only a limited number of instances, such as > {{FSImage}}, this is fine. For others whose instances are created frequently, > such as {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}, it will > result in plenty of logs. > This can be fixed by declaring the {{LogThrottlingHelper}}s as static members. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
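For reference, an illustrative sketch of the "just use synchronized" option mentioned in the comment above; this is not the #5269 patch, which uses ConcurrentHashMap and CAS. An intrinsic lock serializes the throttler's mutable state, which is acceptable given the low expected contention.

{code:java}
// Minimal synchronized throttler: emit at most one log line per period,
// safe when a single instance is shared across threads.
class SynchronizedThrottlerSketch {
  private final long minLogPeriodMs;
  private long lastLogTimestampMs = Long.MIN_VALUE;

  SynchronizedThrottlerSketch(long minLogPeriodMs) {
    this.minLogPeriodMs = minLogPeriodMs;
  }

  // Returns true if the caller should emit its log line now.
  synchronized boolean shouldLog(long nowMs) {
    if (nowMs - lastLogTimestampMs >= minLogPeriodMs) {
      lastLogTimestampMs = nowMs;
      return true;
    }
    return false;
  }
}
{code}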
[jira] [Commented] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members
[ https://issues.apache.org/jira/browse/HDFS-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654176#comment-17654176 ] ASF GitHub Bot commented on HDFS-16872: --- xkrogen opened a new pull request, #5269: URL: https://github.com/apache/hadoop/pull/5269 WIP PR for demonstration purposes only while working through #5246 > Fix log throttling by declaring LogThrottlingHelper as static members > - > > Key: HDFS-16872 > URL: https://issues.apache.org/jira/browse/HDFS-16872 > Project: Hadoop HDFS > Issue Type: Bug >Affects Versions: 3.3.4 >Reporter: Chengbing Liu >Priority: Major > Labels: pull-request-available > > In our production cluster with Observer NameNode enabled, we have plenty of > logs printed by {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. The > {{LogThrottlingHelper}} doesn't seem to work. > {noformat} > 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Start loading edits file ByteStringEditLog[17686250688, 17686250688], > ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, > 17686250688] maxTxnsToRead = 9223372036854775807 > 2022-10-25 09:26:50,380 INFO > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: > Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688], > ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, > 17686250688]' to transaction ID 17686250688 > 2022-10-25 09:26:50,380 INFO > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: > Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688]' to > transaction ID 17686250688 > 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250688, > 17686250688], ByteStringEditLog[17686250688, 17686250688], > ByteStringEditLog[17686250688, 17686250688]) of total size 527.0, total edits > 1.0, total load time 0.0 ms > 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Start loading edits file ByteStringEditLog[17686250689, 17686250693], > ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, > 17686250693] maxTxnsToRead = 9223372036854775807 > 2022-10-25 09:26:50,387 INFO > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: > Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693], > ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, > 17686250693]' to transaction ID 17686250689 > 2022-10-25 09:26:50,387 INFO > org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: > Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693]' to > transaction ID 17686250689 > 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: > Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250689, > 17686250693], ByteStringEditLog[17686250689, 17686250693], > ByteStringEditLog[17686250689, 17686250693]) of total size 890.0, total edits > 5.0, total load time 1.0 ms > {noformat} > After some digging, I found the cause is that the {{LogThrottlingHelper}}s are > declared as instance variables of all the enclosing classes, including > {{FSImage}}, {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. > Therefore the logging frequency will not be limited across different > instances. For classes with only a limited number of instances, such as > {{FSImage}}, this is fine. For others whose instances are created frequently, > such as {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}, it will > result in plenty of logs. > This can be fixed by declaring the {{LogThrottlingHelper}}s as static members. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
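The fix direction from the description, sketched with an assumed enclosing class and field name; the actual patch may differ. LogThrottlingHelper(long minLogPeriodMs) is the existing constructor in org.apache.hadoop.log.

{code:java}
import org.apache.hadoop.log.LogThrottlingHelper;

// Illustrative sketch: a static throttler is shared by every instance of the
// enclosing class, so frequently re-created classes no longer defeat the limit.
class EditLogLoaderExample {
  // Before (per-instance, the throttling window resets with every new loader):
  //   private final LogThrottlingHelper loadEditsHelper = new LogThrottlingHelper(5000);
  // After (shared across all instances):
  private static final LogThrottlingHelper LOAD_EDITS_HELPER =
      new LogThrottlingHelper(5000); // at most one record every 5 seconds
}
{code}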
[jira] [Commented] (HDFS-16877) Namenode doesn't use alignment context in TestObserverWithRouter
[ https://issues.apache.org/jira/browse/HDFS-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654160#comment-17654160 ] ASF GitHub Bot commented on HDFS-16877: --- hadoop-yetus commented on PR #5257: URL: https://github.com/apache/hadoop/pull/5257#issuecomment-1370149310 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 12m 24s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 38m 41s | | trunk passed | | +1 :green_heart: | compile | 0m 45s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 0m 41s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 0m 35s | | trunk passed | | +1 :green_heart: | mvnsite | 0m 45s | | trunk passed | | +1 :green_heart: | javadoc | 0m 52s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 0m 58s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 30s | | trunk passed | | +1 :green_heart: | shadedclient | 20m 37s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 35s | | the patch passed | | +1 :green_heart: | compile | 0m 36s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javac | 0m 36s | | the patch passed | | +1 :green_heart: | compile | 0m 32s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | javac | 0m 32s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 20s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 35s | | the patch passed | | +1 :green_heart: | javadoc | 0m 33s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 0m 51s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 1m 21s | | the patch passed | | +1 :green_heart: | shadedclient | 20m 36s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 22m 17s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 39s | | The patch does not generate ASF License warnings. 
| | | | 128m 7s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5257/5/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/5257 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 691427fdab40 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 4d2a1ca6903f3b611c9f10727cce970168f727de | | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5257/5/testReport/ | | Max. process+thread count | 2230 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5257/5/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated. > Namenode doesn't use alignment context in TestObserverWithRouter
[jira] [Commented] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission
[ https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654142#comment-17654142 ] ASF GitHub Bot commented on HDFS-16881: --- cnauroth commented on code in PR #5268: URL: https://github.com/apache/hadoop/pull/5268#discussion_r1060859919 ## hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java: ## @@ -388,6 +392,12 @@ public enum DirOp { DFS_PROTECTED_SUBDIRECTORIES_ENABLE, DFS_PROTECTED_SUBDIRECTORIES_ENABLE_DEFAULT); +final long readLockThresholdMs = conf.getLong( +DFS_NAMENODE_READ_LOCK_REPORTING_THRESHOLD_MS_KEY, +DFS_NAMENODE_READ_LOCK_REPORTING_THRESHOLD_MS_DEFAULT); +// use half of read lock threshold Review Comment: Is it necessary for this to use half? If so, can you describe why in this comment? ## hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSPermissionChecker.java: ## @@ -446,4 +447,20 @@ private static INodeFile createINodeFile(INodeDirectory parent, String name, parent.addChild(inodeFile); return inodeFile; } + + @Test + public void testCheckAccessControlEnforcerSlowness() throws Exception { +final long thresholdMs = 10; +final String message = FSPermissionChecker.runCheckPermission(() -> { + try { +Thread.sleep(20); + } catch (InterruptedException e) { +throw new RuntimeException(e); Review Comment: I suggest adding `Thread.currentThread().interrupt();` before throwing. It shouldn't matter much in practice, but JUnit runner threads have some strange behavior when interrupted status is not restored as expected. > Warn if AccessControlEnforcer runs for a long time to check permission > -- > > Key: HDFS-16881 > URL: https://issues.apache.org/jira/browse/HDFS-16881 > Project: Hadoop HDFS > Issue Type: Bug > Components: namenode >Reporter: Tsz-wo Sze >Assignee: Tsz-wo Sze >Priority: Major > Labels: pull-request-available > > AccessControlEnforcer is configurable. If an external AccessControlEnforcer > runs for a long time to check permission with the FSNamesystem lock, it will > significantly slow down the entire Namenode. In this JIRA, we will print a > WARN message when it happens. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
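The reviewer's suggestion, applied to the quoted test snippet for clarity; this shows the standard interrupt-restoration idiom, and the final PR #5268 code may differ.

{code:java}
// Restore the interrupt flag before rethrowing so the JUnit runner thread
// still observes the interrupted status.
Runnable slowCheck = () -> {
  try {
    Thread.sleep(20);
  } catch (InterruptedException e) {
    Thread.currentThread().interrupt(); // restore interrupted status
    throw new RuntimeException(e);
  }
};
{code}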
[jira] [Commented] (HDFS-16877) Namenode doesn't use alignment context in TestObserverWithRouter
[ https://issues.apache.org/jira/browse/HDFS-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654112#comment-17654112 ] ASF GitHub Bot commented on HDFS-16877: --- simbadzina commented on code in PR #5257: URL: https://github.com/apache/hadoop/pull/5257#discussion_r1060799582 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java: ## @@ -510,6 +513,15 @@ BalanceProcedureScheduler getFedRenameScheduler() { return this.fedRenameScheduler; } + /** + * Get the routerStateIdContext used by this server. + * @return routerStateIdContext + */ + @VisibleForTesting + public RouterStateIdContext getRouterStateIdContext() { Review Comment: Protected works too. If we need to increase visibility later, we can do so. > Namenode doesn't use alignment context in TestObserverWithRouter > > > Key: HDFS-16877 > URL: https://issues.apache.org/jira/browse/HDFS-16877 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, rbf >Reporter: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > > We need to set "{*}dfs.namenode.state.context.enabled{*}" to true for the > namenode to send its stateId in client responses. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16877) Namenode doesn't use alignment context in TestObserverWithRouter
[ https://issues.apache.org/jira/browse/HDFS-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654111#comment-17654111 ] ASF GitHub Bot commented on HDFS-16877: --- simbadzina commented on code in PR #5257: URL: https://github.com/apache/hadoop/pull/5257#discussion_r1060799003 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java: ## @@ -203,6 +203,9 @@ public class RouterRpcServer extends AbstractService implements ClientProtocol, /** Router using this RPC server. */ private final Router router; + /** RouterStateIdContext for this RPC server. */ Review Comment: Fixed. > Namenode doesn't use alignment context in TestObserverWithRouter > > > Key: HDFS-16877 > URL: https://issues.apache.org/jira/browse/HDFS-16877 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, rbf >Reporter: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > > We need to set "{*}dfs.namenode.state.context.enabled{*}" to true for the > namenode to send its stateId in client responses. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
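The one-line fix the description implies, sketched with the real DFSConfigKeys constant; the surrounding TestObserverWithRouter wiring is omitted and the helper class here is invented.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.DFSConfigKeys;

// Enable the namenode-side alignment context so state IDs are attached to
// client RPC responses (required for observer reads to be verifiable).
public class EnableStateContextSketch {
  public static Configuration observerReadConf() {
    Configuration conf = new Configuration();
    conf.setBoolean(DFSConfigKeys.DFS_NAMENODE_STATE_CONTEXT_ENABLED_KEY, true);
    return conf;
  }
}
{code}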
[jira] [Commented] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission
[ https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654110#comment-17654110 ] ASF GitHub Bot commented on HDFS-16881: --- hadoop-yetus commented on PR #5268: URL: https://github.com/apache/hadoop/pull/5268#issuecomment-1370033666 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 31s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 41m 24s | | trunk passed | | +1 :green_heart: | compile | 1m 30s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | compile | 1m 19s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | checkstyle | 1m 6s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 29s | | trunk passed | | +1 :green_heart: | javadoc | 1m 7s | | trunk passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 1m 31s | | trunk passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 33s | | trunk passed | | +1 :green_heart: | shadedclient | 25m 40s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 20s | | the patch passed | | +1 :green_heart: | compile | 1m 25s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | -1 :x: | javac | 1m 25s | [/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5268/1/artifact/out/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt) | hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 generated 2 new + 910 unchanged - 0 fixed = 912 total (was 910) | | +1 :green_heart: | compile | 1m 13s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | -1 :x: | javac | 1m 13s | [/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_352-8u352-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5268/1/artifact/out/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_352-8u352-ga-1~20.04-b08.txt) | hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_352-8u352-ga-1~20.04-b08 with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 generated 2 new + 889 unchanged - 0 fixed = 891 total (was 889) | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. 
| | -0 :warning: | checkstyle | 0m 53s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5268/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 163 unchanged - 0 fixed = 166 total (was 163) | | +1 :green_heart: | mvnsite | 1m 23s | | the patch passed | | +1 :green_heart: | javadoc | 0m 51s | | the patch passed with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 | | +1 :green_heart: | javadoc | 1m 25s | | the patch passed with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 | | +1 :green_heart: | spotbugs | 3m 26s | | the patch passed | | +1 :green_heart: | shadedclient | 28m 13s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 392m 42s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5268/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 55s | | The patch does not generate ASF License warnings. | | | | 510m 14s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 | | Subsystem | Report/Notes | |--:|:-| |
[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.
[ https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653977#comment-17653977 ] ASF GitHub Bot commented on HDFS-16880: --- hfutatzhanghb commented on code in PR #5262: URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060537469 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java: ## @@ -687,7 +687,7 @@ T invokeAtAvailableNs(RemoteMethod method, Class clazz) // If default Ns is present return result from that namespace. if (!nsId.isEmpty()) { try { -return rpcClient.invokeSingle(nsId, method, clazz); +return rpcClient.invokeSingle(nsId, method, clazz, ""); Review Comment: Hi @goiri, could you please also review the code of https://issues.apache.org/jira/browse/HDFS-16865? > modify invokeSingleXXX interface in order to pass actual file src to namenode > for debug info. > - > > Key: HDFS-16880 > URL: https://issues.apache.org/jira/browse/HDFS-16880 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Major > Labels: pull-request-available > > We found lots of INFO-level logs like the ones below: > {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1 > 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480 > {quote} > They lose the real path of the completeFile call. Actually this is caused by: > *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String, org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)* > In this method, it instantiates a RemoteLocationContext object: > *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");* > and then executes: *Object[] params = method.getParams(loc);* > The problem is right here: because we always use new RemoteParam(), context.getDest() always returns "/". That's why we see lots of incorrect logs. > > After diving into the invokeSingleXXX source code, I classified the RPCs below into those that need the actual src and those that do not. > > *RPCs that need the src path:* addBlock, abandonBlock, getAdditionalDatanode, complete > *RPCs that do not need the src path:* updateBlockForPipeline, reportBadBlocks, getBlocks, updatePipeline, invokeAtAvailableNs (invoked by: getServerDefaults, getBlockKeys, getTransactionID, getMostRecentCheckpointTxId, versionRequest, getStoragePolicies) > > After the changes, the src is passed to the NN correctly. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.
[ https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653976#comment-17653976 ] ASF GitHub Bot commented on HDFS-16880: --- hfutatzhanghb commented on code in PR #5262: URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060536835 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java: ## @@ -687,7 +687,7 @@ T invokeAtAvailableNs(RemoteMethod method, Class clazz) // If default Ns is present return result from that namespace. if (!nsId.isEmpty()) { try { -return rpcClient.invokeSingle(nsId, method, clazz); +return rpcClient.invokeSingle(nsId, method, clazz, ""); Review Comment: Hi @goiri, if we make it an actual field or attribute here, it would invoke `getLocationsForPath` to get the actual path. That defeats the purpose of using the previous block to improve the performance of some RPCs. > modify invokeSingleXXX interface in order to pass actual file src to namenode > for debug info. > - > > Key: HDFS-16880 > URL: https://issues.apache.org/jira/browse/HDFS-16880 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Major > Labels: pull-request-available > > We found lots of INFO-level logs like the ones below: > {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1 > 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480 > {quote} > They lose the real path of the completeFile call. Actually this is caused by: > *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String, org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)* > In this method, it instantiates a RemoteLocationContext object: > *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");* > and then executes: *Object[] params = method.getParams(loc);* > The problem is right here: because we always use new RemoteParam(), context.getDest() always returns "/". That's why we see lots of incorrect logs. > > After diving into the invokeSingleXXX source code, I classified the RPCs below into those that need the actual src and those that do not. > > *RPCs that need the src path:* addBlock, abandonBlock, getAdditionalDatanode, complete > *RPCs that do not need the src path:* updateBlockForPipeline, reportBadBlocks, getBlocks, updatePipeline, invokeAtAvailableNs (invoked by: getServerDefaults, getBlockKeys, getTransactionID, getMostRecentCheckpointTxId, versionRequest, getStoragePolicies) > > After the changes, the src is passed to the NN correctly. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.
[ https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653970#comment-17653970 ] ASF GitHub Bot commented on HDFS-16880: --- goiri commented on code in PR #5262: URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060525087 ## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java: ## @@ -687,7 +687,7 @@ T invokeAtAvailableNs(RemoteMethod method, Class clazz) // If default Ns is present return result from that namespace. if (!nsId.isEmpty()) { try { -return rpcClient.invokeSingle(nsId, method, clazz); +return rpcClient.invokeSingle(nsId, method, clazz, ""); Review Comment: Can't we make it an actual field or attribute? > modify invokeSingleXXX interface in order to pass actual file src to namenode > for debug info. > - > > Key: HDFS-16880 > URL: https://issues.apache.org/jira/browse/HDFS-16880 > Project: Hadoop HDFS > Issue Type: Improvement > Components: rbf >Affects Versions: 3.3.4 >Reporter: ZhangHB >Priority: Major > Labels: pull-request-available > > We found lots of INFO-level logs like the ones below: > {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1 > 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480 > {quote} > They lose the real path of the completeFile call. Actually this is caused by: > *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String, org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)* > In this method, it instantiates a RemoteLocationContext object: > *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");* > and then executes: *Object[] params = method.getParams(loc);* > The problem is right here: because we always use new RemoteParam(), context.getDest() always returns "/". That's why we see lots of incorrect logs. > > After diving into the invokeSingleXXX source code, I classified the RPCs below into those that need the actual src and those that do not. > > *RPCs that need the src path:* addBlock, abandonBlock, getAdditionalDatanode, complete > *RPCs that do not need the src path:* updateBlockForPipeline, reportBadBlocks, getBlocks, updatePipeline, invokeAtAvailableNs (invoked by: getServerDefaults, getBlockKeys, getTransactionID, getMostRecentCheckpointTxId, versionRequest, getStoragePolicies) > > After the changes, the src is passed to the NN correctly. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.
[ https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653928#comment-17653928 ]
ASF GitHub Bot commented on HDFS-16880:
---
ayushtkn commented on code in PR #5262:
URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060442910

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
## @@ -687,7 +687,7 @@ T invokeAtAvailableNs(RemoteMethod method, Class clazz)
     // If default Ns is present return result from that namespace.
     if (!nsId.isEmpty()) {
       try {
-        return rpcClient.invokeSingle(nsId, method, clazz);
+        return rpcClient.invokeSingle(nsId, method, clazz, "");

Review Comment:
   @goiri would it be OK to have a prefix plus the path relative to the router, rather than the actual path relative to the namenode?

> modify invokeSingleXXX interface in order to pass actual file src to namenode
> for debug info.
> -
>
> Key: HDFS-16880
> URL: https://issues.apache.org/jira/browse/HDFS-16880
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Affects Versions: 3.3.4
> Reporter: ZhangHB
> Priority: Major
> Labels: pull-request-available
>
> We found lots of INFO-level logs like the ones below:
> {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile: / is closed by
> DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
> 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
> {quote}
> They lose the real path of completeFile. This is actually caused by
> *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
> org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*
> This method instantiates a RemoteLocationContext object:
> *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*
> and then executes: *Object[] params = method.getParams(loc);*
> The problem is right here: because we always use new RemoteParam(),
> context.getDest() always returns "/". That's why we saw lots of incorrect logs.
>
> After diving into the invokeSingleXXX source code, I classified the RPCs into
> those that need the actual src and those that do not.
>
> *RPCs that need the src path:*
> addBlock、abandonBlock、getAdditionalDatanode、complete
> *RPCs that do not need the src path:*
> updateBlockForPipeline、reportBadBlocks、getBlocks、updatePipeline、invokeAtAvailableNs
> (invoked by:
> getServerDefaults、getBlockKeys、getTransactionID、getMostRecentCheckpointTxId、versionRequest、getStoragePolicies)
>
> After these changes, the src can be passed to the NN correctly.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.
[ https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653912#comment-17653912 ]
ASF GitHub Bot commented on HDFS-16880:
---
hfutatzhanghb commented on code in PR #5262:
URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060435056

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
## @@ -687,7 +687,7 @@ T invokeAtAvailableNs(RemoteMethod method, Class clazz)
     // If default Ns is present return result from that namespace.
     if (!nsId.isEmpty()) {
       try {
-        return rpcClient.invokeSingle(nsId, method, clazz);
+        return rpcClient.invokeSingle(nsId, method, clazz, "");

Review Comment:
   > Can we add an actual test?

   OK @goiri, I will add an actual test soon. Thanks a lot.

> modify invokeSingleXXX interface in order to pass actual file src to namenode
> for debug info.
> -
>
> Key: HDFS-16880
> URL: https://issues.apache.org/jira/browse/HDFS-16880
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Affects Versions: 3.3.4
> Reporter: ZhangHB
> Priority: Major
> Labels: pull-request-available
>
> We found lots of INFO-level logs like the ones below:
> {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile: / is closed by
> DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
> 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
> {quote}
> They lose the real path of completeFile. This is actually caused by
> *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
> org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*
> This method instantiates a RemoteLocationContext object:
> *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*
> and then executes: *Object[] params = method.getParams(loc);*
> The problem is right here: because we always use new RemoteParam(),
> context.getDest() always returns "/". That's why we saw lots of incorrect logs.
>
> After diving into the invokeSingleXXX source code, I classified the RPCs into
> those that need the actual src and those that do not.
>
> *RPCs that need the src path:*
> addBlock、abandonBlock、getAdditionalDatanode、complete
> *RPCs that do not need the src path:*
> updateBlockForPipeline、reportBadBlocks、getBlocks、updatePipeline、invokeAtAvailableNs
> (invoked by:
> getServerDefaults、getBlockKeys、getTransactionID、getMostRecentCheckpointTxId、versionRequest、getStoragePolicies)
>
> After these changes, the src can be passed to the NN correctly.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.
[ https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653910#comment-17653910 ]
ASF GitHub Bot commented on HDFS-16880:
---
hfutatzhanghb commented on code in PR #5262:
URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060434510

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
## @@ -687,7 +687,7 @@ T invokeAtAvailableNs(RemoteMethod method, Class clazz)
     // If default Ns is present return result from that namespace.
     if (!nsId.isEmpty()) {
       try {
-        return rpcClient.invokeSingle(nsId, method, clazz);
+        return rpcClient.invokeSingle(nsId, method, clazz, "");

Review Comment:
   > > Yes, the src is not the dst path, but we can use the src to look up the real dst path in the mount table. Here we cannot execute `this.subclusterResolver.getMountPoints(path);` because that statement is time-consuming.
   >
   > From a correctness point of view it isn't correct. If `/mount/path` points to Ns0 `/dir` and we log `/mount/path` in the namenode, it can lead to confusion if the namenode also has a path named `/mount/path`: operating on `/dir` via the router's `/mount/path` then looks the same as operating directly on `/mount/path` on the namenode.
   >
   > If it has serious performance penalties, we need to make it configurable and then figure out what can be done when the config is disabled.

   Hi @ayushtkn, can we prepend a specific prefix, such as "RBF#src_path:", to the dir? That way we can distinguish the RBF case from the direct-namenode case. What's your opinion? I look forward to your reply before taking the next steps. Thanks again~

> modify invokeSingleXXX interface in order to pass actual file src to namenode
> for debug info.
> -
>
> Key: HDFS-16880
> URL: https://issues.apache.org/jira/browse/HDFS-16880
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: rbf
> Affects Versions: 3.3.4
> Reporter: ZhangHB
> Priority: Major
> Labels: pull-request-available
>
> We found lots of INFO-level logs like the ones below:
> {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile: / is closed by
> DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
> 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
> {quote}
> They lose the real path of completeFile. This is actually caused by
> *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
> org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*
> This method instantiates a RemoteLocationContext object:
> *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*
> and then executes: *Object[] params = method.getParams(loc);*
> The problem is right here: because we always use new RemoteParam(),
> context.getDest() always returns "/". That's why we saw lots of incorrect logs.
>
> After diving into the invokeSingleXXX source code, I classified the RPCs into
> those that need the actual src and those that do not.
>
> *RPCs that need the src path:*
> addBlock、abandonBlock、getAdditionalDatanode、complete
> *RPCs that do not need the src path:*
> updateBlockForPipeline、reportBadBlocks、getBlocks、updatePipeline、invokeAtAvailableNs
> (invoked by:
> getServerDefaults、getBlockKeys、getTransactionID、getMostRecentCheckpointTxId、versionRequest、getStoragePolicies)
>
> After these changes, the src can be passed to the NN correctly.
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
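As an illustration of the prefix idea floated in the comment above, a minimal sketch follows. The "RBF#src_path:" value comes from the comment itself, while the class and method names here are hypothetical:
{code:java}
public class RbfSrcPrefixSketch {
  // Hypothetical marker taken from the comment above; it lets a reader of the
  // namenode log tell a router-relative path from a namenode-local path.
  private static final String RBF_PREFIX = "RBF#src_path:";

  static String tagRouterPath(String routerSrc) {
    return RBF_PREFIX + routerSrc;
  }

  public static void main(String[] args) {
    // A log line would then show "RBF#src_path:/mount/path", which cannot be
    // confused with a literal namenode path "/mount/path".
    System.out.println(tagRouterPath("/mount/path"));
  }
}
{code}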
[jira] [Commented] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission
[ https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653876#comment-17653876 ]
ASF GitHub Bot commented on HDFS-16881:
---
szetszwo opened a new pull request, #5268:
URL: https://github.com/apache/hadoop/pull/5268

   ### Description of PR

   AccessControlEnforcer is configurable. If an external AccessControlEnforcer runs for a long time checking permissions while holding the FSNamesystem lock, it will significantly slow down the entire Namenode. In this JIRA, we will print a WARN message when that happens.

   https://issues.apache.org/jira/browse/HDFS-16881

   ### How was this patch tested?

   New unit tests

> Warn if AccessControlEnforcer runs for a long time to check permission
> --
>
> Key: HDFS-16881
> URL: https://issues.apache.org/jira/browse/HDFS-16881
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
>
> AccessControlEnforcer is configurable. If an external AccessControlEnforcer
> runs for a long time checking permissions while holding the FSNamesystem
> lock, it will significantly slow down the entire Namenode. In this JIRA, we
> will print a WARN message when that happens.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
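As a rough sketch of the behavior the PR describes, and not the actual patch, a permission-check call could be timed and a WARN logged once a threshold is crossed. The class name and threshold below are assumptions; the real change would live in the namenode's permission-checking path and read its threshold from configuration:
{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class SlowPermissionCheckWarner {
  private static final Logger LOG =
      LoggerFactory.getLogger(SlowPermissionCheckWarner.class);

  // Hypothetical threshold; the real patch would read it from configuration.
  private static final long WARN_THRESHOLD_MS = 100;

  static void checkPermissionTimed(Runnable enforcerCheck) {
    final long start = System.nanoTime();
    try {
      enforcerCheck.run(); // the external AccessControlEnforcer call
    } finally {
      final long elapsedMs = (System.nanoTime() - start) / 1_000_000;
      if (elapsedMs > WARN_THRESHOLD_MS) {
        LOG.warn("AccessControlEnforcer took {} ms to check permission "
            + "while holding the FSNamesystem lock", elapsedMs);
      }
    }
  }
}
{code}
Measuring in a finally block means the WARN is emitted even when the external enforcer throws, which matters because a slow failing check holds the lock just as long as a slow passing one.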
[jira] [Updated] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission
[ https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HDFS-16881:
--
Labels: pull-request-available (was: )

> Warn if AccessControlEnforcer runs for a long time to check permission
> --
>
> Key: HDFS-16881
> URL: https://issues.apache.org/jira/browse/HDFS-16881
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Labels: pull-request-available
>
> AccessControlEnforcer is configurable. If an external AccessControlEnforcer
> runs for a long time checking permissions while holding the FSNamesystem
> lock, it will significantly slow down the entire Namenode. In this JIRA, we
> will print a WARN message when that happens.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission
[ https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tsz-wo Sze updated HDFS-16881:
--
Summary: Warn if AccessControlEnforcer runs for a long time to check permission (was: Print a warning if AccessControlEnforcer runs for a long time to check permission)

> Warn if AccessControlEnforcer runs for a long time to check permission
> --
>
> Key: HDFS-16881
> URL: https://issues.apache.org/jira/browse/HDFS-16881
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
>
> AccessControlEnforcer is configurable. If an external AccessControlEnforcer
> runs for a long time checking permissions while holding the FSNamesystem
> lock, it will significantly slow down the entire Namenode. In this JIRA, we
> will print a WARN message when that happens.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org