[jira] [Commented] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654279#comment-17654279
 ] 

ASF GitHub Bot commented on HDFS-16879:
---

haiyang1987 commented on PR #5264:
URL: https://github.com/apache/hadoop/pull/5264#issuecomment-1370454477

   Thanks @ZanderXu and @dineshchitlangia for the review and merge.




> EC : Fsck -blockId shows number of redundant internal block replicas for EC 
> Blocks
> --
>
> Key: HDFS-16879
> URL: https://issues.apache.org/jira/browse/HDFS-16879
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When running hdfs fsck -blockId xxx on a block of an EC file, the output 
> should also show the number of redundant internal block replicas.
> For example: the current block group has 10 live replicas, but fsck shows 
> there are only 9 live replicas. 
> Actually, one live replica is in the redundant state, so the output should 
> add the line "No. of redundant Replica: 1".
> {code:java}
> hdfs fsck -blockId blk_-xxx
> Block Id: blk_-xxx
> Block belongs to: /ec/file1
> No. of Expected Replica: 9
> No. of live Replica: 9
> No. of excess Replica: 0
> No. of stale Replica: 0
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: ip-xxx1 is HEALTHY
> Block replica on datanode/rack: ip-xxx2 is HEALTHY
> Block replica on datanode/rack: ip-xxx3 is HEALTHY
> Block replica on datanode/rack: ip-xxx4 is HEALTHY
> Block replica on datanode/rack: ip-xxx5 is HEALTHY
> Block replica on datanode/rack: ip-xxx6 is HEALTHY
> Block replica on datanode/rack: ip-xxx7 is HEALTHY
> Block replica on datanode/rack: ip-xxx8 is HEALTHY
> Block replica on datanode/rack: ip-xxx9 is HEALTHY
> Block replica on datanode/rack: ip-xxx10 is HEALTHY
> {code}
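A minimal sketch of the arithmetic behind the proposed line, assuming an RS-6-3 
block group (9 expected internal blocks); this is illustrative only, not the 
actual NamenodeFsck code:

{code:java}
// Illustrative sketch only (not the actual NamenodeFsck code): for an EC
// block group the expected internal block count is dataUnits + parityUnits,
// and any live internal blocks beyond that are redundant.
public class RedundantReplicaSketch {
  static int redundantReplicas(int dataUnits, int parityUnits, int liveReplicas) {
    final int expected = dataUnits + parityUnits; // e.g. RS-6-3 -> 9
    return Math.max(0, liveReplicas - expected);  // 10 live -> 1 redundant
  }

  public static void main(String[] args) {
    // Matches the example above: 10 live internal blocks, 9 expected.
    System.out.println("No. of redundant Replica: " + redundantReplicas(6, 3, 10));
  }
}
{code}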






[jira] [Commented] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654280#comment-17654280
 ] 

ASF GitHub Bot commented on HDFS-16879:
---

ZanderXu commented on PR #5264:
URL: https://github.com/apache/hadoop/pull/5264#issuecomment-1370454618

   The failed UT `hadoop.hdfs.TestLeaseRecovery2` is tracked by HDFS-16853. 




> EC : Fsck -blockId shows number of redundant internal block replicas for EC 
> Blocks
> --
>
> Key: HDFS-16879
> URL: https://issues.apache.org/jira/browse/HDFS-16879
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When running hdfs fsck -blockId xxx on a block of an EC file, the output 
> should also show the number of redundant internal block replicas.
> For example: the current block group has 10 live replicas, but fsck shows 
> there are only 9 live replicas. 
> Actually, one live replica is in the redundant state, so the output should 
> add the line "No. of redundant Replica: 1".
> {code:java}
> hdfs fsck -blockId blk_-xxx
> Block Id: blk_-xxx
> Block belongs to: /ec/file1
> No. of Expected Replica: 9
> No. of live Replica: 9
> No. of excess Replica: 0
> No. of stale Replica: 0
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: ip-xxx1 is HEALTHY
> Block replica on datanode/rack: ip-xxx2 is HEALTHY
> Block replica on datanode/rack: ip-xxx3 is HEALTHY
> Block replica on datanode/rack: ip-xxx4 is HEALTHY
> Block replica on datanode/rack: ip-xxx5 is HEALTHY
> Block replica on datanode/rack: ip-xxx6 is HEALTHY
> Block replica on datanode/rack: ip-xxx7 is HEALTHY
> Block replica on datanode/rack: ip-xxx8 is HEALTHY
> Block replica on datanode/rack: ip-xxx9 is HEALTHY
> Block replica on datanode/rack: ip-xxx10 is HEALTHY
> {code}






[jira] [Resolved] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks

2023-01-03 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu resolved HDFS-16879.
-
Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> EC : Fsck -blockId shows number of redundant internal block replicas for EC 
> Blocks
> --
>
> Key: HDFS-16879
> URL: https://issues.apache.org/jira/browse/HDFS-16879
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> When running hdfs fsck -blockId xxx on a block of an EC file, the output 
> should also show the number of redundant internal block replicas.
> For example: the current block group has 10 live replicas, but fsck shows 
> there are only 9 live replicas. 
> Actually, one live replica is in the redundant state, so the output should 
> add the line "No. of redundant Replica: 1".
> {code:java}
> hdfs fsck -blockId blk_-xxx
> Block Id: blk_-xxx
> Block belongs to: /ec/file1
> No. of Expected Replica: 9
> No. of live Replica: 9
> No. of excess Replica: 0
> No. of stale Replica: 0
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: ip-xxx1 is HEALTHY
> Block replica on datanode/rack: ip-xxx2 is HEALTHY
> Block replica on datanode/rack: ip-xxx3 is HEALTHY
> Block replica on datanode/rack: ip-xxx4 is HEALTHY
> Block replica on datanode/rack: ip-xxx5 is HEALTHY
> Block replica on datanode/rack: ip-xxx6 is HEALTHY
> Block replica on datanode/rack: ip-xxx7 is HEALTHY
> Block replica on datanode/rack: ip-xxx8 is HEALTHY
> Block replica on datanode/rack: ip-xxx9 is HEALTHY
> Block replica on datanode/rack: ip-xxx10 is HEALTHY
> {code}






[jira] [Commented] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654277#comment-17654277
 ] 

ASF GitHub Bot commented on HDFS-16879:
---

ZanderXu commented on PR #5264:
URL: https://github.com/apache/hadoop/pull/5264#issuecomment-1370453598

   Merged, thanks @haiyang1987 for your report. Thanks @dineshchitlangia for 
your review.




> EC : Fsck -blockId shows number of redundant internal block replicas for EC 
> Blocks
> --
>
> Key: HDFS-16879
> URL: https://issues.apache.org/jira/browse/HDFS-16879
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> When running hdfs fsck -blockId xxx on a block of an EC file, the output 
> should also show the number of redundant internal block replicas.
> For example: the current block group has 10 live replicas, but fsck shows 
> there are only 9 live replicas. 
> Actually, one live replica is in the redundant state, so the output should 
> add the line "No. of redundant Replica: 1".
> {code:java}
> hdfs fsck -blockId blk_-xxx
> Block Id: blk_-xxx
> Block belongs to: /ec/file1
> No. of Expected Replica: 9
> No. of live Replica: 9
> No. of excess Replica: 0
> No. of stale Replica: 0
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: ip-xxx1 is HEALTHY
> Block replica on datanode/rack: ip-xxx2 is HEALTHY
> Block replica on datanode/rack: ip-xxx3 is HEALTHY
> Block replica on datanode/rack: ip-xxx4 is HEALTHY
> Block replica on datanode/rack: ip-xxx5 is HEALTHY
> Block replica on datanode/rack: ip-xxx6 is HEALTHY
> Block replica on datanode/rack: ip-xxx7 is HEALTHY
> Block replica on datanode/rack: ip-xxx8 is HEALTHY
> Block replica on datanode/rack: ip-xxx9 is HEALTHY
> Block replica on datanode/rack: ip-xxx10 is HEALTHY
> {code}






[jira] [Commented] (HDFS-16879) EC : Fsck -blockId shows number of redundant internal block replicas for EC Blocks

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654276#comment-17654276
 ] 

ASF GitHub Bot commented on HDFS-16879:
---

ZanderXu merged PR #5264:
URL: https://github.com/apache/hadoop/pull/5264




> EC : Fsck -blockId shows number of redundant internal block replicas for EC 
> Blocks
> --
>
> Key: HDFS-16879
> URL: https://issues.apache.org/jira/browse/HDFS-16879
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
>
> When running hdfs fsck -blockId xxx on a block of an EC file, the output 
> should also show the number of redundant internal block replicas.
> For example: the current block group has 10 live replicas, but fsck shows 
> there are only 9 live replicas. 
> Actually, one live replica is in the redundant state, so the output should 
> add the line "No. of redundant Replica: 1".
> {code:java}
> hdfs fsck -blockId blk_-xxx
> Block Id: blk_-xxx
> Block belongs to: /ec/file1
> No. of Expected Replica: 9
> No. of live Replica: 9
> No. of excess Replica: 0
> No. of stale Replica: 0
> No. of decommissioned Replica: 0
> No. of decommissioning Replica: 0
> No. of corrupted Replica: 0
> Block replica on datanode/rack: ip-xxx1 is HEALTHY
> Block replica on datanode/rack: ip-xxx2 is HEALTHY
> Block replica on datanode/rack: ip-xxx3 is HEALTHY
> Block replica on datanode/rack: ip-xxx4 is HEALTHY
> Block replica on datanode/rack: ip-xxx5 is HEALTHY
> Block replica on datanode/rack: ip-xxx6 is HEALTHY
> Block replica on datanode/rack: ip-xxx7 is HEALTHY
> Block replica on datanode/rack: ip-xxx8 is HEALTHY
> Block replica on datanode/rack: ip-xxx9 is HEALTHY
> Block replica on datanode/rack: ip-xxx10 is HEALTHY
> {code}






[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654257#comment-17654257
 ] 

ASF GitHub Bot commented on HDFS-16880:
---

hfutatzhanghb commented on code in PR #5262:
URL: https://github.com/apache/hadoop/pull/5262#discussion_r1061099106


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##
@@ -687,7 +687,7 @@  T invokeAtAvailableNs(RemoteMethod method, Class 
clazz)
 // If default Ns is present return result from that namespace.
 if (!nsId.isEmpty()) {
   try {
-return rpcClient.invokeSingle(nsId, method, clazz);
+return rpcClient.invokeSingle(nsId, method, clazz, "");

Review Comment:
   Hi @ayushtkn, @goiri. Could you please take a look at 
https://issues.apache.org/jira/browse/HDFS-16882
   and let me know whether it is feasible? Thanks a lot.





> modify invokeSingleXXX interface in order to pass actual file src to namenode 
> for debug info.
> -
>
> Key: HDFS-16880
> URL: https://issues.apache.org/jira/browse/HDFS-16880
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>  Labels: pull-request-available
>
> We found lots of INFO-level logs like the ones below:
> {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by 
> DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
> 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
> {quote}
> The real path of completeFile is lost. This is actually caused by:
>  
> *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
>  org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*
> In this method, a RemoteLocationContext object is instantiated:
> *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*
> and then executed: *Object[] params = method.getParams(loc);*
> The problem is right here: because we always use new RemoteParam(), 
> context.getDest() always returns "/". That's why we see so many incorrect logs.
>  
> After diving into the invokeSingleXXX source code, I classified the following 
> RPCs into those that need the actual src and those that do not.
>  
> *RPCs that need the src path:*
> addBlock、abandonBlock、getAdditionalDatanode、complete
> *RPCs that do not need the src path:*
> updateBlockForPipeline、reportBadBlocks、getBlocks、updatePipeline、invokeAtAvailableNs(invoked
>  by: 
> getServerDefaults、getBlockKeys、getTransactionID、getMostRecentCheckpointTxId、versionRequest、getStoragePolicies)
>  
> After the changes, the src is passed to the NN correctly.
>  
>  
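A self-contained sketch of the failure mode described above, using simplified 
stand-in classes (not the real RBF types) to show how the "/" placeholder leaks 
into the completeFile log line:

{code:java}
// Simplified stand-in for the real RemoteLocation/RemoteLocationContext types.
class RemoteLocationModel {
  private final String nsId, dest, src;
  RemoteLocationModel(String nsId, String dest, String src) {
    this.nsId = nsId;
    this.dest = dest;
    this.src = src;
  }
  String getDest() { return dest; }
}

public class PlaceholderPathDemo {
  public static void main(String[] args) {
    // What invokeSingle(nsId, method, clazz) effectively does today: it
    // builds a placeholder location with "/" as both source and destination.
    RemoteLocationModel loc = new RemoteLocationModel("ns0", "/", "/");
    // Any parameter resolved through the placeholder degrades to "/",
    // which is exactly what shows up in the StateChange log.
    System.out.println("completeFile: " + loc.getDest() + " is closed by ...");
  }
}
{code}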






[jira] [Created] (HDFS-16882) Add cache hit rate metric in MountTableResolver#getDestinationForPath

2023-01-03 Thread ZhangHB (Jira)
ZhangHB created HDFS-16882:
--

 Summary: Add cache hit rate metric in 
MountTableResolver#getDestinationForPath
 Key: HDFS-16882
 URL: https://issues.apache.org/jira/browse/HDFS-16882
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Affects Versions: 3.3.4
Reporter: ZhangHB


Currently, the default value of 
"dfs.federation.router.mount-table.cache.enable" is true, and the default 
value of "dfs.federation.router.mount-table.max-cache-size" is 10000.

But there is no metric that displays the cache hit rate. I think we can add a 
hit-rate metric to watch the cache performance and tune the parameters better.
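A minimal sketch of the proposed accounting (class and method names here are 
assumptions, not existing RBF code):

{code:java}
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch of hit-rate accounting around
// MountTableResolver#getDestinationForPath; all names are illustrative.
public class MountTableCacheMetricsSketch {
  private final LongAdder hits = new LongAdder();
  private final LongAdder misses = new LongAdder();

  public void recordHit() { hits.increment(); }
  public void recordMiss() { misses.increment(); }

  /** Hit rate in [0, 1]; returns 0 when nothing has been recorded yet. */
  public double hitRate() {
    final long h = hits.sum();
    final long total = h + misses.sum();
    return total == 0 ? 0.0 : (double) h / total;
  }
}
{code}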






[jira] [Commented] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654231#comment-17654231
 ] 

ASF GitHub Bot commented on HDFS-16881:
---

szetszwo commented on code in PR #5268:
URL: https://github.com/apache/hadoop/pull/5268#discussion_r1061074364


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java:
##
@@ -388,6 +392,12 @@ public enum DirOp {
 DFS_PROTECTED_SUBDIRECTORIES_ENABLE,
 DFS_PROTECTED_SUBDIRECTORIES_ENABLE_DEFAULT);
 
+final long readLockThresholdMs = conf.getLong(
+DFS_NAMENODE_READ_LOCK_REPORTING_THRESHOLD_MS_KEY,
+DFS_NAMENODE_READ_LOCK_REPORTING_THRESHOLD_MS_DEFAULT);
+// use half of read lock threshold

Review Comment:
   Indeed, let's make it configurable.
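   For instance, a dedicated setting could look like the sketch below; the key 
name and default are hypothetical, not existing Hadoop properties:

{code:java}
import org.apache.hadoop.conf.Configuration;

// Hypothetical sketch of a dedicated threshold setting; the key and default
// below are illustrative and do not exist in Hadoop.
public class ConfigurableThresholdSketch {
  static final String PERMISSION_CHECK_WARN_MS_KEY =
      "dfs.namenode.permission-check.reporting-threshold.ms"; // hypothetical
  static final long PERMISSION_CHECK_WARN_MS_DEFAULT = 1000L; // hypothetical

  static long thresholdMs(Configuration conf) {
    return conf.getLong(PERMISSION_CHECK_WARN_MS_KEY,
        PERMISSION_CHECK_WARN_MS_DEFAULT);
  }
}
{code}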



##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSPermissionChecker.java:
##
@@ -446,4 +447,20 @@ private static INodeFile createINodeFile(INodeDirectory 
parent, String name,
 parent.addChild(inodeFile);
 return inodeFile;
   }
+
+  @Test
+  public void testCheckAccessControlEnforcerSlowness() throws Exception {
+final long thresholdMs = 10;
+final String message = FSPermissionChecker.runCheckPermission(() -> {
+  try {
+Thread.sleep(20);
+  } catch (InterruptedException e) {
+throw new RuntimeException(e);

Review Comment:
   Sure. Thanks for pointing out the JUnit behavior.





> Warn if AccessControlEnforcer runs for a long time to check permission
> --
>
> Key: HDFS-16881
> URL: https://issues.apache.org/jira/browse/HDFS-16881
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
>
> AccessControlEnforcer is configurable.  If an external AccessControlEnforcer 
> runs for a long time to check permission while holding the FSNamesystem lock, 
> it will significantly slow down the entire Namenode.  In this JIRA, we will 
> print a WARN message when that happens.
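A minimal sketch of the warn-if-slow pattern described above (names are 
illustrative, not the actual FSPermissionChecker change):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Illustrative sketch: time the enforcer call and WARN when it exceeds a
// threshold. Names are assumptions, not the actual patch.
public class SlowPermissionCheckSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(SlowPermissionCheckSketch.class);

  static void runCheckPermission(Runnable check, long thresholdMs) {
    final long startNs = System.nanoTime();
    try {
      check.run();
    } finally {
      final long elapsedMs = (System.nanoTime() - startNs) / 1_000_000;
      if (elapsedMs > thresholdMs) {
        LOG.warn("AccessControlEnforcer took {} ms to check permission"
            + " (threshold: {} ms)", elapsedMs, thresholdMs);
      }
    }
  }
}
{code}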






[jira] [Commented] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654210#comment-17654210
 ] 

ASF GitHub Bot commented on HDFS-16872:
---

hadoop-yetus commented on PR #5269:
URL: https://github.com/apache/hadoop/pull/5269#issuecomment-1370338282

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 38s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  41m 51s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |  24m 35s |  |  trunk passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  compile  |  22m 11s |  |  trunk passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  5s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  trunk passed  |
   | -1 :x: |  javadoc  |   1m 11s | 
[/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5269/1/artifact/out/branch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt)
 |  hadoop-common in trunk failed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  trunk passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 45s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 40s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m  0s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  24m 32s |  |  the patch passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javac  |  24m 32s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |  20m 58s |  |  the patch passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |  20m 58s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 58s | 
[/results-checkstyle-hadoop-common-project_hadoop-common.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5269/1/artifact/out/results-checkstyle-hadoop-common-project_hadoop-common.txt)
 |  hadoop-common-project/hadoop-common: The patch generated 1 new + 0 
unchanged - 0 fixed = 1 total (was 0)  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  the patch passed  |
   | -1 :x: |  javadoc  |   1m  5s | 
[/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5269/1/artifact/out/patch-javadoc-hadoop-common-project_hadoop-common-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt)
 |  hadoop-common in the patch failed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.  |
   | +1 :green_heart: |  javadoc  |   0m 47s |  |  the patch passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   2m 40s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  23m 30s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  18m 14s |  |  hadoop-common in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m  3s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 216m 29s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5269/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5269 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux e4d5f5b01f11 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 
18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build

[jira] [Commented] (HDFS-16875) Erasure Coding: data access proxy to allow old clients to read EC data

2023-01-03 Thread Jing Zhao (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654209#comment-17654209
 ] 

Jing Zhao commented on HDFS-16875:
--

Posted the design doc for the EC access proxy.

> Erasure Coding: data access proxy to allow old clients to read EC data
> --
>
> Key: HDFS-16875
> URL: https://issues.apache.org/jira/browse/HDFS-16875
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: ec, erasure-coding
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Major
> Attachments: Erasure Coding Access Proxy.pdf
>
>
> Erasure Coding is only supported by Hadoop 3, while many production 
> deployments still depend on Hadoop 2. Upgrading the whole data tech stack to 
> the Hadoop 3 release may involve big migration efforts and even reliability 
> risks, considering the incompatibilities between these two Hadoop major 
> releases as well as the potential uncovered issues and risks hidden in newer 
> releases. Therefore, we need to find a solution, with the least amount of 
> migration effort and risk, to adopt Erasure Coding for cost efficiency but 
> still allow HDFS clients with old versions (Hadoop 2.x) to access EC data in 
> a transparent manner.
> Internally we have developed an EC access proxy which translates the EC data 
> for old clients. We also extend the NameNode RPC so it can recognize HDFS 
> clients with/without the EC support, and redirect the old clients to the 
> proxy. With the proxy we set up separate Erasure Coding clusters storing 
> hundreds of PB of data, while leaving other production clusters and all the 
> upper layer applications untouched.
> Considering some changes are made at fundamental components of HDFS (e.g., 
> client-NN RPC header), we do not aim to merge the change to trunk. We will 
> use this ticket to share the design and implementation details (including the 
> code) and collect feedback. We may use a separate github repo to open source 
> the implementation later.






[jira] [Updated] (HDFS-16875) Erasure Coding: data access proxy to allow old clients to read EC data

2023-01-03 Thread Jing Zhao (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhao updated HDFS-16875:
-
Attachment: Erasure Coding Access Proxy.pdf

> Erasure Coding: data access proxy to allow old clients to read EC data
> --
>
> Key: HDFS-16875
> URL: https://issues.apache.org/jira/browse/HDFS-16875
> Project: Hadoop HDFS
>  Issue Type: New Feature
>  Components: ec, erasure-coding
>Reporter: Jing Zhao
>Assignee: Jing Zhao
>Priority: Major
> Attachments: Erasure Coding Access Proxy.pdf
>
>
> Erasure Coding is only supported by Hadoop 3, while many production 
> deployments still depend on Hadoop 2. Upgrading the whole data tech stack to 
> the Hadoop 3 release may involve big migration efforts and even reliability 
> risks, considering the incompatibilities between these two Hadoop major 
> releases as well as the potential uncovered issues and risks hidden in newer 
> releases. Therefore, we need to find a solution, with the least amount of 
> migration effort and risk, to adopt Erasure Coding for cost efficiency but 
> still allow HDFS clients with old versions (Hadoop 2.x) to access EC data in 
> a transparent manner.
> Internally we have developed an EC access proxy which translates the EC data 
> for old clients. We also extend the NameNode RPC so it can recognize HDFS 
> clients with/without the EC support, and redirect the old clients to the 
> proxy. With the proxy we set up separate Erasure Coding clusters storing 
> hundreds of PB of data, while leaving other production clusters and all the 
> upper layer applications untouched.
> Considering some changes are made at fundamental components of HDFS (e.g., 
> client-NN RPC header), we do not aim to merge the change to trunk. We will 
> use this ticket to share the design and implementation details (including the 
> code) and collect feedback. We may use a separate github repo to open source 
> the implementation later.






[jira] [Commented] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654177#comment-17654177
 ] 

ASF GitHub Bot commented on HDFS-16872:
---

xkrogen commented on PR #5246:
URL: https://github.com/apache/hadoop/pull/5246#issuecomment-1370183767

   I spent a little time poking at `LogThrottlingHelper` to see what it would 
take to make it concurrent. I have a demonstration PR here: #5269. It's a bit 
complicated as it relies on functionality from `ConcurrentHashMap` as well as 
some atomic operations (mostly CAS). I suspect it's likely overkill from an 
implementation complexity perspective since we don't _expect_ high contention 
on this class. I would say it probably makes sense to just use `synchronized`. 
WDYT?
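
   For reference, the simpler synchronized approach could look roughly like 
this stand-in sketch (not the real LogThrottlingHelper internals):

{code:java}
// Stand-in sketch of a synchronized throttle; this is not the real
// LogThrottlingHelper, just the simpler locking strategy suggested above.
public class SynchronizedThrottleSketch {
  private final long periodMs;
  private boolean loggedOnce = false;
  private long lastLoggedMs;
  private long suppressed = 0;

  SynchronizedThrottleSketch(long periodMs) {
    this.periodMs = periodMs;
  }

  /**
   * Returns the number of records suppressed since the last allowed log when
   * logging is allowed now, or -1 when this record should be suppressed.
   */
  synchronized long record(long nowMs) {
    if (!loggedOnce || nowMs - lastLoggedMs >= periodMs) {
      final long wasSuppressed = suppressed;
      loggedOnce = true;
      lastLoggedMs = nowMs;
      suppressed = 0;
      return wasSuppressed;
    }
    suppressed++;
    return -1;
  }
}
{code}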




> Fix log throttling by declaring LogThrottlingHelper as static members
> -
>
> Key: HDFS-16872
> URL: https://issues.apache.org/jira/browse/HDFS-16872
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.4
>Reporter: Chengbing Liu
>Priority: Major
>  Labels: pull-request-available
>
> In our production cluster with Observer NameNode enabled, we have plenty of 
> logs printed by {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. The 
> {{LogThrottlingHelper}} doesn't seem to work.
> {noformat}
> 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Start loading edits file ByteStringEditLog[17686250688, 17686250688], 
> ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 
> 17686250688] maxTxnsToRead = 9223372036854775807
> 2022-10-25 09:26:50,380 INFO 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
> Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688], 
> ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 
> 17686250688]' to transaction ID 17686250688
> 2022-10-25 09:26:50,380 INFO 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
> Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688]' to 
> transaction ID 17686250688
> 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250688, 
> 17686250688], ByteStringEditLog[17686250688, 17686250688], 
> ByteStringEditLog[17686250688, 17686250688]) of total size 527.0, total edits 
> 1.0, total load time 0.0 ms
> 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Start loading edits file ByteStringEditLog[17686250689, 17686250693], 
> ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 
> 17686250693] maxTxnsToRead = 9223372036854775807
> 2022-10-25 09:26:50,387 INFO 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
> Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693], 
> ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 
> 17686250693]' to transaction ID 17686250689
> 2022-10-25 09:26:50,387 INFO 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
> Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693]' to 
> transaction ID 17686250689
> 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250689, 
> 17686250693], ByteStringEditLog[17686250689, 17686250693], 
> ByteStringEditLog[17686250689, 17686250693]) of total size 890.0, total edits 
> 5.0, total load time 1.0 ms
> {noformat}
> After some digging, I found the cause is that {{LogThrottlingHelper}}s are 
> declared as instance variables of all the enclosing classes, including 
> {{FSImage}}, {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. 
> Therefore the logging frequency will not be limited across different 
> instances. For classes with only a limited number of instances, such as 
> {{FSImage}}, this is fine. For others whose instances are created frequently, 
> such as {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}, it will 
> result in plenty of logs.
> This can be fixed by declaring the {{LogThrottlingHelper}}s as static members.
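In code, the fix amounts to a declaration change along the lines of this 
sketch (the field name and throttle period are assumptions):

{code:java}
import org.apache.hadoop.log.LogThrottlingHelper;

// Sketch only; the field name and 5000 ms period are assumptions.
public class ThrottlingDeclarationSketch {
  // Before: a fresh helper per instance, so every short-lived
  // FSEditLogLoader or RedundantEditLogInputStream starts with empty
  // throttling state and logs at full rate:
  // private final LogThrottlingHelper loadHelper = new LogThrottlingHelper(5000);

  // After: one helper shared by all instances of the enclosing class, so
  // the logging frequency is limited across instances as intended.
  private static final LogThrottlingHelper LOAD_HELPER =
      new LogThrottlingHelper(5000);
}
{code}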






[jira] [Commented] (HDFS-16872) Fix log throttling by declaring LogThrottlingHelper as static members

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654176#comment-17654176
 ] 

ASF GitHub Bot commented on HDFS-16872:
---

xkrogen opened a new pull request, #5269:
URL: https://github.com/apache/hadoop/pull/5269

   WIP PR for demonstration purposes only while working through #5246
   




> Fix log throttling by declaring LogThrottlingHelper as static members
> -
>
> Key: HDFS-16872
> URL: https://issues.apache.org/jira/browse/HDFS-16872
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.3.4
>Reporter: Chengbing Liu
>Priority: Major
>  Labels: pull-request-available
>
> In our production cluster with Observer NameNode enabled, we have plenty of 
> logs printed by {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. The 
> {{LogThrottlingHelper}} doesn't seem to work.
> {noformat}
> 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Start loading edits file ByteStringEditLog[17686250688, 17686250688], 
> ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 
> 17686250688] maxTxnsToRead = 9223372036854775807
> 2022-10-25 09:26:50,380 INFO 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
> Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688], 
> ByteStringEditLog[17686250688, 17686250688], ByteStringEditLog[17686250688, 
> 17686250688]' to transaction ID 17686250688
> 2022-10-25 09:26:50,380 INFO 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
> Fast-forwarding stream 'ByteStringEditLog[17686250688, 17686250688]' to 
> transaction ID 17686250688
> 2022-10-25 09:26:50,380 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250688, 
> 17686250688], ByteStringEditLog[17686250688, 17686250688], 
> ByteStringEditLog[17686250688, 17686250688]) of total size 527.0, total edits 
> 1.0, total load time 0.0 ms
> 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Start loading edits file ByteStringEditLog[17686250689, 17686250693], 
> ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 
> 17686250693] maxTxnsToRead = 9223372036854775807
> 2022-10-25 09:26:50,387 INFO 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
> Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693], 
> ByteStringEditLog[17686250689, 17686250693], ByteStringEditLog[17686250689, 
> 17686250693]' to transaction ID 17686250689
> 2022-10-25 09:26:50,387 INFO 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream: 
> Fast-forwarding stream 'ByteStringEditLog[17686250689, 17686250693]' to 
> transaction ID 17686250689
> 2022-10-25 09:26:50,387 INFO org.apache.hadoop.hdfs.server.namenode.FSImage: 
> Loaded 1 edits file(s) (the last named ByteStringEditLog[17686250689, 
> 17686250693], ByteStringEditLog[17686250689, 17686250693], 
> ByteStringEditLog[17686250689, 17686250693]) of total size 890.0, total edits 
> 5.0, total load time 1.0 ms
> {noformat}
> After some digging, I found the cause is that {{LogThrottlingHelper}}s are 
> declared as instance variables of all the enclosing classes, including 
> {{FSImage}}, {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}. 
> Therefore the logging frequency will not be limited across different 
> instances. For classes with only a limited number of instances, such as 
> {{FSImage}}, this is fine. For others whose instances are created frequently, 
> such as {{FSEditLogLoader}} and {{RedundantEditLogInputStream}}, it will 
> result in plenty of logs.
> This can be fixed by declaring the {{LogThrottlingHelper}}s as static members.






[jira] [Commented] (HDFS-16877) Namenode doesn't use alignment context in TestObserverWithRouter

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654160#comment-17654160
 ] 

ASF GitHub Bot commented on HDFS-16877:
---

hadoop-yetus commented on PR #5257:
URL: https://github.com/apache/hadoop/pull/5257#issuecomment-1370149310

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |  12m 24s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 41s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   0m 45s |  |  trunk passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  compile  |   0m 41s |  |  trunk passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   0m 35s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   0m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   0m 52s |  |  trunk passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  trunk passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 30s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  20m 37s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 36s |  |  the patch passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javac  |   0m 36s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  javac  |   0m 32s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 20s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 35s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 33s |  |  the patch passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  the patch passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 36s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  22m 17s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 39s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 128m  7s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5257/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/5257 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 691427fdab40 4.15.0-200-generic #211-Ubuntu SMP Thu Nov 24 
18:16:04 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 4d2a1ca6903f3b611c9f10727cce970168f727de |
   | Default Java | Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   | Multi-JDK versions | 
/usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 
/usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_352-8u352-ga-1~20.04-b08 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5257/5/testReport/ |
   | Max. process+thread count | 2230 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5257/5/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   




> Namenode doesn't use alignment context in TestObserverWithRouter

[jira] [Commented] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654142#comment-17654142
 ] 

ASF GitHub Bot commented on HDFS-16881:
---

cnauroth commented on code in PR #5268:
URL: https://github.com/apache/hadoop/pull/5268#discussion_r1060859919


##
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/FSDirectory.java:
##
@@ -388,6 +392,12 @@ public enum DirOp {
 DFS_PROTECTED_SUBDIRECTORIES_ENABLE,
 DFS_PROTECTED_SUBDIRECTORIES_ENABLE_DEFAULT);
 
+final long readLockThresholdMs = conf.getLong(
+DFS_NAMENODE_READ_LOCK_REPORTING_THRESHOLD_MS_KEY,
+DFS_NAMENODE_READ_LOCK_REPORTING_THRESHOLD_MS_DEFAULT);
+// use half of read lock threshold

Review Comment:
   Is it necessary for this to use half? If so, can you describe why in this 
comment?



##
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSPermissionChecker.java:
##
@@ -446,4 +447,20 @@ private static INodeFile createINodeFile(INodeDirectory 
parent, String name,
 parent.addChild(inodeFile);
 return inodeFile;
   }
+
+  @Test
+  public void testCheckAccessControlEnforcerSlowness() throws Exception {
+final long thresholdMs = 10;
+final String message = FSPermissionChecker.runCheckPermission(() -> {
+  try {
+Thread.sleep(20);
+  } catch (InterruptedException e) {
+throw new RuntimeException(e);

Review Comment:
   I suggest adding `Thread.currentThread().interrupt();` before throwing. It 
shouldn't matter much in practice, but JUnit runner threads have some strange 
behavior when interrupted status is not restored as expected.
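
   The suggested idiom, as a small self-contained sketch:

{code:java}
// Restore the interrupt status before wrapping InterruptedException, so the
// caller (here, a JUnit runner thread) still observes the interruption.
public class InterruptIdiomSketch {
  public static void main(String[] args) {
    final Runnable slowCheck = () -> {
      try {
        Thread.sleep(20);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // restore interrupt status first
        throw new RuntimeException(e);
      }
    };
    slowCheck.run();
  }
}
{code}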





> Warn if AccessControlEnforcer runs for a long time to check permission
> --
>
> Key: HDFS-16881
> URL: https://issues.apache.org/jira/browse/HDFS-16881
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
>
> AccessControlEnforcer is configurable.  If an external AccessControlEnforcer 
> runs for a long time to check permission while holding the FSNamesystem lock, 
> it will significantly slow down the entire Namenode.  In this JIRA, we will 
> print a WARN message when that happens.






[jira] [Commented] (HDFS-16877) Namenode doesn't use alignment context in TestObserverWithRouter

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654112#comment-17654112
 ] 

ASF GitHub Bot commented on HDFS-16877:
---

simbadzina commented on code in PR #5257:
URL: https://github.com/apache/hadoop/pull/5257#discussion_r1060799582


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##
@@ -510,6 +513,15 @@ BalanceProcedureScheduler getFedRenameScheduler() {
 return this.fedRenameScheduler;
   }
 
+  /**
+   * Get the routerStateIdContext used by this server.
+   * @return routerStateIdContext
+   */
+  @VisibleForTesting
+  public RouterStateIdContext getRouterStateIdContext() {

Review Comment:
   Protected works too. If we need to increase visibility later, we can do so.





> Namenode doesn't use alignment context in TestObserverWithRouter
> 
>
> Key: HDFS-16877
> URL: https://issues.apache.org/jira/browse/HDFS-16877
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, rbf
>Reporter: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
>
> We need to set "{*}dfs.namenode.state.context.enabled{*}" to true for the 
> namenode to send its stateId in client responses.
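In test setup this is a one-line change, sketched below with the standard 
Configuration API (the helper class is illustrative):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustrative sketch: enable the state context so the namenode includes
// its stateId in client responses.
public class ObserverTestConfSketch {
  static Configuration newConf() {
    final Configuration conf = new Configuration();
    conf.setBoolean("dfs.namenode.state.context.enabled", true);
    return conf;
  }
}
{code}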






[jira] [Commented] (HDFS-16877) Namenode doesn't use alignment context in TestObserverWithRouter

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654111#comment-17654111
 ] 

ASF GitHub Bot commented on HDFS-16877:
---

simbadzina commented on code in PR #5257:
URL: https://github.com/apache/hadoop/pull/5257#discussion_r1060799003


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##
@@ -203,6 +203,9 @@ public class RouterRpcServer extends AbstractService 
implements ClientProtocol,
   /** Router using this RPC server. */
   private final Router router;
 
+  /** RouterStateIdContext for this RPC server. */

Review Comment:
   Fixed.





> Namenode doesn't use alignment context in TestObserverWithRouter
> 
>
> Key: HDFS-16877
> URL: https://issues.apache.org/jira/browse/HDFS-16877
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, rbf
>Reporter: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
>
> We need to set "{*}dfs.namenode.state.context.enabled{*}" to true for the 
> namenode to send its stateId in client responses.






[jira] [Commented] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17654110#comment-17654110
 ] 

ASF GitHub Bot commented on HDFS-16881:
---

hadoop-yetus commented on PR #5268:
URL: https://github.com/apache/hadoop/pull/5268#issuecomment-1370033666

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 31s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  41m 24s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 30s |  |  trunk passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  trunk passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 29s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  trunk passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   1m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 33s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  25m 40s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 25s |  |  the patch passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | -1 :x: |  javac  |   1m 25s | 
[/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5268/1/artifact/out/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04.txt)
 |  
hadoop-hdfs-project_hadoop-hdfs-jdkUbuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 
with JDK Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04 generated 2 new + 910 
unchanged - 0 fixed = 912 total (was 910)  |
   | +1 :green_heart: |  compile  |   1m 13s |  |  the patch passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | -1 :x: |  javac  |   1m 13s | 
[/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_352-8u352-ga-1~20.04-b08.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5268/1/artifact/out/results-compile-javac-hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_352-8u352-ga-1~20.04-b08.txt)
 |  
hadoop-hdfs-project_hadoop-hdfs-jdkPrivateBuild-1.8.0_352-8u352-ga-1~20.04-b08 
with JDK Private Build-1.8.0_352-8u352-ga-1~20.04-b08 generated 2 new + 889 
unchanged - 0 fixed = 891 total (was 889)  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   0m 53s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5268/1/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 3 new + 163 unchanged 
- 0 fixed = 166 total (was 163)  |
   | +1 :green_heart: |  mvnsite  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 51s |  |  the patch passed with JDK 
Ubuntu-11.0.17+8-post-Ubuntu-1ubuntu220.04  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  the patch passed with JDK 
Private Build-1.8.0_352-8u352-ga-1~20.04-b08  |
   | +1 :green_heart: |  spotbugs  |   3m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  28m 13s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 392m 42s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-5268/1/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   0m 55s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 510m 14s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.TestLeaseRecovery2 |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   |

[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653977#comment-17653977
 ] 

ASF GitHub Bot commented on HDFS-16880:
---

hfutatzhanghb commented on code in PR #5262:
URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060537469


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##
@@ -687,7 +687,7 @@  T invokeAtAvailableNs(RemoteMethod method, Class 
clazz)
 // If default Ns is present return result from that namespace.
 if (!nsId.isEmpty()) {
   try {
-return rpcClient.invokeSingle(nsId, method, clazz);
+return rpcClient.invokeSingle(nsId, method, clazz, "");

Review Comment:
   Hi @goiri, could you please also review the code of 
https://issues.apache.org/jira/browse/HDFS-16865





> modify invokeSingleXXX interface in order to pass actual file src to namenode 
> for debug info.
> -
>
> Key: HDFS-16880
> URL: https://issues.apache.org/jira/browse/HDFS-16880
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>  Labels: pull-request-available
>
> We found lots of INFO-level logs like the ones below:
> {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by 
> DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
> 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
> {quote}
> The real path of completeFile is lost. This is actually caused by:
>  
> *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
>  org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*
> In this method, a RemoteLocationContext object is instantiated:
> *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*
> and then executed: *Object[] params = method.getParams(loc);*
> The problem is right here: because we always use new RemoteParam(), 
> context.getDest() always returns "/". That's why we see so many incorrect logs.
>  
> After diving into the invokeSingleXXX source code, I classified the following 
> RPCs into those that need the actual src and those that do not.
>  
> *RPCs that need the src path:*
> addBlock、abandonBlock、getAdditionalDatanode、complete
> *RPCs that do not need the src path:*
> updateBlockForPipeline、reportBadBlocks、getBlocks、updatePipeline、invokeAtAvailableNs(invoked
>  by: 
> getServerDefaults、getBlockKeys、getTransactionID、getMostRecentCheckpointTxId、versionRequest、getStoragePolicies)
>  
> After the changes, the src is passed to the NN correctly.
>  
>  






[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653976#comment-17653976
 ] 

ASF GitHub Bot commented on HDFS-16880:
---

hfutatzhanghb commented on code in PR #5262:
URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060536835


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##
@@ -687,7 +687,7 @@  T invokeAtAvailableNs(RemoteMethod method, Class 
clazz)
 // If default Ns is present return result from that namespace.
 if (!nsId.isEmpty()) {
   try {
-return rpcClient.invokeSingle(nsId, method, clazz);
+return rpcClient.invokeSingle(nsId, method, clazz, "");

Review Comment:
   Hi @goiri, if we make it an actual field or attribute here, it would 
invoke `getLocationsForPath` to get the actual path.
   That would defeat the purpose of using the previous block to improve the 
performance of some RPCs.





> modify invokeSingleXXX interface in order to pass actual file src to namenode 
> for debug info.
> -
>
> Key: HDFS-16880
> URL: https://issues.apache.org/jira/browse/HDFS-16880
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>  Labels: pull-request-available
>
> We found lots of INFO-level logs like the ones below:
> {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by 
> DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
> 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
> {quote}
> The real path of completeFile is lost. This is actually caused by:
>  
> *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
>  org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*
> In this method, a RemoteLocationContext object is instantiated:
> *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*
> and then executed: *Object[] params = method.getParams(loc);*
> The problem is right here: because we always use new RemoteParam(), 
> context.getDest() always returns "/". That's why we see so many incorrect logs.
>  
> After diving into the invokeSingleXXX source code, I classified the following 
> RPCs into those that need the actual src and those that do not.
>  
> *RPCs that need the src path:*
> addBlock、abandonBlock、getAdditionalDatanode、complete
> *RPCs that do not need the src path:*
> updateBlockForPipeline、reportBadBlocks、getBlocks、updatePipeline、invokeAtAvailableNs(invoked
>  by: 
> getServerDefaults、getBlockKeys、getTransactionID、getMostRecentCheckpointTxId、versionRequest、getStoragePolicies)
>  
> After the changes, the src is passed to the NN correctly.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653970#comment-17653970
 ] 

ASF GitHub Bot commented on HDFS-16880:
---

goiri commented on code in PR #5262:
URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060525087


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##
@@ -687,7 +687,7 @@ <T> T invokeAtAvailableNs(RemoteMethod method, Class<T> clazz)
 // If default Ns is present return result from that namespace.
 if (!nsId.isEmpty()) {
   try {
-return rpcClient.invokeSingle(nsId, method, clazz);
+return rpcClient.invokeSingle(nsId, method, clazz, "");

Review Comment:
   Can't we make it an actual field or attribute? 





> modify invokeSingleXXX interface in order to pass actual file src to namenode 
> for debug info.
> -
>
> Key: HDFS-16880
> URL: https://issues.apache.org/jira/browse/HDFS-16880
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>  Labels: pull-request-available
>
> We found lots of INFO-level logs like the ones below:
> {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by 
> DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
> 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
> {quote}
> The real path in the completeFile log is lost. This is actually caused by: 
>  
> *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
>  org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*
> In this method, it instantiates a RemoteLocationContext object:
> *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*
> and then executes: *Object[] params = method.getParams(loc);*
> The problem is right here: because we always use new RemoteParam(), 
> context.getDest() always returns "/". That's why we see lots of incorrect logs.
>  
> After diving into the invokeSingleXXX source code, I classified the RPCs into 
> those that need the actual src and those that do not.
>  
> *RPCs that need the src path:*
> addBlock, abandonBlock, getAdditionalDatanode, complete
> *RPCs that do not need the src path:*
> updateBlockForPipeline, reportBadBlocks, getBlocks, updatePipeline, 
> invokeAtAvailableNs (invoked by: getServerDefaults, getBlockKeys, 
> getTransactionID, getMostRecentCheckpointTxId, versionRequest, getStoragePolicies)
>  
> After these changes, the src can be passed to the NN correctly.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653928#comment-17653928
 ] 

ASF GitHub Bot commented on HDFS-16880:
---

ayushtkn commented on code in PR #5262:
URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060442910


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##
@@ -687,7 +687,7 @@ <T> T invokeAtAvailableNs(RemoteMethod method, Class<T> clazz)
 // If default Ns is present return result from that namespace.
 if (!nsId.isEmpty()) {
   try {
-return rpcClient.invokeSingle(nsId, method, clazz);
+return rpcClient.invokeSingle(nsId, method, clazz, "");

Review Comment:
   @goiri, would it be OK to have a prefix followed by the path relative to the 
router, rather than the actual path relative to the namenode?





> modify invokeSingleXXX interface in order to pass actual file src to namenode 
> for debug info.
> -
>
> Key: HDFS-16880
> URL: https://issues.apache.org/jira/browse/HDFS-16880
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>  Labels: pull-request-available
>
> We found lots of INFO-level logs like the ones below:
> {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by 
> DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
> 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
> {quote}
> The real path in the completeFile log is lost. This is actually caused by: 
>  
> *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
>  org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*
> In this method, it instantiates a RemoteLocationContext object:
> *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*
> and then executes: *Object[] params = method.getParams(loc);*
> The problem is right here: because we always use new RemoteParam(), 
> context.getDest() always returns "/". That's why we see lots of incorrect logs.
>  
> After diving into the invokeSingleXXX source code, I classified the RPCs into 
> those that need the actual src and those that do not.
>  
> *RPCs that need the src path:*
> addBlock, abandonBlock, getAdditionalDatanode, complete
> *RPCs that do not need the src path:*
> updateBlockForPipeline, reportBadBlocks, getBlocks, updatePipeline, 
> invokeAtAvailableNs (invoked by: getServerDefaults, getBlockKeys, 
> getTransactionID, getMostRecentCheckpointTxId, versionRequest, getStoragePolicies)
>  
> After these changes, the src can be passed to the NN correctly.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653912#comment-17653912
 ] 

ASF GitHub Bot commented on HDFS-16880:
---

hfutatzhanghb commented on code in PR #5262:
URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060435056


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##
@@ -687,7 +687,7 @@ <T> T invokeAtAvailableNs(RemoteMethod method, Class<T> clazz)
 // If default Ns is present return result from that namespace.
 if (!nsId.isEmpty()) {
   try {
-return rpcClient.invokeSingle(nsId, method, clazz);
+return rpcClient.invokeSingle(nsId, method, clazz, "");

Review Comment:
   > Can we add an actual test?
   
   OK~ @goiri, I will add an actual test soon~ thanks a lot.





> modify invokeSingleXXX interface in order to pass actual file src to namenode 
> for debug info.
> -
>
> Key: HDFS-16880
> URL: https://issues.apache.org/jira/browse/HDFS-16880
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>  Labels: pull-request-available
>
> We found lots of INFO-level logs like the ones below:
> {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by 
> DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
> 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
> {quote}
> The real path in the completeFile log is lost. This is actually caused by: 
>  
> *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
>  org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*
> In this method, it instantiates a RemoteLocationContext object:
> *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*
> and then executes: *Object[] params = method.getParams(loc);*
> The problem is right here: because we always use new RemoteParam(), 
> context.getDest() always returns "/". That's why we see lots of incorrect logs.
>  
> After diving into the invokeSingleXXX source code, I classified the RPCs into 
> those that need the actual src and those that do not.
>  
> *RPCs that need the src path:*
> addBlock, abandonBlock, getAdditionalDatanode, complete
> *RPCs that do not need the src path:*
> updateBlockForPipeline, reportBadBlocks, getBlocks, updatePipeline, 
> invokeAtAvailableNs (invoked by: getServerDefaults, getBlockKeys, 
> getTransactionID, getMostRecentCheckpointTxId, versionRequest, getStoragePolicies)
>  
> After these changes, the src can be passed to the NN correctly.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16880) modify invokeSingleXXX interface in order to pass actual file src to namenode for debug info.

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653910#comment-17653910
 ] 

ASF GitHub Bot commented on HDFS-16880:
---

hfutatzhanghb commented on code in PR #5262:
URL: https://github.com/apache/hadoop/pull/5262#discussion_r1060434510


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##
@@ -687,7 +687,7 @@ <T> T invokeAtAvailableNs(RemoteMethod method, Class<T> clazz)
 // If default Ns is present return result from that namespace.
 if (!nsId.isEmpty()) {
   try {
-return rpcClient.invokeSingle(nsId, method, clazz);
+return rpcClient.invokeSingle(nsId, method, clazz, "");

Review Comment:
   > > Yes, the src is not the dst path, but we can use the src to look up the 
   > > real dst path in the mount table. Here, we cannot execute 
   > > this.subclusterResolver.getMountPoints(path);, because that statement is 
   > > time-consuming.
   > 
   > From a correctness point of view it isn't correct. If `/mount/path` points 
   > to Ns0 `/dir` and we log `/mount/path` in the namenode, it can lead to 
   > confusion if the namenode also has a path named `/mount/path`: operating 
   > on `/dir` via the router's `/mount/path` then looks the same as operating 
   > directly on `/mount/path` on the namenode.
   > 
   > If it has serious performance penalties we need to make it configurable 
   > and then figure out what can be done when the config is disabled.
   
   Hi @ayushtkn, can we prepend a specific prefix, such as "RBF#src_path:", to 
the dir? That way we can distinguish the RBF case from the direct-namenode 
case. What's your opinion? I am looking forward to your reply so I can take 
the next steps. Thanks again~
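For illustration, a hypothetical helper along the lines of the prefix suggestion above might look like the sketch below. The "RBF#src_path:" marker and the helper itself are just the commenter's proposal, not an existing Hadoop API:

{code:java}
// Hypothetical sketch only: the "RBF#src_path:" marker and this helper are
// the commenter's proposal, not an existing Hadoop API.
public final class RbfPathTagger {
  private static final String RBF_PREFIX = "RBF#src_path:";

  private RbfPathTagger() {
  }

  // Tag a client-facing mount path so that a namenode log line produced via
  // the router can be told apart from one produced by a client talking to
  // the namenode directly.
  static String tagForNamenodeLog(String mountPath) {
    return RBF_PREFIX + mountPath;
  }

  public static void main(String[] args) {
    // A log such as "completeFile: RBF#src_path:/mount/path is closed by ..."
    // stays unambiguous even if the namenode also has a real /mount/path.
    System.out.println(tagForNamenodeLog("/mount/path"));
  }
}
{code}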





> modify invokeSingleXXX interface in order to pass actual file src to namenode 
> for debug info.
> -
>
> Key: HDFS-16880
> URL: https://issues.apache.org/jira/browse/HDFS-16880
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: rbf
>Affects Versions: 3.3.4
>Reporter: ZhangHB
>Priority: Major
>  Labels: pull-request-available
>
> We found lots of INFO level log like below:
> {quote}2022-12-30 15:31:04,169 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by 
> DFSClient_attempt_1671783180362_213003_m_77_0_1102875551_1
> 2022-12-30 15:31:04,186 INFO org.apache.hadoop.hdfs.StateChange: DIR* 
> completeFile: / is closed by DFSClient_NONMAPREDUCE_1198313144_27480
> {quote}
> It lost the real path of completeFile. Actually this is caused by : 
>  
> *org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient#invokeSingle(java.lang.String,
>  org.apache.hadoop.hdfs.server.federation.router.RemoteMethod)*
> In this method, it instantiates a RemoteLocationContext object:
> *RemoteLocationContext loc = new RemoteLocation(nsId, "/", "/");*
> and then execute: *Object[] params = method.getParams(loc);*
> The problem is right here, becasuse we always use new RemoteParam(), so, 
> context.getDest() always return "/"; That's why we saw lots of incorrect logs.
>  
> After diving into invokeSingleXXX source code, I found the following RPCs 
> classified as need actual src and not need actual src.
>  
> *need src path RPC:*
> addBlock、abandonBlock、getAdditionalDatanode、complete
> *not need src path RPC:*
> updateBlockForPipeline、reportBadBlocks、getBlocks、updatePipeline、invokeAtAvailableNs(invoked
>  by: 
> getServerDefaults、getBlockKeys、getTransactionID、getMostRecentCheckpointTxId、versionRequest、getStoragePolicies)
>  
> After changes, the src can be pass to NN correctly.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission

2023-01-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17653876#comment-17653876
 ] 

ASF GitHub Bot commented on HDFS-16881:
---

szetszwo opened a new pull request, #5268:
URL: https://github.com/apache/hadoop/pull/5268

   ### Description of PR
   
   AccessControlEnforcer is configurable. If an external AccessControlEnforcer 
runs for a long time to check permissions while holding the FSNamesystem lock, 
it will significantly slow down the entire Namenode. In this JIRA, we print a 
WARN message when that happens.
   
   https://issues.apache.org/jira/browse/HDFS-16881
   
   ### How was this patch tested?
   
   New unit tests
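
   A hedged sketch of the general technique follows: time the pluggable 
permission check and log a WARN once it exceeds a threshold. This is not the 
actual HDFS-16881 patch; the one-second threshold and all names are 
assumptions for illustration:

{code:java}
// Hedged sketch of the technique (NOT the actual HDFS-16881 patch): time the
// pluggable permission check and WARN when it is slow. The 1-second threshold
// and all identifiers here are assumptions for illustration.
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

public class SlowPermissionCheckWarner {
  private static final long WARN_THRESHOLD_NANOS = TimeUnit.SECONDS.toNanos(1);

  static boolean checkWithTiming(String path, BooleanSupplier enforcerCheck) {
    final long start = System.nanoTime();
    try {
      // Stand-in for the external AccessControlEnforcer permission check.
      return enforcerCheck.getAsBoolean();
    } finally {
      final long elapsed = System.nanoTime() - start;
      if (elapsed > WARN_THRESHOLD_NANOS) {
        // In the namenode this would go through its logger, not System.err.
        System.err.printf("WARN: permission check on %s took %d ms%n",
            path, TimeUnit.NANOSECONDS.toMillis(elapsed));
      }
    }
  }

  public static void main(String[] args) {
    // Simulate a slow external enforcer to trigger the WARN.
    checkWithTiming("/user/foo", () -> {
      try {
        Thread.sleep(1500);
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
      return true;
    });
  }
}
{code}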




> Warn if AccessControlEnforcer runs for a long time to check permission
> --
>
> Key: HDFS-16881
> URL: https://issues.apache.org/jira/browse/HDFS-16881
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>
> AccessControlEnforcer is configurable.  If an external AccessControlEnforcer 
> runs for a long time to check permissions while holding the FSNamesystem 
> lock, it will significantly slow down the entire Namenode.  In this JIRA, we 
> print a WARN message when that happens.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission

2023-01-03 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16881:
--
Labels: pull-request-available  (was: )

> Warn if AccessControlEnforcer runs for a long time to check permission
> --
>
> Key: HDFS-16881
> URL: https://issues.apache.org/jira/browse/HDFS-16881
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
>
> AccessControlEnforcer is configurable.  If an external AccessControlEnforcer 
> runs for a long time to check permissions while holding the FSNamesystem 
> lock, it will significantly slow down the entire Namenode.  In this JIRA, we 
> print a WARN message when that happens.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Updated] (HDFS-16881) Warn if AccessControlEnforcer runs for a long time to check permission

2023-01-03 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze updated HDFS-16881:
--
Summary: Warn if AccessControlEnforcer runs for a long time to check 
permission  (was: Print a warning if AccessControlEnforcer runs for a long time 
to check permission)

> Warn if AccessControlEnforcer runs for a long time to check permission
> --
>
> Key: HDFS-16881
> URL: https://issues.apache.org/jira/browse/HDFS-16881
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: namenode
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>
> AccessControlEnforcer is configurable.  If an external AccessControlEnforcer 
> runs for a long time to check permissions while holding the FSNamesystem 
> lock, it will significantly slow down the entire Namenode.  In this JIRA, we 
> print a WARN message when that happens.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org