[jira] [Updated] (HDFS-16726) There is a memory-related problem about HDFS namenode

2022-08-11 Thread yuyanlei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuyanlei updated HDFS-16726:

Attachment: (was: 图片_lanxin_20220809153722.png)

> There is a memory-related problem about HDFS namenode
> -
>
> Key: HDFS-16726
> URL: https://issues.apache.org/jira/browse/HDFS-16726
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Affects Versions: 2.7.2
> Environment: -Xms280G -Xmx280G -XX:MaxDirectMemorySize=10G 
> -XX:MetaspaceSize=128M -server \
>                     -XX:+UseG1GC -XX:+UseStringDeduplication 
> -XX:MaxGCPauseMillis=250 -XX:+UnlockExperimentalVMOptions 
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics 
> -XX:PrintSafepointStatisticsCount=1 \
>                     -XX:G1OldCSetRegionThresholdPercent=1 
> -XX:G1MixedGCCountTarget=9 -XX:+SafepointTimeout  
> -XX:SafepointTimeoutDelay=4000 \
>                     -XX:ParallelGCThreads=24 -XX:ConcGCThreads=6 
> -XX:G1RSetRegionEntries=4096 -XX:+AggressiveOpts -XX:+DisableExplicitGC \
>                     -XX:G1HeapWastePercent=9 
> -XX:G1MixedGCLiveThresholdPercent=85 -XX:InitiatingHeapOccupancyPercent=75 \
>                     -XX:+ParallelRefProcEnabled -XX:-ResizePLAB  
> -XX:+PrintAdaptiveSizePolicy \
>                     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
>                     -Xloggc:$HADOOP_LOG_DIR/namenode.gc.log \
>                     -XX:+HeapDumpOnOutOfMemoryError 
> -XX:ErrorFile=$HADOOP_LOG_DIR/hs_err_pid%p.log 
> -XX:HeapDumpPath=$HADOOP_LOG_DIR \
>                     -Dcom.sun.management.jmxremote \
>                     -Dcom.sun.management.jmxremote.port=9009 \
>                     -Dcom.sun.management.jmxremote.ssl=false \
>                     -Dcom.sun.management.jmxremote.authenticate=false \
>                     $HADOOP_NAMENODE_OPTS
>Reporter: yuyanlei
>Priority: Critical
>
> In the cluster, the memory usage of the NameNode exceeds the Xmx setting 
> (Xmx=280GB): the actual memory usage of the NameNode is 479GB.
> Output via pmap:
>        Address Perm   Offset Device    Inode      Size       Rss       Pss Referenced  Anonymous Swap Locked Mapping
>   2b42f000     rw-p          00:00         0 294174720 293756960 293756960  293756960  293756960    0      0
>   01e21000     rw-p          00:00         0 195245456 195240848 195240848  195240848  195240848    0      0 [heap]
>   2b897c00     rw-p          00:00         0   9246724   9246724   9246724    9246724    9246724    0      0
>   2b8bb0905000 rw-p          00:00         0   1781124   1754572   1754572    1754572    1754572    0      0
>   2b893600     rw-p          00:00         0   1146880   1002084   1002084    1002084    1002084    0      0
>   2b42db652000 rwxp          00:00         0     57792     55252     55252      55252      55252    0      0
>   2b42ec12a000 rw-p          00:00         0     25696     24700     24700      24700      24700    0      0
>   2b42ef25b000 rw-p          00:00         0      9988      8972      8972       8972       8972    0      0
>   2b8c1d467000 rw-p          00:00         0      9216      8204      8204       8204       8204    0      0
>   2b8d6f8db000 rw-p          00:00         0      7160      6228      6228       6228       6228    0      0
> The first mapping should correspond to the Xmx-configured Java heap, while 
> [heap] is unusually large, so a native memory leak is suspected!
>  
>  * [heap] is associated with malloc
> After enabling Native Memory Tracking and checking the process with jcmd in 
> the test environment, we found that the malloc part of the Internal section 
> increased significantly while the client was writing a gz file (Xmx=40G in 
> the test environment; the Internal area was 900MB before the client wrote):
> Total: reserved=47276MB, committed=47070MB
>  -                 Java Heap (reserved=40960MB, committed=40960MB)
>                             (mmap: reserved=40960MB, committed=40960MB) 
>  
>  -                     Class (reserved=53MB, committed=52MB)
>                             (classes #7423)
>                             (malloc=1MB #17053) 
>                             (mmap: reserved=52MB, committed=52MB) 
>  
>  -                    Thread (reserved=2145MB, committed=2145MB)
>                             (thread #2129)
>                             (stack: reserved=2136MB, committed=2136MB)
>                             (malloc=7MB #10673) 
>                             (arena=2MB #4256)
>  
>  -                      Code (reserved=251MB, committed=45MB)
>                             (malloc=7MB #10661) 
>                             (mmap: reserved=244MB, committed=38MB) 
>  
>  -                        GC (reserved=2307MB, committed=2307MB)
>                             (malloc=755MB #525664) 
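For reference, output like the summary above normally comes from JVM Native Memory Tracking; a minimal sketch of the usual steps (pid and flag placement are deployment-specific):

{noformat}
# Start the NameNode JVM with Native Memory Tracking enabled
-XX:NativeMemoryTracking=summary
# Query the running process (scaled to MB, matching the output above)
jcmd <namenode-pid> VM.native_memory summary scale=MB
{noformat}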

[jira] [Commented] (HDFS-16726) There is a memory-related problem about HDFS namenode

2022-08-11 Thread yuyanlei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578831#comment-17578831
 ] 

yuyanlei commented on HDFS-16726:
-

G1RSetRegionEntries defaults to 256, while in our cluster environment it is 
set to 4096. Increasing G1RSetRegionEntries solved the problem of overly long 
Scan RSet times, but it also drives up memory usage outside the Java heap. So 
we keep GC stable at the cost of a larger memory footprint.

> There is a memory-related problem about HDFS namenode
> -
>
> Key: HDFS-16726
> URL: https://issues.apache.org/jira/browse/HDFS-16726
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Affects Versions: 2.7.2
> Environment: -Xms280G -Xmx280G -XX:MaxDirectMemorySize=10G 
> -XX:MetaspaceSize=128M -server \
>                     -XX:+UseG1GC -XX:+UseStringDeduplication 
> -XX:MaxGCPauseMillis=250 -XX:+UnlockExperimentalVMOptions 
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics 
> -XX:PrintSafepointStatisticsCount=1 \
>                     -XX:G1OldCSetRegionThresholdPercent=1 
> -XX:G1MixedGCCountTarget=9 -XX:+SafepointTimeout  
> -XX:SafepointTimeoutDelay=4000 \
>                     -XX:ParallelGCThreads=24 -XX:ConcGCThreads=6 
> -XX:G1RSetRegionEntries=4096 -XX:+AggressiveOpts -XX:+DisableExplicitGC \
>                     -XX:G1HeapWastePercent=9 
> -XX:G1MixedGCLiveThresholdPercent=85 -XX:InitiatingHeapOccupancyPercent=75 \
>                     -XX:+ParallelRefProcEnabled -XX:-ResizePLAB  
> -XX:+PrintAdaptiveSizePolicy \
>                     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
>                     -Xloggc:$HADOOP_LOG_DIR/namenode.gc.log \
>                     -XX:+HeapDumpOnOutOfMemoryError 
> -XX:ErrorFile=$HADOOP_LOG_DIR/hs_err_pid%p.log 
> -XX:HeapDumpPath=$HADOOP_LOG_DIR \
>                     -Dcom.sun.management.jmxremote \
>                     -Dcom.sun.management.jmxremote.port=9009 \
>                     -Dcom.sun.management.jmxremote.ssl=false \
>                     -Dcom.sun.management.jmxremote.authenticate=false \
>                     $HADOOP_NAMENODE_OPTS
>Reporter: yuyanlei
>Priority: Critical
> Attachments: 图片_lanxin_20220809153722.png
>
>
> In the cluster, the memory usage of the NameNode exceeds the Xmx setting 
> (Xmx=280GB): the actual memory usage of the NameNode is 479GB.
> Output via pmap:
>        Address Perm   Offset Device    Inode      Size       Rss       Pss Referenced  Anonymous Swap Locked Mapping
>   2b42f000     rw-p          00:00         0 294174720 293756960 293756960  293756960  293756960    0      0
>   01e21000     rw-p          00:00         0 195245456 195240848 195240848  195240848  195240848    0      0 [heap]
>   2b897c00     rw-p          00:00         0   9246724   9246724   9246724    9246724    9246724    0      0
>   2b8bb0905000 rw-p          00:00         0   1781124   1754572   1754572    1754572    1754572    0      0
>   2b893600     rw-p          00:00         0   1146880   1002084   1002084    1002084    1002084    0      0
>   2b42db652000 rwxp          00:00         0     57792     55252     55252      55252      55252    0      0
>   2b42ec12a000 rw-p          00:00         0     25696     24700     24700      24700      24700    0      0
>   2b42ef25b000 rw-p          00:00         0      9988      8972      8972       8972       8972    0      0
>   2b8c1d467000 rw-p          00:00         0      9216      8204      8204       8204       8204    0      0
>   2b8d6f8db000 rw-p          00:00         0      7160      6228      6228       6228       6228    0      0
> The first mapping should correspond to the Xmx-configured Java heap, while 
> [heap] is unusually large, so a native memory leak is suspected!
>  
>  * [heap] is associated with malloc
> After enabling Native Memory Tracking and checking the process with jcmd in 
> the test environment, we found that the malloc part of the Internal section 
> increased significantly while the client was writing a gz file (Xmx=40G in 
> the test environment; the Internal area was 900MB before the client wrote):
> Total: reserved=47276MB, committed=47070MB
>  -                 Java Heap (reserved=40960MB, committed=40960MB)
>                             (mmap: reserved=40960MB, committed=40960MB) 
>  
>  -                     Class (reserved=53MB, committed=52MB)
>                             (classes #7423)
>                             (malloc=1MB #17053) 
>                             (mmap: reserved=52MB, committed=52MB) 
>  
>  -                    Thread (reserved=2145MB, committed=2145MB)
>                             (thread #2129)
>                             (stack: reserved=2136MB, committed=2136MB)
>                             (malloc=7MB #10673) 

[jira] [Commented] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578791#comment-17578791
 ] 

ASF GitHub Bot commented on HDFS-16678:
---

slfan1989 commented on PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#issuecomment-1212724094

   > @slfan1989 please take another look
   
   LGTM.
   
   @ZanderXu thanks for your contribution! @goiri thanks for helping review 
the code!




> RBF supports disable getNodeUsage() in RBFMetrics
> -
>
> Key: HDFS-16678
> URL: https://issues.apache.org/jira/browse/HDFS-16678
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> In our prod environment, we collect RBF metrics every 15s through 
> jmx_exporter, and we found that the collection task often failed. 
> After tracing, we found that the collection task is blocked at getNodeUsage() 
> in RBFMetrics, because it collects every datanode's usage from the 
> downstream nameservices. This is a very expensive and almost useless 
> operation, because in most scenarios each NameService contains almost the 
> same DNs: we can get the datanode usage from any one of the nameservices 
> instead of from RBF.
> So I feel that RBF should support disabling getNodeUsage() in RBFMetrics.
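One possible shape for such a switch, as a hedged sketch (the config key, field and method bodies here are illustrative, not the actual HDFS-16678 patch):

{code:java}
// Hypothetical config key; the real key name may differ.
public static final String NODE_USAGE_ENABLE_KEY =
    "dfs.federation.router.metrics.node-usage.enable";

// In RBFMetrics: skip the expensive fan-out when the flag is off.
public String getNodeUsage() {
  if (!conf.getBoolean(NODE_USAGE_ENABLE_KEY, true)) {
    return "{}"; // empty JSON instead of querying every downstream nameservice
  }
  // buildNodeUsageJson() stands in for the existing aggregation over
  // all datanode reports from the downstream nameservices.
  return buildNodeUsageJson();
}
{code}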






[jira] [Commented] (HDFS-16703) Enable RPC Timeout for some protocols of NameNode.

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578790#comment-17578790
 ] 

ASF GitHub Bot commented on HDFS-16703:
---

slfan1989 commented on PR #4660:
URL: https://github.com/apache/hadoop/pull/4660#issuecomment-1212723170

   @ZanderXu Do we need to update the md file to explain what these parameters 
do?




> Enable RPC Timeout for some protocols of NameNode.
> --
>
> Key: HDFS-16703
> URL: https://issues.apache.org/jira/browse/HDFS-16703
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When I read some protocol-related code, I found that only the 
> ClientNamenodeProtocolPB proxy is created with an RPC timeout; the other 
> protocolPB proxies are not, such as RefreshAuthorizationPolicyProtocolPB, 
> RefreshUserMappingsProtocolPB, RefreshCallQueueProtocolPB, 
> GetUserMappingsProtocolPB and NamenodeProtocolPB.
>  
> A proxy without an RPC timeout can block for a long time if the NN machine 
> crashes or the network goes bad while reading from or writing to the NN. 
>  
> So I feel that we should enable RPC timeout for all ProtocolPBs.
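For illustration, a hedged sketch of the proposed difference, using the org.apache.hadoop.ipc.RPC overload that accepts an rpcTimeout (exact overloads vary across branches; conf and nnAddress are assumed to be in scope):

{code:java}
// Read the client RPC timeout ("ipc.client.rpc-timeout.ms", default 0 = none).
int rpcTimeout = conf.getInt(
    CommonConfigurationKeys.IPC_CLIENT_RPC_TIMEOUT_KEY,
    CommonConfigurationKeys.IPC_CLIENT_RPC_TIMEOUT_DEFAULT);

// Pass the timeout when creating the proxy, instead of the overload that
// implicitly uses no timeout.
RefreshUserMappingsProtocolPB proxy = RPC.getProxy(
    RefreshUserMappingsProtocolPB.class,
    RPC.getProtocolVersion(RefreshUserMappingsProtocolPB.class),
    nnAddress, UserGroupInformation.getCurrentUser(), conf,
    NetUtils.getDefaultSocketFactory(conf), rpcTimeout);
{code}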






[jira] [Commented] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578789#comment-17578789
 ] 

ASF GitHub Bot commented on HDFS-16728:
---

slfan1989 commented on code in PR #4734:
URL: https://github.com/apache/hadoop/pull/4734#discussion_r944106740


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NoLocationException.java:
##
@@ -0,0 +1,33 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.federation.router;
+
+import java.io.IOException;
+
+/**
+ * This exception is thrown when no mount point can be found for the input path.
+ * RBF cannot forward any requests for the path.
+ */
+public class NoLocationException extends IOException {
+
+  private static final long serialVersionUID = 1L;

Review Comment:
   Does this part of the code make sense? Why do we assign a value of 1?





> RBF throw IndexOutOfBoundsException with disableNameServices
> 
>
> Key: HDFS-16728
> URL: https://issues.apache.org/jira/browse/HDFS-16728
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> RBF will throw an IndexOutOfBoundsException when a namespace is disabled.
> Suppose we have a mount point /a/b -> ns0 -> /a/b and we disable ns0. RBF 
> will then throw an IndexOutOfBoundsException while handling any request 
> whose path starts with /a/b:
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>     at java.util.ArrayList.rangeCheck(ArrayList.java:657)
>     at java.util.ArrayList.get(ArrayList.java:433)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980)
> {code}
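Roughly, the fix replaces the bare locations.get(0) with a guard that raises the NoLocationException introduced in the diff above (a sketch only; the actual constructor arguments and call sites may differ):

{code:java}
// Fail with a descriptive exception instead of letting get(0) throw
// IndexOutOfBoundsException when every location for src is disabled.
final List<RemoteLocation> locations =
    rpcServer.getLocationsForPath(src, true);
if (locations.isEmpty()) {
  throw new NoLocationException(src); // assumed constructor shape
}
final RemoteLocation firstLocation = locations.get(0);
{code}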






[jira] [Commented] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578788#comment-17578788
 ] 

ASF GitHub Bot commented on HDFS-16728:
---

slfan1989 commented on code in PR #4734:
URL: https://github.com/apache/hadoop/pull/4734#discussion_r944106740


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NoLocationException.java:
##
@@ -0,0 +1,33 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.federation.router;
+
+import java.io.IOException;
+
+/**
+ * This exception is thrown when no mount point can be found for the input path.
+ * RBF cannot forward any requests for the path.
+ */
+public class NoLocationException extends IOException {
+
+  private static final long serialVersionUID = 1L;

Review Comment:
   Does this part of the code make sense?





> RBF throw IndexOutOfBoundsException with disableNameServices
> 
>
> Key: HDFS-16728
> URL: https://issues.apache.org/jira/browse/HDFS-16728
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> RBF will throw an IndexOutOfBoundsException when a namespace is disabled.
> Suppose we have a mount point /a/b -> ns0 -> /a/b and we disable ns0. RBF 
> will then throw an IndexOutOfBoundsException while handling any request 
> whose path starts with /a/b:
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
>     at java.util.ArrayList.rangeCheck(ArrayList.java:657)
>     at java.util.ArrayList.get(ArrayList.java:433)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980)
> {code}






[jira] [Commented] (HDFS-16705) RBF: Support healthMonitor timeout configurable and cache NN and client proxy in NamenodeHeartbeatService

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578785#comment-17578785
 ] 

ASF GitHub Bot commented on HDFS-16705:
---

slfan1989 commented on PR #4662:
URL: https://github.com/apache/hadoop/pull/4662#issuecomment-1212720519

   > It looks good from my side but I'd like to get closure from @slfan1989
   
   LGTM. 
   
   @ZanderXu Thank you for your contribution, @goiri Thank you for helping to 
review the code.




> RBF: Support healthMonitor timeout configurable and cache NN and client proxy 
> in NamenodeHeartbeatService
> -
>
> Key: HDFS-16705
> URL: https://issues.apache.org/jira/browse/HDFS-16705
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When I read NamenodeHeartbeatService.class of RBF, I felt there are some 
> things we can do for NamenodeHeartbeatService.class:
>  * Cache the NameNode Protocol and Client Protocol proxies to avoid creating 
> a new proxy every time (see the sketch below)
>  * Support healthMonitorTimeout configuration
>  * Format the code of getNamenodeStatusReport to make it clearer
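A minimal sketch of the first item, assuming the existing NameNodeProxies.createNonHAProxy factory (field and method names are illustrative):

{code:java}
// Cache the NamenodeProtocol proxy across heartbeats instead of creating
// a new one on every run.
private NamenodeProtocol namenodeProtocol;

private NamenodeProtocol getNamenodeProtocol() throws IOException {
  if (namenodeProtocol == null) {
    namenodeProtocol = NameNodeProxies.createNonHAProxy(conf,
        serviceAddress, NamenodeProtocol.class,
        UserGroupInformation.getCurrentUser(), false).getProxy();
  }
  return namenodeProtocol;
}
{code}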






[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578782#comment-17578782
 ] 

ASF GitHub Bot commented on HDFS-16689:
---

ferhui commented on PR #4628:
URL: https://github.com/apache/hadoop/pull/4628#issuecomment-1212715520

   @ZanderXu Thanks for your contribution! Merged




> Standby NameNode crashes when transitioning to Active with in-progress tailer
> -
>
> Key: HDFS-16689
> URL: https://issues.apache.org/jira/browse/HDFS-16689
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standby NameNode crashes when transitioning to Active with an in-progress 
> tailer, and the error message looks like below:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X 
> when there is a stream available for read: ByteStringEditLog[X, Y], 
> ByteStringEditLog[X, 0]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
>   ... 36 more
> {code}
> After tracing, we found a critical bug in 
> *EditLogTailer#catchupDuringFailover()* when 
> *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true: *catchupDuringFailover()* tries 
> to replay all missed edits from the JournalNodes with *onlyDurableTxns=true*, 
> and it may be unable to replay any edits when some JournalNodes are abnormal. 
> To reproduce, suppose:
> - There are 2 NameNodes, NN0 and NN1, whose states are Active and Standby 
> respectively, and there are 3 JournalNodes, namely JN0, JN1 and JN2. 
> - NN0 tries to sync 3 edits starting at txid 3 to the JNs, but only 
> successfully syncs them to JN1 and JN2, because JN0 is abnormal (GC pause, 
> bad network, restart, etc.).
> - NN1's lastAppliedTxId is 2, and at this moment we try to fail over from 
> NN0 to NN1. 
> - NN1 gets only two responses, from JN0 and JN1, when it selects 
> inputStreams with *fromTxnId=3* and *onlyDurableTxns=true*, and the reported 
> txn counts are 0 and 3 respectively; JN2 is abnormal (GC pause, bad network, 
> restart, etc.).
> - NN1 cannot replay any edits with *fromTxnId=3* from the JournalNodes 
> because *maxAllowedTxns* is 0 (see the sketch below).
> So I think the Standby NameNode should call *catchupDuringFailover()* with 
> *onlyDurableTxns=false*, so that it can replay all missed edits from the 
> JournalNodes.
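To see why *maxAllowedTxns* ends up 0 in this scenario, here is the quorum arithmetic as a hedged sketch (illustrative only, not the actual QuorumJournalManager code):

{code:java}
// Responses received while selecting input streams with onlyDurableTxns=true:
long[] txnCounts = {0, 3};    // from JN0 and JN1; JN2 did not respond
int majority = 3 / 2 + 1;     // majority of 3 JournalNodes = 2

// Only edits acknowledged by a majority count as durable, i.e. the
// majority-th highest response:
java.util.Arrays.sort(txnCounts);
long maxAllowedTxns = txnCounts[txnCounts.length - majority]; // -> 0

// fromTxnId is 3 but maxAllowedTxns is 0, so catchupDuringFailover()
// replays nothing and the later openForWrite() fails as shown above.
{code}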






[jira] [Resolved] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-11 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei resolved HDFS-16689.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Standby NameNode crashes when transitioning to Active with in-progress tailer
> -
>
> Key: HDFS-16689
> URL: https://issues.apache.org/jira/browse/HDFS-16689
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standby NameNode crashes when transitioning to Active with an in-progress 
> tailer, and the error message looks like below:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X 
> when there is a stream available for read: ByteStringEditLog[X, Y], 
> ByteStringEditLog[X, 0]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
>   ... 36 more
> {code}
> After tracing, we found a critical bug in 
> *EditLogTailer#catchupDuringFailover()* when 
> *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true: *catchupDuringFailover()* tries 
> to replay all missed edits from the JournalNodes with *onlyDurableTxns=true*, 
> and it may be unable to replay any edits when some JournalNodes are abnormal. 
> To reproduce, suppose:
> - There are 2 NameNodes, NN0 and NN1, whose states are Active and Standby 
> respectively, and there are 3 JournalNodes, namely JN0, JN1 and JN2. 
> - NN0 tries to sync 3 edits starting at txid 3 to the JNs, but only 
> successfully syncs them to JN1 and JN2, because JN0 is abnormal (GC pause, 
> bad network, restart, etc.).
> - NN1's lastAppliedTxId is 2, and at this moment we try to fail over from 
> NN0 to NN1. 
> - NN1 gets only two responses, from JN0 and JN1, when it selects 
> inputStreams with *fromTxnId=3* and *onlyDurableTxns=true*, and the reported 
> txn counts are 0 and 3 respectively; JN2 is abnormal (GC pause, bad network, 
> restart, etc.).
> - NN1 cannot replay any edits with *fromTxnId=3* from the JournalNodes 
> because *maxAllowedTxns* is 0.
> So I think the Standby NameNode should call *catchupDuringFailover()* with 
> *onlyDurableTxns=false*, so that it can replay all missed edits from the 
> JournalNodes.






[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578778#comment-17578778
 ] 

ASF GitHub Bot commented on HDFS-16689:
---

ferhui merged PR #4628:
URL: https://github.com/apache/hadoop/pull/4628




> Standby NameNode crashes when transitioning to Active with in-progress tailer
> -
>
> Key: HDFS-16689
> URL: https://issues.apache.org/jira/browse/HDFS-16689
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Standby NameNode crashes when transitioning to Active with an in-progress 
> tailer, and the error message looks like below:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X 
> when there is a stream available for read: ByteStringEditLog[X, Y], 
> ByteStringEditLog[X, 0]
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
>   at 
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
>   at 
> org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
>   ... 36 more
> {code}
> After tracing, we found a critical bug in 
> *EditLogTailer#catchupDuringFailover()* when 
> *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true: *catchupDuringFailover()* tries 
> to replay all missed edits from the JournalNodes with *onlyDurableTxns=true*, 
> and it may be unable to replay any edits when some JournalNodes are abnormal. 
> To reproduce, suppose:
> - There are 2 NameNodes, NN0 and NN1, whose states are Active and Standby 
> respectively, and there are 3 JournalNodes, namely JN0, JN1 and JN2. 
> - NN0 tries to sync 3 edits starting at txid 3 to the JNs, but only 
> successfully syncs them to JN1 and JN2, because JN0 is abnormal (GC pause, 
> bad network, restart, etc.).
> - NN1's lastAppliedTxId is 2, and at this moment we try to fail over from 
> NN0 to NN1. 
> - NN1 gets only two responses, from JN0 and JN1, when it selects 
> inputStreams with *fromTxnId=3* and *onlyDurableTxns=true*, and the reported 
> txn counts are 0 and 3 respectively; JN2 is abnormal (GC pause, bad network, 
> restart, etc.).
> - NN1 cannot replay any edits with *fromTxnId=3* from the JournalNodes 
> because *maxAllowedTxns* is 0.
> So I think the Standby NameNode should call *catchupDuringFailover()* with 
> *onlyDurableTxns=false*, so that it can replay all missed edits from the 
> JournalNodes.






[jira] [Updated] (HDFS-16726) There is a memory-related problem about HDFS namenode

2022-08-11 Thread yuyanlei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuyanlei updated HDFS-16726:

Environment: 
-Xms280G -Xmx280G -XX:MaxDirectMemorySize=10G -XX:MetaspaceSize=128M -server \
                    -XX:+UseG1GC -XX:+UseStringDeduplication 
-XX:MaxGCPauseMillis=250 -XX:+UnlockExperimentalVMOptions 
-XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics 
-XX:PrintSafepointStatisticsCount=1 \
                    -XX:G1OldCSetRegionThresholdPercent=1 
-XX:G1MixedGCCountTarget=9 -XX:+SafepointTimeout  
-XX:SafepointTimeoutDelay=4000 \
                    -XX:ParallelGCThreads=24 -XX:ConcGCThreads=6 
-XX:G1RSetRegionEntries=4096 -XX:+AggressiveOpts -XX:+DisableExplicitGC \
                    -XX:G1HeapWastePercent=9 
-XX:G1MixedGCLiveThresholdPercent=85 -XX:InitiatingHeapOccupancyPercent=75 \
                    -XX:+ParallelRefProcEnabled -XX:-ResizePLAB  
-XX:+PrintAdaptiveSizePolicy \
                    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
                    -Xloggc:$HADOOP_LOG_DIR/namenode.gc.log \
                    -XX:+HeapDumpOnOutOfMemoryError 
-XX:ErrorFile=$HADOOP_LOG_DIR/hs_err_pid%p.log -XX:HeapDumpPath=$HADOOP_LOG_DIR 
\
                    -Dcom.sun.management.jmxremote \
                    -Dcom.sun.management.jmxremote.port=9009 \
                    -Dcom.sun.management.jmxremote.ssl=false \
                    -Dcom.sun.management.jmxremote.authenticate=false \
                    $HADOOP_NAMENODE_OPTS

> There is a memory-related problem about HDFS namenode
> -
>
> Key: HDFS-16726
> URL: https://issues.apache.org/jira/browse/HDFS-16726
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs, namenode
>Affects Versions: 2.7.2
> Environment: -Xms280G -Xmx280G -XX:MaxDirectMemorySize=10G 
> -XX:MetaspaceSize=128M -server \
>                     -XX:+UseG1GC -XX:+UseStringDeduplication 
> -XX:MaxGCPauseMillis=250 -XX:+UnlockExperimentalVMOptions 
> -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics 
> -XX:PrintSafepointStatisticsCount=1 \
>                     -XX:G1OldCSetRegionThresholdPercent=1 
> -XX:G1MixedGCCountTarget=9 -XX:+SafepointTimeout  
> -XX:SafepointTimeoutDelay=4000 \
>                     -XX:ParallelGCThreads=24 -XX:ConcGCThreads=6 
> -XX:G1RSetRegionEntries=4096 -XX:+AggressiveOpts -XX:+DisableExplicitGC \
>                     -XX:G1HeapWastePercent=9 
> -XX:G1MixedGCLiveThresholdPercent=85 -XX:InitiatingHeapOccupancyPercent=75 \
>                     -XX:+ParallelRefProcEnabled -XX:-ResizePLAB  
> -XX:+PrintAdaptiveSizePolicy \
>                     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
>                     -Xloggc:$HADOOP_LOG_DIR/namenode.gc.log \
>                     -XX:+HeapDumpOnOutOfMemoryError 
> -XX:ErrorFile=$HADOOP_LOG_DIR/hs_err_pid%p.log 
> -XX:HeapDumpPath=$HADOOP_LOG_DIR \
>                     -Dcom.sun.management.jmxremote \
>                     -Dcom.sun.management.jmxremote.port=9009 \
>                     -Dcom.sun.management.jmxremote.ssl=false \
>                     -Dcom.sun.management.jmxremote.authenticate=false \
>                     $HADOOP_NAMENODE_OPTS
>Reporter: yuyanlei
>Priority: Critical
> Attachments: 图片_lanxin_20220809153722.png
>
>
> In the cluster, the memory usage of the NameNode exceeds the Xmx setting 
> (Xmx=280GB): the actual memory usage of the NameNode is 479GB.
> Output via pmap:
>        Address Perm   Offset Device    Inode      Size       Rss       Pss Referenced  Anonymous Swap Locked Mapping
>   2b42f000     rw-p          00:00         0 294174720 293756960 293756960  293756960  293756960    0      0
>   01e21000     rw-p          00:00         0 195245456 195240848 195240848  195240848  195240848    0      0 [heap]
>   2b897c00     rw-p          00:00         0   9246724   9246724   9246724    9246724    9246724    0      0
>   2b8bb0905000 rw-p          00:00         0   1781124   1754572   1754572    1754572    1754572    0      0
>   2b893600     rw-p          00:00         0   1146880   1002084   1002084    1002084    1002084    0      0
>   2b42db652000 rwxp          00:00         0     57792     55252     55252      55252      55252    0      0
>   2b42ec12a000 rw-p          00:00         0     25696     24700     24700      24700      24700    0      0
>   2b42ef25b000 rw-p          00:00         0      9988      8972      8972       8972       8972    0      0
>   2b8c1d467000 rw-p          00:00         0      9216      8204      8204       8204       8204    0      0
>   2b8d6f8db000 rw-p          00:00         0      7160      6228      6228       6228       6228    0      0
> The first mapping should correspond to the Xmx-configured Java heap, while 
> [heap] is unusually large, so a native memory leak is suspected!

[jira] [Commented] (HDFS-16686) GetJournalEditServlet fails to authorize valid Kerberos request

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578738#comment-17578738
 ] 

ASF GitHub Bot commented on HDFS-16686:
---

hadoop-yetus commented on PR #4724:
URL: https://github.com/apache/hadoop/pull/4724#issuecomment-1212634668

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 36s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 1 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 25s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 43s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 38s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 26s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 48s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 39s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 13s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 23s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 26s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  2s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4724/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 9 unchanged - 
0 fixed = 11 total (was 9)  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 34s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 35s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 236m 52s |  |  hadoop-hdfs in the patch 
passed.  |
   | -1 :x: |  asflicense  |   1m 15s | 
[/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4724/2/artifact/out/results-asflicense.txt)
 |  The patch generated 1 ASF License warnings.  |
   |  |   | 347m 21s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4724/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4724 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux c39386dba77e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 
11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / ee6c71e1ccc12e0cd37d0843edd497d09c3f13fe |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4724/2/testReport/ |
   | Max. process+thread count | 3187 (vs. ulim

[jira] [Commented] (HDFS-16688) Unresolved Hosts during startup are not synced by JournalNodes

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578730#comment-17578730
 ] 

ASF GitHub Bot commented on HDFS-16688:
---

hadoop-yetus commented on PR #4725:
URL: https://github.com/apache/hadoop/pull/4725#issuecomment-1212580227

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  0s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  1s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m  2s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 44s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 37s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 26s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 24s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 47s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 47s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  23m 13s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 27s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 20s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 20s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  4s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 57s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 32s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  22m 37s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 426m 52s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 11s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 536m 50s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes |
   |   | hadoop.hdfs.qjournal.client.TestQJMWithFaults |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4725 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 3bed9d08f88f 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 
17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 8d44a37f32e423ecb51d340f847b89cd978c6590 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/3/testReport/ |

[jira] [Commented] (HDFS-16724) RBF should support get the information about ancestor mount points

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578686#comment-17578686
 ] 

ASF GitHub Bot commented on HDFS-16724:
---

goiri commented on code in PR #4719:
URL: https://github.com/apache/hadoop/pull/4719#discussion_r943951153


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java:
##
@@ -935,19 +935,22 @@ public BatchedDirectoryListing getBatchedListing(String[] 
srcs,
   public HdfsFileStatus getFileInfo(String src) throws IOException {
 rpcServer.checkOperation(NameNode.OperationCategory.READ);
 
-final List<RemoteLocation> locations =
-rpcServer.getLocationsForPath(src, false, false);
-RemoteMethod method = new RemoteMethod("getFileInfo",
-new Class<?>[] {String.class}, new RemoteParam());
-
 HdfsFileStatus ret = null;
-// If it's a directory, we check in all locations
-if (rpcServer.isPathAll(src)) {
-  ret = getFileInfoAll(locations, method);
-} else {
-  // Check for file information sequentially
-  ret = rpcClient.invokeSequential(
-  locations, method, HdfsFileStatus.class, null);
+IOException noLocationException = null;
+try {
+  final List<RemoteLocation> locations = rpcServer.getLocationsForPath(src, false, false);
+  RemoteMethod method = new RemoteMethod("getFileInfo",
+  new Class<?>[] {String.class}, new RemoteParam());
+
+  // If it's a directory, we check in all locations
+  if (rpcServer.isPathAll(src)) {
+ret = getFileInfoAll(locations, method);
+  } else {
+// Check for file information sequentially
+ret = rpcClient.invokeSequential(locations, method, HdfsFileStatus.class, null);
+  }
+} catch (NoLocationException | RouterResolveException e) {

Review Comment:
   That would be nice.





> RBF should support get the information about ancestor mount points
> --
>
> Key: HDFS-16724
> URL: https://issues.apache.org/jira/browse/HDFS-16724
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> Suppose an RBF cluster has 2 nameservices and two mount points as below:
>  * /user/ns1 -> ns1 -> /user/ns1
>  * /user/ns2 -> ns2 -> /user/ns2
> Suppose we disable the default nameservice of the RBF cluster and try to 
> getFileInfo on the path /user: RBF will throw an IOException to the client 
> because it cannot find any location for the path /user. 
> But in this case RBF should return a valid response to the client, because 
> /user has two sub mount points, ns1 and ns2.
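A hedged sketch of the proposed behavior, continuing the getFileInfo() change quoted in the diff above (getMountPointStatus and subclusterResolver.getMountPoints exist in RouterClientProtocol today; the exact wiring here is illustrative):

{code:java}
// After the try/catch above: if src resolves to no location but has child
// mount points (e.g. /user/ns1 and /user/ns2 under /user), answer with a
// synthetic directory status instead of failing.
if (ret == null) {
  List<String> childMounts = subclusterResolver.getMountPoints(src);
  if (childMounts != null && !childMounts.isEmpty()) {
    ret = getMountPointStatus(src, childMounts.size(), 0);
  } else if (noLocationException != null) {
    throw noLocationException;
  }
}
return ret;
{code}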






[jira] [Commented] (HDFS-16705) RBF: Support healthMonitor timeout configurable and cache NN and client proxy in NamenodeHeartbeatService

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578681#comment-17578681
 ] 

ASF GitHub Bot commented on HDFS-16705:
---

goiri commented on PR #4662:
URL: https://github.com/apache/hadoop/pull/4662#issuecomment-1212504496

   It looks good from my side but I'd like to get closure from @slfan1989 




> RBF: Support healthMonitor timeout configurable and cache NN and client proxy 
> in NamenodeHeartbeatService
> -
>
> Key: HDFS-16705
> URL: https://issues.apache.org/jira/browse/HDFS-16705
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> When I read NamenodeHeartbeatService.class of RBF, I felt there are some 
> things we can do for NamenodeHeartbeatService.class:
>  * Cache the NameNode Protocol and Client Protocol proxies to avoid creating 
> a new proxy every time
>  * Support healthMonitorTimeout configuration
>  * Format the code of getNamenodeStatusReport to make it clearer






[jira] [Resolved] (HDFS-16702) MiniDFSCluster should report cause of exception in assertion error

2022-08-11 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun resolved HDFS-16702.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> MiniDFSCluster should report cause of exception in assertion error
> --
>
> Key: HDFS-16702
> URL: https://issues.apache.org/jira/browse/HDFS-16702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
> Environment: Tests running in the Hadoop dev environment image.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When the MiniDFSCluster detects that an exception caused an exit, it should 
> include that exception as the cause of the AssertionError that it throws. 
> The current AssertionError simply reports the message "Test resulted in an 
> unexpected exit" and provides a stack trace pointing at the location of the 
> check for an exit exception.
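A minimal sketch of the change (AssertionError(String, Throwable) has existed since Java 7; ExitUtil is org.apache.hadoop.util.ExitUtil):

{code:java}
// In MiniDFSCluster's exit check: keep the captured exit exception as the
// cause instead of dropping it.
ExitUtil.ExitException ee = ExitUtil.getFirstExitException();
if (ee != null) {
  throw new AssertionError("Test resulted in an unexpected exit", ee);
}
{code}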






[jira] [Commented] (HDFS-16702) MiniDFSCluster should report cause of exception in assertion error

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578667#comment-17578667
 ] 

ASF GitHub Bot commented on HDFS-16702:
---

sunchao merged PR #4680:
URL: https://github.com/apache/hadoop/pull/4680




> MiniDFSCluster should report cause of exception in assertion error
> --
>
> Key: HDFS-16702
> URL: https://issues.apache.org/jira/browse/HDFS-16702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
> Environment: Tests running in the Hadoop dev environment image.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When the MiniDFSCluster detects that an exception caused an exit, it should 
> include that exception as the cause of the AssertionError that it throws. 
> The current AssertionError simply reports the message "Test resulted in an 
> unexpected exit" and provides a stack trace pointing at the location of the 
> check for an exit exception.






[jira] [Commented] (HDFS-16702) MiniDFSCluster should report cause of exception in assertion error

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578668#comment-17578668
 ] 

ASF GitHub Bot commented on HDFS-16702:
---

sunchao commented on PR #4680:
URL: https://github.com/apache/hadoop/pull/4680#issuecomment-1212481643

   Merged to trunk, thanks @snmvaughan .




> MiniDFSCluster should report cause of exception in assertion error
> --
>
> Key: HDFS-16702
> URL: https://issues.apache.org/jira/browse/HDFS-16702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
> Environment: Tests running in the Hadoop dev environment image.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When the MiniDFSCluster detects that an exception caused an exit, it should 
> include that exception as the cause of the AssertionError that it throws. 
> The current AssertionError simply reports the message "Test resulted in an 
> unexpected exit" and provides a stack trace pointing at the location of the 
> check for an exit exception.






[jira] [Commented] (HDFS-16695) Improve Code with Lambda in org.apahce.hadoop.hdfs.server.namenode package

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578658#comment-17578658
 ] 

ASF GitHub Bot commented on HDFS-16695:
---

hadoop-yetus commented on PR #4668:
URL: https://github.com/apache/hadoop/pull/4668#issuecomment-1212469682

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 48s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | -1 :x: |  test4tests  |   0m  0s |  |  The patch doesn't appear to include 
any new or modified tests. Please justify why no new tests are needed for this 
patch. Also please list what manual steps were performed to verify this patch.  
|
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  41m 13s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 42s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 24s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 39s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 19s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 44s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 46s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 27s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 24s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 30s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 30s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 19s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 19s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  6s |  |  
hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 777 unchanged - 39 
fixed = 777 total (was 816)  |
   | +1 :green_heart: |  mvnsite  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 58s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 30s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 33s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 18s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 342m  5s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 59s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 460m 59s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4668/3/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4668 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux f1e4004c0e15 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 
17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 5e1e30451f406e776e66135f7d00265c763345a0 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4668/3/testReport/ |
   | Max. process+thread count | 1961 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-m

[jira] [Resolved] (HDFS-13274) RBF: Extend RouterRpcClient to use multiple sockets

2022-08-11 Thread Jira


 [ 
https://issues.apache.org/jira/browse/HDFS-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Íñigo Goiri resolved HDFS-13274.

Fix Version/s: 3.4.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> RBF: Extend RouterRpcClient to use multiple sockets
> ---
>
> Key: HDFS-13274
> URL: https://issues.apache.org/jira/browse/HDFS-13274
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> HADOOP-13144 introduces the ability to create multiple connections for the 
> same user and use different sockets. The RouterRpcClient should use this 
> approach to get a better throughput.
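>
> A minimal sketch of how a deployment might enable this (the property names
> here are assumptions and should be checked against the merged change):
> {code:java}
> Configuration conf = new Configuration();
> // Assumed key: open several sockets per user instead of sharing one.
> conf.setBoolean("dfs.federation.router.enable.multiple.socket", true);
> // Assumed key: how many in-flight RPCs share one socket before another
> // connection is opened.
> conf.setInt("dfs.federation.router.max.concurrency.per.connection", 1);
> {code}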



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13274) RBF: Extend RouterRpcClient to use multiple sockets

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578651#comment-17578651
 ] 

ASF GitHub Bot commented on HDFS-13274:
---

goiri merged PR #4531:
URL: https://github.com/apache/hadoop/pull/4531




> RBF: Extend RouterRpcClient to use multiple sockets
> ---
>
> Key: HDFS-13274
> URL: https://issues.apache.org/jira/browse/HDFS-13274
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: Íñigo Goiri
>Assignee: Íñigo Goiri
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4h 20m
>  Remaining Estimate: 0h
>
> HADOOP-13144 introduces the ability to create multiple connections for the 
> same user and use different sockets. The RouterRpcClient should use this 
> approach to get a better throughput.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16702) MiniDFSCluster should report cause of exception in assertion error

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578649#comment-17578649
 ] 

ASF GitHub Bot commented on HDFS-16702:
---

snmvaughan commented on PR #4680:
URL: https://github.com/apache/hadoop/pull/4680#issuecomment-1212454842

   I looked into the 2 failed unit tests, and they don't appear to be related 
to the change.




> MiniDFSCluster should report cause of exception in assertion error
> --
>
> Key: HDFS-16702
> URL: https://issues.apache.org/jira/browse/HDFS-16702
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: hdfs
> Environment: Tests running in the Hadoop dev environment image.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> When the MiniDFSCluster detects that an exception caused an exit, it should 
> include that exception as the cause of the AssertionError that it throws.  
> The current AssertionError simply reports the message "Test resulted in an 
> unexpected exit" and provides a stack trace to the location of the check for 
> an exit exception.
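>
> A minimal sketch of the proposed change (names are illustrative):
> {code:java}
> // Before: the cause is dropped, so the stack trace only points at the check.
> // throw new AssertionError("Test resulted in an unexpected exit");
>
> // After: chain the recorded exit exception as the cause, so test reports
> // show the real failure.
> if (exitException != null) {
>   throw new AssertionError("Test resulted in an unexpected exit",
>       exitException);
> }
> {code}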



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578579#comment-17578579
 ] 

ASF GitHub Bot commented on HDFS-16728:
---

goiri commented on code in PR #4734:
URL: https://github.com/apache/hadoop/pull/4734#discussion_r943742721


##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRpc.java:
##
@@ -1547,6 +1547,24 @@ public void testRenewLeaseWithMultiStream() throws 
Exception {
 }
   }
 
+  @Test
+  public void testMkdirWithDisableNameService() throws Exception {
+MockResolver resolver = 
(MockResolver)router.getRouter().getSubclusterResolver();
+String ns0 = cluster.getNameservices().get(0);
+resolver.addLocation("/mnt", ns0, "/");
+MockResolver activeNamenodeResolver = 
(MockResolver)router.getRouter().getNamenodeResolver();
+activeNamenodeResolver.disableNamespace(ns0);
+
+try {
+  FsPermission permission = new FsPermission("777");
+  LambdaTestUtils.intercept(NoLocationException.class,
+  () -> router.getRouter().getRpcServer()

Review Comment:
   Extract router.getRouter().getRpcServer() and we can make this fit in one 
line.
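   
   A sketch of that suggestion (illustrative only; the mkdirs call shape is
   assumed from ClientProtocol):
   ```java
   RouterRpcServer rpcServer = router.getRouter().getRpcServer();
   LambdaTestUtils.intercept(NoLocationException.class,
       () -> rpcServer.mkdirs("/mnt/dir", new FsPermission("777"), true));
   ```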



##
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
##
@@ -1765,6 +1765,9 @@ protected List getLocationsForPath(String 
path,
   locs.add(loc);
 }
   }
+  if (locs.isEmpty()) {
+throw new NoLocationException(path, 
this.subclusterResolver.getClass().getSimpleName());

Review Comment:
   Cleaner to pass this.subclusterResolver.getClass() and do the 
getSimpleName() in the exception.
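   
   A sketch of the suggested shape (the message text is illustrative, not the
   actual patch):
   ```java
   // Call site: hand over the resolver class itself.
   throw new NoLocationException(path, this.subclusterResolver.getClass());
   
   // NoLocationException: derive the simple name internally.
   public NoLocationException(String path, Class<?> resolverClass) {
     super("Cannot locate " + path + " in " + resolverClass.getSimpleName());
   }
   ```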





> RBF throw IndexOutOfBoundsException with disableNameServices
> 
>
> Key: HDFS-16728
> URL: https://issues.apache.org/jira/browse/HDFS-16728
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> RBF will throw an IndexOutOfBoundsException when the namespace is disabled.
> Suppose we have a mount point /a/b -> ns0 -> /a/b and we disable ns0.
> RBF will then throw an IndexOutOfBoundsException while handling requests whose 
> paths start with /a/b.
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0    at 
> java.util.ArrayList.rangeCheck(ArrayList.java:657)
>     at java.util.ArrayList.get(ArrayList.java:433)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578575#comment-17578575
 ] 

ASF GitHub Bot commented on HDFS-13522:
---

simbadzina commented on code in PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#discussion_r943730742


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FederatedNamespaceIds.java:
##
@@ -0,0 +1,85 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs;
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos;
+import 
org.apache.hadoop.ipc.protobuf.RpcHeaderProtos.RouterFederatedStateProto;
+import org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException;
+import org.apache.hadoop.thirdparty.protobuf.ByteString;
+
+
+/** Collection of last-seen namespace state Ids for a set of namespaces. */

Review Comment:
   Added a comment
   
   > A single NamespaceStateId is shared by all outgoing connections to a 
particular namespace. 
   > Router clients share and query the entire collection.





> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.
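>
> A minimal sketch of that piece (the enum exists today; the OBSERVER constant
> is the assumed addition):
> {code:java}
> public enum FederationNamenodeServiceState {
>   ACTIVE,
>   OBSERVER, // assumed addition so the Router can track observer reads
>   STANDBY,
>   UNAVAILABLE,
>   EXPIRED;
> }
> {code}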



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578573#comment-17578573
 ] 

ASF GitHub Bot commented on HDFS-13522:
---

simbadzina commented on code in PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#discussion_r943730742


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FederatedNamespaceIds.java:
##
@@ -0,0 +1,85 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs;
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos;
+import 
org.apache.hadoop.ipc.protobuf.RpcHeaderProtos.RouterFederatedStateProto;
+import org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException;
+import org.apache.hadoop.thirdparty.protobuf.ByteString;
+
+
+/** Collection of last-seen namespace state Ids for a set of namespaces. */

Review Comment:
   Added a comment
   
   > A single NamespaceStateId is shared by all outgoing connections to a 
particular namespace. 
   > Router clients query the entire collection.





> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578569#comment-17578569
 ] 

ASF GitHub Bot commented on HDFS-13522:
---

simbadzina commented on code in PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#discussion_r943725929


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientGSIContext.java:
##
@@ -66,15 +79,23 @@ public void 
updateResponseState(RpcResponseHeaderProto.Builder header) {
*/
   @Override
   public void receiveResponseState(RpcResponseHeaderProto header) {
-lastSeenStateId.accumulate(header.getStateId());
+lastSeenStateId.update(header.getStateId());

Review Comment:
   I agree. Fixed.



##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientGSIContext.java:
##
@@ -37,14 +37,27 @@
 @InterfaceStability.Evolving
 public class ClientGSIContext implements AlignmentContext {
 
-  private final LongAccumulator lastSeenStateId =
-  new LongAccumulator(Math::max, Long.MIN_VALUE);
+  private final NamespaceStateId lastSeenStateId;
+  private ByteString routerFederatedState;
+
+  public ClientGSIContext() {
+this(new NamespaceStateId());
+  }
+
+  public ClientGSIContext(NamespaceStateId lastSeenStateId) {
+this.lastSeenStateId = lastSeenStateId;
+routerFederatedState = null;
+  }
 
   @Override
   public long getLastSeenStateId() {
 return lastSeenStateId.get();
   }
 
+  public void updateLastSeenStateID(Long stateId) {

Review Comment:
   Removed.





> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaining Estimate: 0h
>
> Changes will need to occur to the router to support the new observer node.
> One such change will be to make the router understand the observer state, 
> e.g. {{FederationNamenodeServiceState}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578570#comment-17578570
 ] 

ASF GitHub Bot commented on HDFS-13522:
---

simbadzina commented on code in PR #4311:
URL: https://github.com/apache/hadoop/pull/4311#discussion_r943726586


##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FederatedNamespaceIds.java:
##
@@ -0,0 +1,85 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs;
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos;
+import 
org.apache.hadoop.ipc.protobuf.RpcHeaderProtos.RouterFederatedStateProto;
+import org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException;
+import org.apache.hadoop.thirdparty.protobuf.ByteString;
+
+
+/** Collection of last-seen namespace state Ids for a set of namespaces. */
+public class FederatedNamespaceIds {

Review Comment:
   Added locking around iterations and modifications to the hashMap.
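   
   A minimal sketch of that pattern (method names are illustrative; the map
   field matches the class above):
   ```java
   private final Map<String, NamespaceStateId> namespaceIdMap =
       new ConcurrentHashMap<>();
   
   public void updateNamespaceState(String nsId, long stateId) {
     synchronized (namespaceIdMap) {
       // modifications take the same lock as iterations below
       namespaceIdMap.computeIfAbsent(nsId, k -> new NamespaceStateId())
           .update(stateId);
     }
   }
   
   public void forEach(java.util.function.BiConsumer<String, NamespaceStateId> fn) {
     synchronized (namespaceIdMap) {
       // iterate under the lock so callers see a consistent snapshot
       namespaceIdMap.forEach(fn);
     }
   }
   ```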



##
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FederatedNamespaceIds.java:
##
@@ -0,0 +1,85 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hadoop.hdfs;
+
+import java.util.Collections;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import org.apache.hadoop.classification.VisibleForTesting;
+import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos;
+import 
org.apache.hadoop.ipc.protobuf.RpcHeaderProtos.RouterFederatedStateProto;
+import org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException;
+import org.apache.hadoop.thirdparty.protobuf.ByteString;
+
+
+/** Collection of last-seen namespace state Ids for a set of namespaces. */
+public class FederatedNamespaceIds {
+  private final Map namespaceIdMap = new 
ConcurrentHashMap<>();
+
+  public void 
updateStateUsingRequestHeader(RpcHeaderProtos.RpcRequestHeaderProto header) {
+if (header.hasRouterFederatedState()) {
+  RouterFederatedStateProto federatedState = null;
+  try {
+federatedState = 
RouterFederatedStateProto.parseFrom(header.getRouterFederatedState());
+  } catch (InvalidProtocolBufferException e) {
+throw new RuntimeException(e);
+  }
+  
federatedState.getNamespaceStateIdsMap().forEach(this::updateNamespaceState);

Review Comment:
   Added





> RBF: Support observer node from Router-Based Federation
> ---
>
> Key: HDFS-13522
> URL: https://issues.apache.org/jira/browse/HDFS-13522
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: federation, namenode
>Reporter: Erik Krogen
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, 
> HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC 
> clogging.png, ShortTerm-Routers+Observer.png, 
> observer_reads_in_rbf_proposal_simbadzina_v1.pdf, 
> observer_reads_in_rbf_proposal_simbadzina_v2.pdf
>
>  Time Spent: 20h 50m
>  Remaini

[jira] [Commented] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578561#comment-17578561
 ] 

ASF GitHub Bot commented on HDFS-16728:
---

hadoop-yetus commented on PR #4734:
URL: https://github.com/apache/hadoop/pull/4734#issuecomment-1212228190

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 41s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  38m 30s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m  2s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   0m 57s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   0m 52s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m  2s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m  9s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 16s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 48s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  21m 21s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   0m 42s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 43s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   0m 43s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   0m 40s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   0m 40s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   0m 26s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   0m 43s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 42s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   0m 56s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   1m 26s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  20m 30s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  |  22m 17s |  |  hadoop-hdfs-rbf in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   0m 52s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 120m 33s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4734/1/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4734 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 730ae41fa654 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 
17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / e21fec18ea8c5b29fe7771272efb31fcbbfd22f9 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4734/1/testReport/ |
   | Max. process+thread count | 2805 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: 
hadoop-hdfs-project/hadoop-hdfs-rbf |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4734/1/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated

[jira] [Updated] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices

2022-08-11 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HDFS-16728:
--
Labels: pull-request-available  (was: )

> RBF throw IndexOutOfBoundsException with disableNameServices
> 
>
> Key: HDFS-16728
> URL: https://issues.apache.org/jira/browse/HDFS-16728
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>
> RBF will throw an IndexOutOfBoundsException when the namespace is disabled.
> Suppose we have a mount point /a/b -> ns0 -> /a/b and we disable ns0.
> RBF will then throw an IndexOutOfBoundsException while handling requests whose 
> paths start with /a/b.
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0    at 
> java.util.ArrayList.rangeCheck(ArrayList.java:657)
>     at java.util.ArrayList.get(ArrayList.java:433)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578513#comment-17578513
 ] 

ASF GitHub Bot commented on HDFS-16728:
---

ZanderXu opened a new pull request, #4734:
URL: https://github.com/apache/hadoop/pull/4734

   ### Description of PR
   RBF will throw an IndexOutOfBoundsException when the namespace is disabled.
   
   Suppose we have a mount point /a/b -> ns0 -> /a/b and we disable ns0.
   
   RBF will then throw an IndexOutOfBoundsException while handling requests 
whose paths start with /a/b.
   
   ```
   java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
   at java.util.ArrayList.rangeCheck(ArrayList.java:657)
   at java.util.ArrayList.get(ArrayList.java:433)
   at 
org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756)
   at 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980)
 
   ```
   
   




> RBF throw IndexOutOfBoundsException with disableNameServices
> 
>
> Key: HDFS-16728
> URL: https://issues.apache.org/jira/browse/HDFS-16728
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>
> RBF will throw an IndexOutOfBoundsException when the namespace is disabled.
> Suppose we have a mount point /a/b -> ns0 -> /a/b and we disable ns0.
> RBF will then throw an IndexOutOfBoundsException while handling requests whose 
> paths start with /a/b.
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0    at 
> java.util.ArrayList.rangeCheck(ArrayList.java:657)
>     at java.util.ArrayList.get(ArrayList.java:433)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756)
>     at 
> org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Created] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices

2022-08-11 Thread ZanderXu (Jira)
ZanderXu created HDFS-16728:
---

 Summary: RBF throw IndexOutOfBoundsException with 
disableNameServices
 Key: HDFS-16728
 URL: https://issues.apache.org/jira/browse/HDFS-16728
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: ZanderXu
Assignee: ZanderXu


RBF will throw an IndexOutOfBoundsException when the namespace is disabled.

Suppose we have a mount point /a/b -> ns0 -> /a/b and we disable ns0.

RBF will then throw an IndexOutOfBoundsException while handling requests whose 
paths start with /a/b.
{code:java}
java.lang.IndexOutOfBoundsException: Index: 0, Size: 0    at 
java.util.ArrayList.rangeCheck(ArrayList.java:657)
    at java.util.ArrayList.get(ArrayList.java:433)
    at 
org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756)
    at 
org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980)
 {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16688) Unresolved Hosts during startup are not synced by JournalNodes

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578498#comment-17578498
 ] 

ASF GitHub Bot commented on HDFS-16688:
---

snmvaughan commented on PR #4725:
URL: https://github.com/apache/hadoop/pull/4725#issuecomment-1212049220

   I was unable to replicate the test failure 
`TestQJMWithFaults.testUnresolvableHostName` locally.




> Unresolved Hosts during startup are not synced by JournalNodes
> --
>
> Key: HDFS-16688
> URL: https://issues.apache.org/jira/browse/HDFS-16688
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: journal-node
> Environment: Running in Kubernetes using Java 11, with an HA 
> configuration.
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Major
>  Labels: pull-request-available
>
> During the JournalNode startup, it builds the list of servers in the 
> JournalNode set, ignoring hostnames that cannot be resolved.  In environments 
> with dynamic IP address allocations this means that the JournalNodeSyncer 
> will never sync with hosts that aren't resolvable during startup.
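>
> A minimal sketch of the fix idea (method and field names are illustrative,
> not the actual patch):
> {code:java}
> // Re-resolve candidate hosts on every sync cycle instead of only once at
> // startup, so addresses that become resolvable later are picked up.
> private List<InetSocketAddress> resolveJournalNodes(List<String> hostPorts) {
>   List<InetSocketAddress> resolved = new ArrayList<>();
>   for (String hostPort : hostPorts) {
>     InetSocketAddress addr = NetUtils.createSocketAddr(hostPort);
>     if (!addr.isUnresolved()) {
>       resolved.add(addr);
>     }
>     // unresolved entries remain in hostPorts and are retried next cycle
>   }
>   return resolved;
> }
> {code}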



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16705) RBF: Support healthMonitor timeout configurable and cache NN and client proxy in NamenodeHeartbeatService

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578469#comment-17578469
 ] 

ASF GitHub Bot commented on HDFS-16705:
---

ZanderXu commented on PR #4662:
URL: https://github.com/apache/hadoop/pull/4662#issuecomment-1211954930

   @slfan1989 Ping. Could you help review this patch?




> RBF: Support healthMonitor timeout configurable and cache NN and client proxy 
> in NamenodeHeartbeatService
> -
>
> Key: HDFS-16705
> URL: https://issues.apache.org/jira/browse/HDFS-16705
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> While reading NamenodeHeartbeatService.class in RBF, I found a few things we 
> can improve (a minimal sketch follows the list):
>  * Cache NameNode Protocol and Client Protocol to avoid creating a new proxy 
> every time
>  * Support a configurable healthMonitorTimeout
>  * Format the code of getNamenodeStatusReport to make it clearer
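>
> A minimal sketch of the first two points (all names and the configuration
> key are assumptions, not the actual patch):
> {code:java}
> // Cache the proxy instead of creating a new one on every heartbeat.
> private NamenodeProtocol namenodeProtocol;
>
> private NamenodeProtocol getNamenodeProtocol() throws IOException {
>   if (namenodeProtocol == null) {
>     namenodeProtocol = NameNodeProxies.createNonHAProxy(conf,
>         namenodeAddress, NamenodeProtocol.class,
>         UserGroupInformation.getCurrentUser(), false).getProxy();
>   }
>   return namenodeProtocol;
> }
>
> // Assumed knob for the health-check timeout.
> private long healthMonitorTimeoutMs = conf.getTimeDuration(
>     "dfs.federation.router.health.monitor.timeout", 30_000,
>     TimeUnit.MILLISECONDS);
> {code}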



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16713) Improve Code with Lambda in org.apache.hadoop.hdfs.server.namenode sub packages

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578467#comment-17578467
 ] 

ASF GitHub Bot commented on HDFS-16713:
---

ZanderXu commented on PR #4674:
URL: https://github.com/apache/hadoop/pull/4674#issuecomment-1211953731

   @goiri Hi, could you help merge this PR into trunk? Thank you very much.




> Improve Code with Lambda in org.apache.hadoop.hdfs.server.namenode sub 
> packages
> ---
>
> Key: HDFS-16713
> URL: https://issues.apache.org/jira/browse/HDFS-16713
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Improve Code with Lambda in org.apache.hadoop.hdfs.server.namenode sub 
> packages.
> For example:
> Current logic:
> {code:java}
> public ListenableFuture getJournaledEdits(
>   long fromTxnId, int maxTransactions) {
> return parallelExecutor.submit(
> new Callable() {
>   @Override
>   public GetJournaledEditsResponseProto call() throws IOException {
> return getProxy().getJournaledEdits(journalId, nameServiceId,
> fromTxnId, maxTransactions);
>   }
> });
>   } {code}
> Improved Code with Lambda:
> {code:java}
> public ListenableFuture getJournaledEdits(
>   long fromTxnId, int maxTransactions) {
> return parallelExecutor.submit(() -> getProxy().getJournaledEdits(
> journalId, nameServiceId, fromTxnId, maxTransactions));
>   } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16710) Remove redundant throw exceptions in org.apache.hadoop.hdfs.server.namenode package

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578466#comment-17578466
 ] 

ASF GitHub Bot commented on HDFS-16710:
---

ZanderXu commented on PR #4670:
URL: https://github.com/apache/hadoop/pull/4670#issuecomment-1211952616

   @goiri Hi, could you help merge this PR into trunk? Thank you very much.




> Remove redundant throw exceptions in org.apache.hadoop.hdfs.server.namenode 
> package
> ---
>
> Key: HDFS-16710
> URL: https://issues.apache.org/jira/browse/HDFS-16710
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: ZanderXu
>Assignee: ZanderXu
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> While reading some HDFS NameNode classes, I found many redundant exception 
> declarations in the org.apache.hadoop.hdfs.server.namenode package, such as:
> {code:java}
> public synchronized void transitionToObserver(StateChangeRequestInfo req)
> throws ServiceFailedException, AccessControlException, IOException {
>   checkNNStartup();
>   nn.checkHaStateChange(req);
>   nn.transitionToObserver();
> } {code}
> Because ServiceFailedException and AccessControlException are subclasses of 
> IOException, the explicit declarations are redundant, and we can remove them 
> to make the code clearer:
> {code:java}
> public synchronized void transitionToObserver(StateChangeRequestInfo req)
> throws IOException {
>   checkNNStartup();
>   nn.checkHaStateChange(req);
>   nn.transitionToObserver();
> } {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578385#comment-17578385
 ] 

ASF GitHub Bot commented on HDFS-16689:
---

hadoop-yetus commented on PR #4628:
URL: https://github.com/apache/hadoop/pull/4628#issuecomment-1211814838

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m 24s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  1s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  1s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m 16s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 44s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 22s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 45s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 45s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   4m  8s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 21s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 43s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 43s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 31s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 31s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 33s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  7s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 57s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m  4s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 375m 35s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m 21s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 497m 27s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | 
hadoop.hdfs.server.blockmanagement.TestBlockTokenWithShortCircuitRead |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/5/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4628 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 0f0416ed0c69 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 
17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 5e6adb43e490330c7aa5fd71216308fb0915d3b9 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/5/testReport/ |
   | Max. process+thread count | 2109 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 

[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578386#comment-17578386
 ] 

ASF GitHub Bot commented on HDFS-16689:
---

hadoop-yetus commented on PR #4628:
URL: https://github.com/apache/hadoop/pull/4628#issuecomment-1211815047

   :confetti_ball: **+1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   1m  1s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 2 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  42m 24s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 50s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 36s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 23s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 47s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 25s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 43s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   4m 11s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m 31s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 33s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 40s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 40s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 28s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 28s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | +1 :green_heart: |  checkstyle  |   1m  9s |  |  the patch passed  |
   | +1 :green_heart: |  mvnsite  |   1m 36s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   1m  4s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 40s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 52s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m 10s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | +1 :green_heart: |  unit  | 374m  1s |  |  hadoop-hdfs in the patch 
passed.  |
   | +1 :green_heart: |  asflicense  |   1m 12s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 496m 15s |  |  |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/6/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4628 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets |
   | uname | Linux 6211aff2a338 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 
17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 5e6adb43e490330c7aa5fd71216308fb0915d3b9 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 
/usr/lib/jvm/java-8-openjdk-amd64:Private 
Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   |  Test Results | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/6/testReport/ |
   | Max. process+thread count | 2195 (vs. ulimit of 5500) |
   | modules | C: hadoop-hdfs-project/hadoop-hdfs U: 
hadoop-hdfs-project/hadoop-hdfs |
   | Console output | 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/6/console |
   | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 |
   | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org |
   
   
   This message was automatically generated.
   
   



[jira] [Commented] (HDFS-16688) Unresolved Hosts during startup are not synced by JournalNodes

2022-08-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578356#comment-17578356
 ] 

ASF GitHub Bot commented on HDFS-16688:
---

hadoop-yetus commented on PR #4725:
URL: https://github.com/apache/hadoop/pull/4725#issuecomment-1211762044

   :broken_heart: **-1 overall**
   
   
   
   
   
   
   | Vote | Subsystem | Runtime |  Logfile | Comment |
   |::|--:|:|::|:---:|
   | +0 :ok: |  reexec  |   0m 50s |  |  Docker mode activated.  |
    _ Prechecks _ |
   | +1 :green_heart: |  dupname  |   0m  0s |  |  No case conflicting files 
found.  |
   | +0 :ok: |  codespell  |   0m  0s |  |  codespell was not available.  |
   | +0 :ok: |  detsecrets  |   0m  0s |  |  detect-secrets was not available.  
|
   | +0 :ok: |  xmllint  |   0m  0s |  |  xmllint was not available.  |
   | +1 :green_heart: |  @author  |   0m  0s |  |  The patch does not contain 
any @author tags.  |
   | +1 :green_heart: |  test4tests  |   0m  0s |  |  The patch appears to 
include 3 new or modified test files.  |
    _ trunk Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |  41m  6s |  |  trunk passed  |
   | +1 :green_heart: |  compile  |   1m 42s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  compile  |   1m 31s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  checkstyle  |   1m 20s |  |  trunk passed  |
   | +1 :green_heart: |  mvnsite  |   1m 40s |  |  trunk passed  |
   | +1 :green_heart: |  javadoc  |   1m 19s |  |  trunk passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 45s |  |  trunk passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 47s |  |  trunk passed  |
   | +1 :green_heart: |  shadedclient  |  26m  7s |  |  branch has no errors 
when building and testing our client artifacts.  |
    _ Patch Compile Tests _ |
   | +1 :green_heart: |  mvninstall  |   1m 22s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 30s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javac  |   1m 30s |  |  the patch passed  |
   | +1 :green_heart: |  compile  |   1m 21s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  javac  |   1m 21s |  |  the patch passed  |
   | +1 :green_heart: |  blanks  |   0m  0s |  |  The patch has no blanks 
issues.  |
   | -0 :warning: |  checkstyle  |   1m  1s | 
[/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 251 unchanged 
- 0 fixed = 255 total (was 251)  |
   | +1 :green_heart: |  mvnsite  |   1m 27s |  |  the patch passed  |
   | +1 :green_heart: |  javadoc  |   0m 59s |  |  the patch passed with JDK 
Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1  |
   | +1 :green_heart: |  javadoc  |   1m 33s |  |  the patch passed with JDK 
Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07  |
   | +1 :green_heart: |  spotbugs  |   3m 34s |  |  the patch passed  |
   | +1 :green_heart: |  shadedclient  |  26m  4s |  |  patch has no errors 
when building and testing our client artifacts.  |
    _ Other Tests _ |
   | -1 :x: |  unit  | 346m 11s | 
[/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt)
 |  hadoop-hdfs in the patch passed.  |
   | +1 :green_heart: |  asflicense  |   1m  4s |  |  The patch does not 
generate ASF License warnings.  |
   |  |   | 464m 33s |  |  |
   
   
   | Reason | Tests |
   |---:|:--|
   | Failed junit tests | hadoop.hdfs.qjournal.client.TestQJMWithFaults |
   
   
   | Subsystem | Report/Notes |
   |--:|:-|
   | Docker | ClientAPI=1.41 ServerAPI=1.41 base: 
https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/2/artifact/out/Dockerfile
 |
   | GITHUB PR | https://github.com/apache/hadoop/pull/4725 |
   | Optional Tests | dupname asflicense compile javac javadoc mvninstall 
mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint |
   | uname | Linux 56ccbb36f24e 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 
17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux |
   | Build tool | maven |
   | Personality | dev-support/bin/hadoop.sh |
   | git revision | trunk / 38031b2cb60af2ae9c5ac5f8ccb95feae8235f04 |
   | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 |
   | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private 
Build-11.0.15

[jira] [Assigned] (HDFS-2139) Fast copy for HDFS.

2022-08-11 Thread Hui Fei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hui Fei reassigned HDFS-2139:
-

Assignee: ZanderXu  (was: Rituraj)

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: ZanderXu
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism 
> for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied over its respective blocks, they
> report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing top of
> the rack data transfers.
> Note: An extra improvement would be to instruct the datanode to create a 
> hardlink of the block file if we are copying a block on the same datanode
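>
> A high-level sketch of those steps (illustrative only; this is not the
> attached patch):
> {code:java}
> void fastCopy(DistributedFileSystem fs, String src, String dst)
>     throws IOException {
>   // 1) Query metadata for all blocks of the source file.
>   LocatedBlocks blocks = fs.getClient().getLocatedBlocks(src, 0);
>   for (LocatedBlock b : blocks.getLocatedBlocks()) {
>     // 2) b.getLocations() gives the datanodes holding this block.
>     // 3) Add an empty block to the namesystem for the destination file.
>     // 4) Instruct each datanode to copy (or hardlink) the block locally.
>   }
>   // 5) Datanodes report the copied blocks to the namenode.
>   // 6) Wait for all blocks to be reported, then close the destination.
> }
> {code}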



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-11 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578340#comment-17578340
 ] 

Hui Fei commented on HDFS-2139:
---

Glad to receive positive feedback, thank you! [~xuzq_zander] is interested in 
this feature, so I will assign this ticket to him. We can help review.

> Fast copy for HDFS.
> ---
>
> Key: HDFS-2139
> URL: https://issues.apache.org/jira/browse/HDFS-2139
> Project: Hadoop HDFS
>  Issue Type: New Feature
>Reporter: Pritam Damania
>Assignee: Rituraj
>Priority: Major
> Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, 
> HDFS-2139.patch, image-2022-08-11-11-48-17-994.png
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> There is a need to perform fast file copy on HDFS. The fast copy mechanism 
> for a file works as follows:
> 1) Query metadata for all blocks of the source file.
> 2) For each block 'b' of the file, find out its datanode locations.
> 3) For each block of the file, add an empty block to the namesystem for
> the destination file.
> 4) For each location of the block, instruct the datanode to make a local
> copy of that block.
> 5) Once each datanode has copied over its respective blocks, they
> report to the namenode about it.
> 6) Wait for all blocks to be copied and exit.
> This would speed up the copying process considerably by removing top of
> the rack data transfers.
> Note : An extra improvement, would be to instruct the datanode to create a
> hardlink of the block file if we are copying a block on the same datanode



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HDFS-2139) Fast copy for HDFS.

2022-08-11 Thread Hui Fei (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578340#comment-17578340
 ] 

Hui Fei edited comment on HDFS-2139 at 8/11/22 9:16 AM:


[~weichiu] [~ayushtkn] [~pengbei] Glad to receive positive feedback, thank you! [~xuzq_zander] is interested in this feature, so I will assign this ticket to him. We can help review.


was (Author: ferhui):
Glad to receive positive feedback, thank you! [~xuzq_zander] is interested in this feature, so I will assign this ticket to him. We can help review.




[jira] [Updated] (HDFS-16726) There is a memory-related problem about HDFS namenode

2022-08-11 Thread yuyanlei (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yuyanlei updated HDFS-16726:

Description: 
In the cluster, the memory usage of the NameNode exceeds the -Xmx setting (-Xmx = 280GB): the actual memory usage of the NameNode is 479GB.

Output via pmap:

       Address Perm   Offset Device    Inode      Size       Rss       Pss Referenced Anonymous Swap Locked Mapping
  2b42f000 rw-p   00:00        0 294174720 293756960 293756960  293756960 293756960    0      0
      01e21000 rw-p   00:00        0 195245456 195240848 195240848  195240848 195240848    0      0 [heap]
  2b897c00 rw-p   00:00        0   9246724   9246724   9246724    9246724   9246724    0      0
  2b8bb0905000 rw-p   00:00        0   1781124   1754572   1754572    1754572   1754572    0      0
  2b893600 rw-p   00:00        0   1146880   1002084   1002084    1002084   1002084    0      0
  2b42db652000 rwxp   00:00        0     57792     55252     55252      55252     55252    0      0
  2b42ec12a000 rw-p   00:00        0     25696     24700     24700      24700     24700    0      0
  2b42ef25b000 rw-p   00:00        0      9988      8972      8972       8972      8972    0      0
  2b8c1d467000 rw-p   00:00        0      9216      8204      8204       8204      8204    0      0
  2b8d6f8db000 rw-p   00:00        0      7160      6228      6228       6228      6228    0      0

The first mapping (Rss 293756960 KB ≈ 280GB) matches the -Xmx Java heap, while [heap] (Rss 195240848 KB ≈ 186GB), the glibc malloc arena, is unusually large, so a native memory leak is suspected!

 
 * [heap] is associated with malloc

After enabling jcmd (Native Memory Tracking) in the test environment, we found that the malloc part of the Internal section in the jcmd output increased significantly while the client was writing a gz file (-Xmx = 40G in the test environment, and the Internal area was about 900MB before the client started writing):

Total: reserved=47276MB, committed=47070MB
 -                 Java Heap (reserved=40960MB, committed=40960MB)
                            (mmap: reserved=40960MB, committed=40960MB) 
 
 -                     Class (reserved=53MB, committed=52MB)
                            (classes #7423)
                            (malloc=1MB #17053) 
                            (mmap: reserved=52MB, committed=52MB) 
 
 -                    Thread (reserved=2145MB, committed=2145MB)
                            (thread #2129)
                            (stack: reserved=2136MB, committed=2136MB)
                            (malloc=7MB #10673) 
                            (arena=2MB #4256)
 
 -                      Code (reserved=251MB, committed=45MB)
                            (malloc=7MB #10661) 
                            (mmap: reserved=244MB, committed=38MB) 
 
 -                        GC (reserved=2307MB, committed=2307MB)
                            (malloc=755MB #525664) 
                            (mmap: reserved=1552MB, committed=1552MB) 
 
 -                  Compiler (reserved=8MB, committed=8MB)
                            (malloc=8MB #8852) 
 
 -                  Internal (reserved=1524MB, committed=1524MB)
                            (malloc=1524MB #323482) 
 
 -                    Symbol (reserved=12MB, committed=12MB)
                            (malloc=10MB #91715) 
                            (arena=2MB #1)
 
 -    Native Memory Tracking (reserved=16MB, committed=16MB)
                            (tracking overhead=15MB)
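
For anyone reproducing this: such a summary comes from JVM Native Memory Tracking. A typical way to collect it, assuming the NameNode JVM was started with the -XX:NativeMemoryTracking=summary flag (or =detail for per-call-site data), is:

    # take a baseline, let the client write, then diff against the baseline
    jcmd <namenode-pid> VM.native_memory baseline
    jcmd <namenode-pid> VM.native_memory summary.diff

    # or simply dump the current summary, as shown above
    jcmd <namenode-pid> VM.native_memory summary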

It is clear that Internal malloc increases significantly while the client writes, and it does not decrease after the client stops writing.

 

Through perf, I found some related call sites while the client was writing:

Children      Self  Comm  Shared Ob  Symbol
     0.05%     0.00%  java  libzip.so  [.] Java_java_util_zip_ZipFile_getEntry
     0.02%     0.00%  java  libzip.so  [.] Java_java_util_zip_Inflater_inflateBytes

Therefore, it is suspected that the compression path exercised by the client's writes may have a native memory leak.
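
A minimal, self-contained Java sketch of the suspected mechanism (illustrative only, not code from the NameNode): java.util.zip.Inflater allocates its zlib state with native malloc, which NMT accounts under Internal, and that memory is released only by end() or, much later, by the finalizer:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;
    import java.util.zip.Inflater;
    import java.util.zip.InflaterInputStream;

    public class InflaterNativeMemoryDemo {
        public static void main(String[] args) throws IOException {
            byte[] compressed = compress("hello".getBytes(StandardCharsets.UTF_8));

            // Leak-prone: when a stream is given a user-supplied Inflater,
            // close() does NOT call end() on it, so the malloc'd zlib stream
            // lingers until finalization.
            Inflater leaky = new Inflater();
            try (InflaterInputStream in = new InflaterInputStream(
                    new ByteArrayInputStream(compressed), leaky)) {
                while (in.read(new byte[64]) != -1) { /* drain */ }
            }
            // missing: leaky.end();  <-- native memory still held here

            // Safe: release the native zlib state deterministically.
            Inflater inf = new Inflater();
            try (InflaterInputStream in = new InflaterInputStream(
                    new ByteArrayInputStream(compressed), inf)) {
                while (in.read(new byte[64]) != -1) { /* drain */ }
            } finally {
                inf.end(); // frees the malloc'd state immediately
            }
        }

        private static byte[] compress(byte[] data) throws IOException {
            Deflater def = new Deflater();
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (DeflaterOutputStream out = new DeflaterOutputStream(bos, def)) {
                out.write(data);
            } finally {
                def.end(); // Deflater holds native memory too
            }
            return bos.toByteArray();
        }
    }

Note that GZIPInputStream/GZIPOutputStream end their internally created Inflater/Deflater on close(), but any path that supplies its own and forgets end() shows up exactly as ever-growing Internal malloc in NMT.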

 

Use jcmd to locate the call chain to Java_java_util_zip_Inflater_inflateBytes:

"ExtensionRefresher" #59 daemon prio=5 os_prio=0 tid=0x2419d000 
nid=0x69df runnable [0x2b319d7a]
   java.lang.Thread.State: RUNNABLE
        at java.util.zip.Inflater.inflateBytes(Native Method)
        at java.util.zip.Inflater.inflate(Inflater.java:259)
        - locked <0x2b278f7b9da8> (a java.util.zip.ZStreamRef)
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at 
org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown 
Source)
        at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)

[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-11 Thread ZanderXu (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578322#comment-17578322
 ] 

ZanderXu commented on HDFS-2139:


Thanks [~weichiu] [~ayushtkn] [~ferhui] [~pengbei] for your comments. I will prepare a detailed design this weekend; please help review it once it is complete.




[jira] [Commented] (HDFS-2139) Fast copy for HDFS.

2022-08-11 Thread Bei Peng (Jira)


[ 
https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578296#comment-17578296
 ] 

Bei Peng commented on HDFS-2139:


{quote}Many companies backport it into their internal branches and use it.
 * DistCp supports fastcopy
 * Implement block based strategy{quote}
Me too.
