[jira] [Updated] (HDFS-16726) There is a memory-related problem about HDFS namenode
[ https://issues.apache.org/jira/browse/HDFS-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuyanlei updated HDFS-16726: Attachment: (was: 图片_lanxin_20220809153722.png) > There is a memory-related problem about HDFS namenode > - > > Key: HDFS-16726 > URL: https://issues.apache.org/jira/browse/HDFS-16726 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode > Affects Versions: 2.7.2 > Environment: -Xms280G -Xmx280G -XX:MaxDirectMemorySize=10G -XX:MetaspaceSize=128M -server \ > -XX:+UseG1GC -XX:+UseStringDeduplication -XX:MaxGCPauseMillis=250 -XX:+UnlockExperimentalVMOptions -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \ > -XX:G1OldCSetRegionThresholdPercent=1 -XX:G1MixedGCCountTarget=9 -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=4000 \ > -XX:ParallelGCThreads=24 -XX:ConcGCThreads=6 -XX:G1RSetRegionEntries=4096 -XX:+AggressiveOpts -XX:+DisableExplicitGC \ > -XX:G1HeapWastePercent=9 -XX:G1MixedGCLiveThresholdPercent=85 -XX:InitiatingHeapOccupancyPercent=75 \ > -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:+PrintAdaptiveSizePolicy \ > -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \ > -Xloggc:$HADOOP_LOG_DIR/namenode.gc.log \ > -XX:+HeapDumpOnOutOfMemoryError -XX:ErrorFile=$HADOOP_LOG_DIR/hs_err_pid%p.log -XX:HeapDumpPath=$HADOOP_LOG_DIR \ > -Dcom.sun.management.jmxremote \ > -Dcom.sun.management.jmxremote.port=9009 \ > -Dcom.sun.management.jmxremote.ssl=false \ > -Dcom.sun.management.jmxremote.authenticate=false \ > $HADOOP_NAMENODE_OPTS > Reporter: yuyanlei > Priority: Critical > > In the cluster, the memory usage of the NameNode exceeds the Xmx setting (Xmx=280GB): the actual memory usage of the NameNode is 479GB.
> Output via pmap (sizes in KB):
> Address        Perm  Device  Inode  Size       Rss        Pss        Referenced  Anonymous  Swap  Locked  Mapping
> 2b42f000       rw-p  00:00   0      294174720  293756960  293756960  293756960   293756960  0     0
> 01e21000       rw-p  00:00   0      195245456  195240848  195240848  195240848   195240848  0     0       [heap]
> 2b897c00       rw-p  00:00   0      9246724    9246724    9246724    9246724     9246724    0     0
> 2b8bb0905000   rw-p  00:00   0      1781124    1754572    1754572    1754572     1754572    0     0
> 2b893600       rw-p  00:00   0      1146880    1002084    1002084    1002084     1002084    0     0
> 2b42db652000   rwxp  00:00   0      57792      55252      55252      55252       55252      0     0
> 2b42ec12a000   rw-p  00:00   0      25696      24700      24700      24700       24700      0     0
> 2b42ef25b000   rw-p  00:00   0      9988       8972       8972       8972        8972       0     0
> 2b8c1d467000   rw-p  00:00   0      9216       8204       8204       8204        8204       0     0
> 2b8d6f8db000   rw-p  00:00   0      7160       6228       6228       6228        6228       0     0
> The first mapping should be the Xmx-configured Java heap, and [heap] (native malloc) is unusually large, so a memory leak is suspected!
>
> * [heap] is associated with malloc
> After enabling Native Memory Tracking and checking with jcmd in the test environment, we found that the malloc part of the Internal category reported by jcmd increased significantly while the client was writing a gz file (Xmx=40G in the test environment, and the Internal area was 900MB before the client started writing):
> Total: reserved=47276MB, committed=47070MB
> - Java Heap (reserved=40960MB, committed=40960MB)
>   (mmap: reserved=40960MB, committed=40960MB)
>
> - Class (reserved=53MB, committed=52MB)
>   (classes #7423)
>   (malloc=1MB #17053)
>   (mmap: reserved=52MB, committed=52MB)
>
> - Thread (reserved=2145MB, committed=2145MB)
>   (thread #2129)
>   (stack: reserved=2136MB, committed=2136MB)
>   (malloc=7MB #10673)
>   (arena=2MB #4256)
>
> - Code (reserved=251MB, committed=45MB)
>   (malloc=7MB #10661)
>   (mmap: reserved=244MB, committed=38MB)
>
> - GC (reserved=2307MB, committed=2307MB)
>   (malloc=755MB #525664)
>
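Reading the pmap rows above is easier once you remember that pmap reports sizes in KB: the first anonymous mapping is almost exactly the 280GB Java heap, while the [heap] segment is roughly another 186GB of native malloc. A minimal sketch of that arithmetic, using the two values quoted in the report:

{code:java}
public class PmapArithmetic {
  public static void main(String[] args) {
    // Values copied from the pmap report above; pmap sizes are in KB.
    long heapMappingKb = 294_174_720L; // first anonymous rw-p mapping
    long glibcHeapKb   = 195_245_456L; // the [heap] segment (glibc malloc)
    System.out.printf("JVM heap mapping: %.1f GB%n", heapMappingKb / 1024.0 / 1024.0); // ~280.5 GB, matches -Xmx280G
    System.out.printf("glibc [heap]:     %.1f GB%n", glibcHeapKb / 1024.0 / 1024.0);   // ~186.2 GB of native malloc
  }
}
{code}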
[jira] [Commented] (HDFS-16726) There is a memory-related problem about HDFS namenode
[ https://issues.apache.org/jira/browse/HDFS-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578831#comment-17578831 ] yuyanlei commented on HDFS-16726: - G1RSetRegionEntries defaults to 256, while in the cluster environment G1RSetRegionEntries is 4096. Increasing G1RSetRegionEntries solves the problem of overly long Scan RSet times, but it also leads to a lot of memory outside the Java heap. So GC is kept stable at the cost of an increased memory footprint. > There is a memory-related problem about HDFS namenode > - > > Key: HDFS-16726 > URL: https://issues.apache.org/jira/browse/HDFS-16726 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode > Affects Versions: 2.7.2 > Environment: -Xms280G -Xmx280G -XX:MaxDirectMemorySize=10G -XX:MetaspaceSize=128M -server \ > -XX:+UseG1GC -XX:+UseStringDeduplication -XX:MaxGCPauseMillis=250 -XX:+UnlockExperimentalVMOptions -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \ > -XX:G1OldCSetRegionThresholdPercent=1 -XX:G1MixedGCCountTarget=9 -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=4000 \ > -XX:ParallelGCThreads=24 -XX:ConcGCThreads=6 -XX:G1RSetRegionEntries=4096 -XX:+AggressiveOpts -XX:+DisableExplicitGC \ > -XX:G1HeapWastePercent=9 -XX:G1MixedGCLiveThresholdPercent=85 -XX:InitiatingHeapOccupancyPercent=75 \ > -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:+PrintAdaptiveSizePolicy \ > -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \ > -Xloggc:$HADOOP_LOG_DIR/namenode.gc.log \ > -XX:+HeapDumpOnOutOfMemoryError -XX:ErrorFile=$HADOOP_LOG_DIR/hs_err_pid%p.log -XX:HeapDumpPath=$HADOOP_LOG_DIR \ > -Dcom.sun.management.jmxremote \ > -Dcom.sun.management.jmxremote.port=9009 \ > -Dcom.sun.management.jmxremote.ssl=false \ > -Dcom.sun.management.jmxremote.authenticate=false \ > $HADOOP_NAMENODE_OPTS > Reporter: yuyanlei > Priority: Critical > Attachments: 图片_lanxin_20220809153722.png > > > In the cluster, the memory usage of the NameNode exceeds the Xmx setting (Xmx=280GB): the actual memory usage of the NameNode is 479GB.
> Output via pmap (sizes in KB):
> Address        Perm  Device  Inode  Size       Rss        Pss        Referenced  Anonymous  Swap  Locked  Mapping
> 2b42f000       rw-p  00:00   0      294174720  293756960  293756960  293756960   293756960  0     0
> 01e21000       rw-p  00:00   0      195245456  195240848  195240848  195240848   195240848  0     0       [heap]
> 2b897c00       rw-p  00:00   0      9246724    9246724    9246724    9246724     9246724    0     0
> 2b8bb0905000   rw-p  00:00   0      1781124    1754572    1754572    1754572     1754572    0     0
> 2b893600       rw-p  00:00   0      1146880    1002084    1002084    1002084     1002084    0     0
> 2b42db652000   rwxp  00:00   0      57792      55252      55252      55252       55252      0     0
> 2b42ec12a000   rw-p  00:00   0      25696      24700      24700      24700       24700      0     0
> 2b42ef25b000   rw-p  00:00   0      9988       8972       8972       8972        8972       0     0
> 2b8c1d467000   rw-p  00:00   0      9216       8204       8204       8204        8204       0     0
> 2b8d6f8db000   rw-p  00:00   0      7160       6228       6228       6228        6228       0     0
> The first mapping should be the Xmx-configured Java heap, and [heap] (native malloc) is unusually large, so a memory leak is suspected!
>
> * [heap] is associated with malloc
> After enabling Native Memory Tracking and checking with jcmd in the test environment, we found that the malloc part of the Internal category reported by jcmd increased significantly while the client was writing a gz file (Xmx=40G in the test environment, and the Internal area was 900MB before the client started writing):
> Total: reserved=47276MB, committed=47070MB
> - Java Heap (reserved=40960MB, committed=40960MB)
>   (mmap: reserved=40960MB, committed=40960MB)
>
> - Class (reserved=53MB, committed=52MB)
>   (classes #7423)
>   (malloc=1MB #17053)
>   (mmap: reserved=52MB, committed=52MB)
>
> - Thread (reserved=2145MB, committed=2145MB)
>   (thread #2129)
>   (stack: reserved=2136MB, committed=2136MB)
>   (malloc=7MB #10673)
>
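The trade-off the reporter describes scales with both the region count and the entries per region. A back-of-the-envelope sketch follows; every constant in it is an assumption for illustration (real G1 remembered-set accounting is considerably more complex), not a measurement from this cluster:

{code:java}
public class RSetFootprintSketch {
  public static void main(String[] args) {
    long heapBytes = 280L << 30;         // -Xmx280G
    long regionBytes = 32L << 20;        // assumed G1 region size on very large heaps
    long regions = heapBytes / regionBytes;          // ~8960 regions
    long entriesPerRegion = 4096;        // -XX:G1RSetRegionEntries=4096 (default 256)
    long bytesPerEntry = 8;              // assumed bookkeeping cost per fine-grained entry
    long bound = regions * entriesPerRegion * bytesPerEntry;
    System.out.printf("Rough RSet bound: ~%d MB (16x the 256-entry default)%n", bound >> 20);
  }
}
{code}

Even a crude bound like this shows why a 16x increase in G1RSetRegionEntries noticeably inflates the GC category that NMT reported above.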
[jira] [Commented] (HDFS-16678) RBF supports disable getNodeUsage() in RBFMetrics
[ https://issues.apache.org/jira/browse/HDFS-16678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578791#comment-17578791 ] ASF GitHub Bot commented on HDFS-16678: --- slfan1989 commented on PR #4606: URL: https://github.com/apache/hadoop/pull/4606#issuecomment-1212724094 > @slfan1989 please take another look LGTM. @ZanderXu thanks for your contribution! @goiri Thanks for helping review the code! > RBF supports disable getNodeUsage() in RBFMetrics > - > > Key: HDFS-16678 > URL: https://issues.apache.org/jira/browse/HDFS-16678 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Major > Labels: pull-request-available > Time Spent: 2h > Remaining Estimate: 0h > > In our prod environment, we try to collect RBF metrics every 15s through jmx_exporter, and we found that the collection task often failed. > After tracing, we found that the collection task is blocked at getNodeUsage() in RBFMetrics, because it collects every datanode's usage from the downstream nameservices. This is a very expensive and almost useless operation, because in most scenarios each nameservice contains almost the same DNs. We can get the datanode usage from any one nameservice, rather than from RBF. > So I feel that RBF should support disabling getNodeUsage() in RBFMetrics. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
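A sketch of how such a switch would typically be consumed on the router side. The property name here is inferred from the JIRA summary and is an assumption, not the key from the merged patch:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class DisableNodeUsageSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumed key name for illustration; check the merged patch for the real one.
    conf.setBoolean("dfs.federation.router.enable.get.dn.usage", false);
    System.out.println(conf.getBoolean("dfs.federation.router.enable.get.dn.usage", true));
  }
}
{code}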
[jira] [Commented] (HDFS-16703) Enable RPC Timeout for some protocols of NameNode.
[ https://issues.apache.org/jira/browse/HDFS-16703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578790#comment-17578790 ] ASF GitHub Bot commented on HDFS-16703: --- slfan1989 commented on PR #4660: URL: https://github.com/apache/hadoop/pull/4660#issuecomment-1212723170 @ZanderXu Do we need to update the md file to explain what these parameters do? > Enable RPC Timeout for some protocols of NameNode. > -- > > Key: HDFS-16703 > URL: https://issues.apache.org/jira/browse/HDFS-16703 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Major > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > When I read some code about protocols, I found that only the ClientNamenodeProtocolPB proxy is created with an RPC timeout; the other ProtocolPB proxies are created without one, such as RefreshAuthorizationPolicyProtocolPB, RefreshUserMappingsProtocolPB, RefreshCallQueueProtocolPB, GetUserMappingsProtocolPB and NamenodeProtocolPB. > > If a proxy has no RPC timeout, it can block for a long time when the NN machine crashes or the network is bad while writing to or reading from the NN. > > So I feel that we should enable an RPC timeout for all ProtocolPB proxies. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
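For operators who cannot wait for per-protocol fixes, Hadoop's generic client-side timeout knob is a stopgap. A minimal sketch, assuming the standard ipc.client.rpc-timeout.ms key and the Client.getRpcTimeout(conf) accessor:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.Client;

public class RpcTimeoutSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Generic client-side RPC timeout; 0 traditionally means "wait forever".
    conf.setInt("ipc.client.rpc-timeout.ms", 60_000);
    System.out.println("Effective RPC timeout: " + Client.getRpcTimeout(conf) + " ms");
  }
}
{code}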
[jira] [Commented] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices
[ https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578789#comment-17578789 ] ASF GitHub Bot commented on HDFS-16728: --- slfan1989 commented on code in PR #4734: URL: https://github.com/apache/hadoop/pull/4734#discussion_r944106740

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NoLocationException.java:
## @@ -0,0 +1,33 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.federation.router;
+
+import java.io.IOException;
+
+/**
+ * This exception is thrown when can not get any mount point for the input path.
+ * RBF cannot forward any requests for the path.
+ */
+public class NoLocationException extends IOException {
+
+  private static final long serialVersionUID = 1L;

Review Comment: Does this part of the code make sense? Why do we assign a value of 1?

> RBF throw IndexOutOfBoundsException with disableNameServices > > > Key: HDFS-16728 > URL: https://issues.apache.org/jira/browse/HDFS-16728 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Major > Labels: pull-request-available > > RBF will throw an IndexOutOfBoundsException when a namespace is disabled. > Suppose we have a mount point /a/b -> ns0 -> /a/b and we disable ns0. RBF will throw an IndexOutOfBoundsException while handling requests whose paths start with /a/b.
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657)
> at java.util.ArrayList.get(ArrayList.java:433)
> at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980)
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices
[ https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578788#comment-17578788 ] ASF GitHub Bot commented on HDFS-16728: --- slfan1989 commented on code in PR #4734: URL: https://github.com/apache/hadoop/pull/4734#discussion_r944106740

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NoLocationException.java:
## @@ -0,0 +1,33 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.hadoop.hdfs.server.federation.router;
+
+import java.io.IOException;
+
+/**
+ * This exception is thrown when can not get any mount point for the input path.
+ * RBF cannot forward any requests for the path.
+ */
+public class NoLocationException extends IOException {
+
+  private static final long serialVersionUID = 1L;

Review Comment: Does this part of the code make sense?

> RBF throw IndexOutOfBoundsException with disableNameServices > > > Key: HDFS-16728 > URL: https://issues.apache.org/jira/browse/HDFS-16728 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Major > Labels: pull-request-available > > RBF will throw an IndexOutOfBoundsException when a namespace is disabled. > Suppose we have a mount point /a/b -> ns0 -> /a/b and we disable ns0. RBF will throw an IndexOutOfBoundsException while handling requests whose paths start with /a/b.
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657)
> at java.util.ArrayList.get(ArrayList.java:433)
> at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980)
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
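On the reviewer's question: a fixed serialVersionUID of 1L is a widespread convention for exceptions that carry no extra state. A generic illustration of the reasoning (not RBF code):

{code:java}
import java.io.IOException;

// A stateless exception never evolves incompatibly for Java serialization, so a fixed
// serialVersionUID such as 1L simply pins the ID instead of letting the compiler derive
// a fragile, implementation-dependent default.
public class ExampleStatelessException extends IOException {
  private static final long serialVersionUID = 1L;

  public ExampleStatelessException(String message) {
    super(message);
  }
}
{code}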
[jira] [Commented] (HDFS-16705) RBF: Support healthMonitor timeout configurable and cache NN and client proxy in NamenodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-16705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578785#comment-17578785 ] ASF GitHub Bot commented on HDFS-16705: --- slfan1989 commented on PR #4662: URL: https://github.com/apache/hadoop/pull/4662#issuecomment-1212720519 > It looks good from my side but I'd like to get closure from @slfan1989 LGTM. @ZanderXu Thank you for your contribution, @goiri Thank you for helping to review the code. > RBF: Support healthMonitor timeout configurable and cache NN and client proxy in NamenodeHeartbeatService > - > > Key: HDFS-16705 > URL: https://issues.apache.org/jira/browse/HDFS-16705 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > When I read NamenodeHeartbeatService.class of RBF, I felt there are some things we can do for NamenodeHeartbeatService.class:
> * Cache the NameNode protocol and client protocol proxies to avoid creating a new proxy every time
> * Support a configurable healthMonitorTimeout
> * Format the code of getNamenodeStatusReport to make it clearer
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
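The first bullet — reuse one proxy across heartbeats — fits in a few lines. An illustrative holder (names are not from the actual patch; the real NamenodeHeartbeatService would also need to invalidate the cached proxy on IO errors):

{code:java}
import java.util.function.Supplier;

/** Illustrative holder that creates a protocol proxy once and reuses it across heartbeats. */
public class CachedProxyHolder<T> {
  private final Supplier<T> factory;
  private volatile T cached;

  public CachedProxyHolder(Supplier<T> factory) {
    this.factory = factory;
  }

  public T get() {
    T local = cached;
    if (local == null) {
      synchronized (this) {
        if (cached == null) {
          cached = factory.get(); // built once instead of on every heartbeat
        }
        local = cached;
      }
    }
    return local;
  }
}
{code}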
[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer
[ https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578782#comment-17578782 ] ASF GitHub Bot commented on HDFS-16689: --- ferhui commented on PR #4628: URL: https://github.com/apache/hadoop/pull/4628#issuecomment-1212715520 @ZanderXu Thanks for your contribution! Merged > Standby NameNode crashes when transitioning to Active with in-progress tailer > - > > Key: HDFS-16689 > URL: https://issues.apache.org/jira/browse/HDFS-16689 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The Standby NameNode crashes when transitioning to Active with an in-progress tailer, with an error message like below:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X when there is a stream available for read: ByteStringEditLog[X, Y], ByteStringEditLog[X, 0]
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
> ... 36 more
> {code}
> After tracing, we found a critical bug in *EditlogTailer#catchupDuringFailover()* when *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true, because *catchupDuringFailover()* tries to replay all missed edits from the JournalNodes with *onlyDurableTxns=true*. It may not be able to replay any edits when there are some abnormal JournalNodes.
> Reproduce method, suppose:
> - There are 2 NameNodes, namely NN0 and NN1, whose states are Active and Standby respectively. And there are 3 JournalNodes, namely JN0, JN1 and JN2.
> - NN0 tries to sync 3 edits starting at txid 3 to the JNs, but only successfully syncs them to JN1 and JN2. JN0 is abnormal, such as GC, bad network or restarted.
> - NN1's lastAppliedTxId is 2, and at that moment we try to fail over active from NN0 to NN1.
> - NN1 only gets two responses, from JN0 and JN1, when it tries to select input streams with *fromTxnId=3* and *onlyDurableTxns=true*, and the txn counts in the responses are 0 and 3 respectively. JN2 is abnormal, such as GC, bad network or restarted.
> - NN1 cannot replay any edits with *fromTxnId=3* from the JournalNodes because *maxAllowedTxns* is 0.
> So I think the Standby NameNode should *catchupDuringFailover()* with *onlyDurableTxns=false*, so that it can replay all missed edits from the JournalNodes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
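To make the quorum arithmetic in the reproduction concrete: with onlyDurableTxns=true, an edit may be served only if a majority of JournalNodes report it as durable. A self-contained sketch of that rule — an illustration of the logic described above, not the actual QuorumJournalManager code:

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class DurableTxnSketch {
  // With onlyDurableTxns=true, a txn counts only if a majority of JNs report it.
  static long maxAllowedTxns(List<Long> responseTxnCounts, int ensembleSize) {
    List<Long> sorted = new ArrayList<>(responseTxnCounts);
    sorted.sort(Collections.reverseOrder());
    int majority = ensembleSize / 2 + 1; // 2 of 3
    if (sorted.size() < majority) {
      return 0; // not even a quorum of answers
    }
    return sorted.get(majority - 1); // highest count durable on a majority
  }

  public static void main(String[] args) {
    // JN0 answered with 0 txns, JN1 with 3, JN2 never answered (as in the report above).
    System.out.println(maxAllowedTxns(Arrays.asList(0L, 3L), 3)); // prints 0
  }
}
{code}

Sorted descending, the responses [3, 0] give 0 at the majority position, so maxAllowedTxns is 0 and the standby has nothing to replay — exactly the failure described.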
[jira] [Resolved] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer
[ https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei resolved HDFS-16689. Fix Version/s: 3.4.0 Resolution: Fixed > Standby NameNode crashes when transitioning to Active with in-progress tailer > - > > Key: HDFS-16689 > URL: https://issues.apache.org/jira/browse/HDFS-16689 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Critical > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 50m > Remaining Estimate: 0h > > The Standby NameNode crashes when transitioning to Active with an in-progress tailer, with an error message like below:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X when there is a stream available for read: ByteStringEditLog[X, Y], ByteStringEditLog[X, 0]
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
> ... 36 more
> {code}
> After tracing, we found a critical bug in *EditlogTailer#catchupDuringFailover()* when *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true, because *catchupDuringFailover()* tries to replay all missed edits from the JournalNodes with *onlyDurableTxns=true*. It may not be able to replay any edits when there are some abnormal JournalNodes.
> Reproduce method, suppose:
> - There are 2 NameNodes, namely NN0 and NN1, whose states are Active and Standby respectively. And there are 3 JournalNodes, namely JN0, JN1 and JN2.
> - NN0 tries to sync 3 edits starting at txid 3 to the JNs, but only successfully syncs them to JN1 and JN2. JN0 is abnormal, such as GC, bad network or restarted.
> - NN1's lastAppliedTxId is 2, and at that moment we try to fail over active from NN0 to NN1.
> - NN1 only gets two responses, from JN0 and JN1, when it tries to select input streams with *fromTxnId=3* and *onlyDurableTxns=true*, and the txn counts in the responses are 0 and 3 respectively. JN2 is abnormal, such as GC, bad network or restarted.
> - NN1 cannot replay any edits with *fromTxnId=3* from the JournalNodes because *maxAllowedTxns* is 0.
> So I think the Standby NameNode should *catchupDuringFailover()* with *onlyDurableTxns=false*, so that it can replay all missed edits from the JournalNodes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer
[ https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578778#comment-17578778 ] ASF GitHub Bot commented on HDFS-16689: --- ferhui merged PR #4628: URL: https://github.com/apache/hadoop/pull/4628 > Standby NameNode crashes when transitioning to Active with in-progress tailer > - > > Key: HDFS-16689 > URL: https://issues.apache.org/jira/browse/HDFS-16689 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Critical > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > The Standby NameNode crashes when transitioning to Active with an in-progress tailer, with an error message like below:
> {code:java}
> Caused by: java.lang.IllegalStateException: Cannot start writing at txid X when there is a stream available for read: ByteStringEditLog[X, Y], ByteStringEditLog[X, 0]
> at org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:344)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogAsync.openForWrite(FSEditLogAsync.java:113)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1423)
> at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:2132)
> ... 36 more
> {code}
> After tracing, we found a critical bug in *EditlogTailer#catchupDuringFailover()* when *DFS_HA_TAILEDITS_INPROGRESS_KEY* is true, because *catchupDuringFailover()* tries to replay all missed edits from the JournalNodes with *onlyDurableTxns=true*. It may not be able to replay any edits when there are some abnormal JournalNodes.
> Reproduce method, suppose:
> - There are 2 NameNodes, namely NN0 and NN1, whose states are Active and Standby respectively. And there are 3 JournalNodes, namely JN0, JN1 and JN2.
> - NN0 tries to sync 3 edits starting at txid 3 to the JNs, but only successfully syncs them to JN1 and JN2. JN0 is abnormal, such as GC, bad network or restarted.
> - NN1's lastAppliedTxId is 2, and at that moment we try to fail over active from NN0 to NN1.
> - NN1 only gets two responses, from JN0 and JN1, when it tries to select input streams with *fromTxnId=3* and *onlyDurableTxns=true*, and the txn counts in the responses are 0 and 3 respectively. JN2 is abnormal, such as GC, bad network or restarted.
> - NN1 cannot replay any edits with *fromTxnId=3* from the JournalNodes because *maxAllowedTxns* is 0.
> So I think the Standby NameNode should *catchupDuringFailover()* with *onlyDurableTxns=false*, so that it can replay all missed edits from the JournalNodes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16726) There is a memory-related problem about HDFS namenode
[ https://issues.apache.org/jira/browse/HDFS-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuyanlei updated HDFS-16726: Environment: -Xms280G -Xmx280G -XX:MaxDirectMemorySize=10G -XX:MetaspaceSize=128M -server \ -XX:+UseG1GC -XX:+UseStringDeduplication -XX:MaxGCPauseMillis=250 -XX:+UnlockExperimentalVMOptions -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \ -XX:G1OldCSetRegionThresholdPercent=1 -XX:G1MixedGCCountTarget=9 -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=4000 \ -XX:ParallelGCThreads=24 -XX:ConcGCThreads=6 -XX:G1RSetRegionEntries=4096 -XX:+AggressiveOpts -XX:+DisableExplicitGC \ -XX:G1HeapWastePercent=9 -XX:G1MixedGCLiveThresholdPercent=85 -XX:InitiatingHeapOccupancyPercent=75 \ -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:+PrintAdaptiveSizePolicy \ -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \ -Xloggc:$HADOOP_LOG_DIR/namenode.gc.log \ -XX:+HeapDumpOnOutOfMemoryError -XX:ErrorFile=$HADOOP_LOG_DIR/hs_err_pid%p.log -XX:HeapDumpPath=$HADOOP_LOG_DIR \ -Dcom.sun.management.jmxremote \ -Dcom.sun.management.jmxremote.port=9009 \ -Dcom.sun.management.jmxremote.ssl=false \ -Dcom.sun.management.jmxremote.authenticate=false \ $HADOOP_NAMENODE_OPTS > There is a memory-related problem about HDFS namenode > - > > Key: HDFS-16726 > URL: https://issues.apache.org/jira/browse/HDFS-16726 > Project: Hadoop HDFS > Issue Type: Bug > Components: hdfs, namenode > Affects Versions: 2.7.2 > Environment: -Xms280G -Xmx280G -XX:MaxDirectMemorySize=10G -XX:MetaspaceSize=128M -server \ > -XX:+UseG1GC -XX:+UseStringDeduplication -XX:MaxGCPauseMillis=250 -XX:+UnlockExperimentalVMOptions -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 \ > -XX:G1OldCSetRegionThresholdPercent=1 -XX:G1MixedGCCountTarget=9 -XX:+SafepointTimeout -XX:SafepointTimeoutDelay=4000 \ > -XX:ParallelGCThreads=24 -XX:ConcGCThreads=6 -XX:G1RSetRegionEntries=4096 -XX:+AggressiveOpts -XX:+DisableExplicitGC \ > -XX:G1HeapWastePercent=9 -XX:G1MixedGCLiveThresholdPercent=85 -XX:InitiatingHeapOccupancyPercent=75 \ > -XX:+ParallelRefProcEnabled -XX:-ResizePLAB -XX:+PrintAdaptiveSizePolicy \ > -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \ > -Xloggc:$HADOOP_LOG_DIR/namenode.gc.log \ > -XX:+HeapDumpOnOutOfMemoryError -XX:ErrorFile=$HADOOP_LOG_DIR/hs_err_pid%p.log -XX:HeapDumpPath=$HADOOP_LOG_DIR \ > -Dcom.sun.management.jmxremote \ > -Dcom.sun.management.jmxremote.port=9009 \ > -Dcom.sun.management.jmxremote.ssl=false \ > -Dcom.sun.management.jmxremote.authenticate=false \ > $HADOOP_NAMENODE_OPTS > Reporter: yuyanlei > Priority: Critical > Attachments: 图片_lanxin_20220809153722.png > > > In the cluster, the memory usage of the NameNode exceeds the Xmx setting (Xmx=280GB): the actual memory usage of the NameNode is 479GB.
> Output via pmap (sizes in KB):
> Address        Perm  Device  Inode  Size       Rss        Pss        Referenced  Anonymous  Swap  Locked  Mapping
> 2b42f000       rw-p  00:00   0      294174720  293756960  293756960  293756960   293756960  0     0
> 01e21000       rw-p  00:00   0      195245456  195240848  195240848  195240848   195240848  0     0       [heap]
> 2b897c00       rw-p  00:00   0      9246724    9246724    9246724    9246724     9246724    0     0
> 2b8bb0905000   rw-p  00:00   0      1781124    1754572    1754572    1754572     1754572    0     0
> 2b893600       rw-p  00:00   0      1146880    1002084    1002084    1002084     1002084    0     0
> 2b42db652000   rwxp  00:00   0      57792      55252      55252      55252       55252      0     0
> 2b42ec12a000   rw-p  00:00   0      25696      24700      24700      24700       24700      0     0
> 2b42ef25b000   rw-p  00:00   0      9988       8972       8972       8972        8972       0     0
> 2b8c1d467000   rw-p  00:00   0      9216       8204       8204       8204        8204       0     0
> 2b8d6f8db000   rw-p  00:00   0      7160       6228       6228       6228        6228       0     0
> The first mapping should be the Xmx-configured Java heap, and [heap] (native malloc) is unusually large, so a memory leak is suspected!
[jira] [Commented] (HDFS-16686) GetJournalEditServlet fails to authorize valid Kerberos request
[ https://issues.apache.org/jira/browse/HDFS-16686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578738#comment-17578738 ] ASF GitHub Bot commented on HDFS-16686: --- hadoop-yetus commented on PR #4724: URL: https://github.com/apache/hadoop/pull/4724#issuecomment-1212634668 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 36s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 1 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 38m 25s | | trunk passed | | +1 :green_heart: | compile | 1m 43s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 1m 38s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 26s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 45s | | trunk passed | | +1 :green_heart: | javadoc | 1m 25s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 48s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 39s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 13s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 23s | | the patch passed | | +1 :green_heart: | compile | 1m 26s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 1m 26s | | the patch passed | | +1 :green_heart: | compile | 1m 20s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 20s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 2s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4724/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 2 new + 9 unchanged - 0 fixed = 11 total (was 9) | | +1 :green_heart: | mvnsite | 1m 26s | | the patch passed | | +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 34s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 26s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 35s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 236m 52s | | hadoop-hdfs in the patch passed. | | -1 :x: | asflicense | 1m 15s | [/results-asflicense.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4724/2/artifact/out/results-asflicense.txt) | The patch generated 1 ASF License warnings. 
| | | | 347m 21s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4724/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4724 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux c39386dba77e 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / ee6c71e1ccc12e0cd37d0843edd497d09c3f13fe | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4724/2/testReport/ | | Max. process+thread count | 3187 (vs. ulim
[jira] [Commented] (HDFS-16688) Unresolved Hosts during startup are not synced by JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578730#comment-17578730 ] ASF GitHub Bot commented on HDFS-16688: --- hadoop-yetus commented on PR #4725: URL: https://github.com/apache/hadoop/pull/4725#issuecomment-1212580227 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 0s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 1s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 38m 2s | | trunk passed | | +1 :green_heart: | compile | 1m 44s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 1m 37s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 26s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 40s | | trunk passed | | +1 :green_heart: | javadoc | 1m 24s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 47s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 47s | | trunk passed | | +1 :green_heart: | shadedclient | 23m 13s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 24s | | the patch passed | | +1 :green_heart: | compile | 1m 27s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 1m 27s | | the patch passed | | +1 :green_heart: | compile | 1m 20s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 20s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 4s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 27s | | the patch passed | | +1 :green_heart: | javadoc | 0m 57s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 32s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 26s | | the patch passed | | +1 :green_heart: | shadedclient | 22m 37s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 426m 52s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/3/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 11s | | The patch does not generate ASF License warnings. 
| | | | 536m 50s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes | | | hadoop.hdfs.qjournal.client.TestQJMWithFaults | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4725 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint | | uname | Linux 3bed9d08f88f 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:06:04 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 8d44a37f32e423ecb51d340f847b89cd978c6590 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/3/testReport/ |
[jira] [Commented] (HDFS-16724) RBF should support get the information about ancestor mount points
[ https://issues.apache.org/jira/browse/HDFS-16724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578686#comment-17578686 ] ASF GitHub Bot commented on HDFS-16724: --- goiri commented on code in PR #4719: URL: https://github.com/apache/hadoop/pull/4719#discussion_r943951153

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterClientProtocol.java:
## @@ -935,19 +935,22 @@
 public HdfsFileStatus getFileInfo(String src) throws IOException {
   rpcServer.checkOperation(NameNode.OperationCategory.READ);
-  final List<RemoteLocation> locations =
-      rpcServer.getLocationsForPath(src, false, false);
-  RemoteMethod method = new RemoteMethod("getFileInfo",
-      new Class<?>[] {String.class}, new RemoteParam());
   HdfsFileStatus ret = null;
-  // If it's a directory, we check in all locations
-  if (rpcServer.isPathAll(src)) {
-    ret = getFileInfoAll(locations, method);
-  } else {
-    // Check for file information sequentially
-    ret = rpcClient.invokeSequential(
-        locations, method, HdfsFileStatus.class, null);
+  IOException noLocationException = null;
+  try {
+    final List<RemoteLocation> locations = rpcServer.getLocationsForPath(src, false, false);
+    RemoteMethod method = new RemoteMethod("getFileInfo",
+        new Class<?>[] {String.class}, new RemoteParam());
+
+    // If it's a directory, we check in all locations
+    if (rpcServer.isPathAll(src)) {
+      ret = getFileInfoAll(locations, method);
+    } else {
+      // Check for file information sequentially
+      ret = rpcClient.invokeSequential(locations, method, HdfsFileStatus.class, null);
+    }
+  } catch (NoLocationException | RouterResolveException e) {

Review Comment: That would be nice.

> RBF should support get the information about ancestor mount points > -- > > Key: HDFS-16724 > URL: https://issues.apache.org/jira/browse/HDFS-16724 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Major > Labels: pull-request-available > > Suppose an RBF cluster has 2 nameservices and two mount points as below:
> * /user/ns1 -> ns1 -> /user/ns1
> * /user/ns2 -> ns2 -> /user/ns2
> Suppose we disable the default nameservice of the RBF cluster and try to getFileInfo on the path /user. RBF will throw an IOException to the client because it cannot find any locations for the path /user. > But in this case RBF should return a valid response to the client, because /user has two child mount points, ns1 and ns2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
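The fallback under review boils down to: when path resolution fails, check whether the path is an ancestor of any mount point before surfacing an error. A tiny hypothetical sketch of that predicate (the interface is a stand-in for illustration, not the real RBF resolver API):

{code:java}
import java.util.List;

public class MountPointAncestorSketch {
  /** Hypothetical stand-in for the RBF mount table; not the real resolver API. */
  interface MountTable {
    List<String> getChildMounts(String path);
  }

  /** A path with no location may still be an ancestor of mount points and deserves a listing. */
  static boolean isAncestorOfMountPoint(MountTable table, String path) {
    return !table.getChildMounts(path).isEmpty();
  }
}
{code}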
[jira] [Commented] (HDFS-16705) RBF: Support healthMonitor timeout configurable and cache NN and client proxy in NamenodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-16705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578681#comment-17578681 ] ASF GitHub Bot commented on HDFS-16705: --- goiri commented on PR #4662: URL: https://github.com/apache/hadoop/pull/4662#issuecomment-1212504496 It looks good from my side but I'd like to get closure from @slfan1989 > RBF: Support healthMonitor timeout configurable and cache NN and client proxy in NamenodeHeartbeatService > - > > Key: HDFS-16705 > URL: https://issues.apache.org/jira/browse/HDFS-16705 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > When I read NamenodeHeartbeatService.class of RBF, I felt there are some things we can do for NamenodeHeartbeatService.class:
> * Cache the NameNode protocol and client protocol proxies to avoid creating a new proxy every time
> * Support a configurable healthMonitorTimeout
> * Format the code of getNamenodeStatusReport to make it clearer
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Resolved] (HDFS-16702) MiniDFSCluster should report cause of exception in assertion error
[ https://issues.apache.org/jira/browse/HDFS-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chao Sun resolved HDFS-16702. - Hadoop Flags: Reviewed Resolution: Fixed > MiniDFSCluster should report cause of exception in assertion error > -- > > Key: HDFS-16702 > URL: https://issues.apache.org/jira/browse/HDFS-16702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs > Environment: Tests running in the Hadoop dev environment image. > Reporter: Steve Vaughan > Assignee: Steve Vaughan > Priority: Minor > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > When the MiniDFSCluster detects that an exception caused an exit, it should include that exception as the cause for the AssertionError that it throws. > The current AssertionError simply reports the message "Test resulted in an unexpected exit" and provides a stack trace to the location of the check for an exit exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
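The proposed change is essentially one constructor call — AssertionError(String message, Throwable cause) has been available since Java 7. A minimal sketch with a stand-in cause (not the actual MiniDFSCluster code):

{code:java}
public class ExitAssertionSketch {
  public static void main(String[] args) {
    // Stand-in for the ExitException that MiniDFSCluster captures on an unexpected exit.
    Throwable exitCause = new IllegalStateException("ExitUtil.terminate(1) called");
    // Attaching the cause surfaces the real failure in the JUnit report instead of
    // only a stack trace pointing at the check itself.
    throw new AssertionError("Test resulted in an unexpected exit", exitCause);
  }
}
{code}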
[jira] [Commented] (HDFS-16702) MiniDFSCluster should report cause of exception in assertion error
[ https://issues.apache.org/jira/browse/HDFS-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578667#comment-17578667 ] ASF GitHub Bot commented on HDFS-16702: --- sunchao merged PR #4680: URL: https://github.com/apache/hadoop/pull/4680 > MiniDFSCluster should report cause of exception in assertion error > -- > > Key: HDFS-16702 > URL: https://issues.apache.org/jira/browse/HDFS-16702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs > Environment: Tests running in the Hadoop dev environment image. > Reporter: Steve Vaughan > Assignee: Steve Vaughan > Priority: Minor > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > When the MiniDFSCluster detects that an exception caused an exit, it should include that exception as the cause for the AssertionError that it throws. > The current AssertionError simply reports the message "Test resulted in an unexpected exit" and provides a stack trace to the location of the check for an exit exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16702) MiniDFSCluster should report cause of exception in assertion error
[ https://issues.apache.org/jira/browse/HDFS-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578668#comment-17578668 ] ASF GitHub Bot commented on HDFS-16702: --- sunchao commented on PR #4680: URL: https://github.com/apache/hadoop/pull/4680#issuecomment-1212481643 Merged to trunk, thanks @snmvaughan . > MiniDFSCluster should report cause of exception in assertion error > -- > > Key: HDFS-16702 > URL: https://issues.apache.org/jira/browse/HDFS-16702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs > Environment: Tests running in the Hadoop dev environment image. > Reporter: Steve Vaughan > Assignee: Steve Vaughan > Priority: Minor > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > When the MiniDFSCluster detects that an exception caused an exit, it should include that exception as the cause for the AssertionError that it throws. > The current AssertionError simply reports the message "Test resulted in an unexpected exit" and provides a stack trace to the location of the check for an exit exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16695) Improve Code with Lambda in org.apahce.hadoop.hdfs.server.namenode package
[ https://issues.apache.org/jira/browse/HDFS-16695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578658#comment-17578658 ] ASF GitHub Bot commented on HDFS-16695: --- hadoop-yetus commented on PR #4668: URL: https://github.com/apache/hadoop/pull/4668#issuecomment-1212469682 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 48s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | -1 :x: | test4tests | 0m 0s | | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 41m 13s | | trunk passed | | +1 :green_heart: | compile | 1m 42s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 24s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 39s | | trunk passed | | +1 :green_heart: | javadoc | 1m 19s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 44s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 46s | | trunk passed | | +1 :green_heart: | shadedclient | 26m 27s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 24s | | the patch passed | | +1 :green_heart: | compile | 1m 30s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 1m 30s | | the patch passed | | +1 :green_heart: | compile | 1m 19s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 19s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 6s | | hadoop-hdfs-project/hadoop-hdfs: The patch generated 0 new + 777 unchanged - 39 fixed = 777 total (was 816) | | +1 :green_heart: | mvnsite | 1m 26s | | the patch passed | | +1 :green_heart: | javadoc | 0m 58s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 30s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 33s | | the patch passed | | +1 :green_heart: | shadedclient | 26m 18s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 342m 5s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 0m 59s | | The patch does not generate ASF License warnings. 
| | | | 460m 59s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4668/3/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4668 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux f1e4004c0e15 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 5e1e30451f406e776e66135f7d00265c763345a0 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4668/3/testReport/ | | Max. process+thread count | 1961 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-m
[jira] [Resolved] (HDFS-13274) RBF: Extend RouterRpcClient to use multiple sockets
[ https://issues.apache.org/jira/browse/HDFS-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Íñigo Goiri resolved HDFS-13274. Fix Version/s: 3.4.0 Hadoop Flags: Reviewed Resolution: Fixed > RBF: Extend RouterRpcClient to use multiple sockets > --- > > Key: HDFS-13274 > URL: https://issues.apache.org/jira/browse/HDFS-13274 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Íñigo Goiri > Assignee: Íñigo Goiri > Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > Time Spent: 4h 20m > Remaining Estimate: 0h > > HADOOP-13144 introduces the ability to create multiple connections for the same user and use different sockets. The RouterRpcClient should use this approach to get better throughput. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13274) RBF: Extend RouterRpcClient to use multiple sockets
[ https://issues.apache.org/jira/browse/HDFS-13274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578651#comment-17578651 ] ASF GitHub Bot commented on HDFS-13274: --- goiri merged PR #4531: URL: https://github.com/apache/hadoop/pull/4531 > RBF: Extend RouterRpcClient to use multiple sockets > --- > > Key: HDFS-13274 > URL: https://issues.apache.org/jira/browse/HDFS-13274 > Project: Hadoop HDFS > Issue Type: Sub-task > Reporter: Íñigo Goiri > Assignee: Íñigo Goiri > Priority: Major > Labels: pull-request-available > Time Spent: 4h 20m > Remaining Estimate: 0h > > HADOOP-13144 introduces the ability to create multiple connections for the same user and use different sockets. The RouterRpcClient should use this approach to get better throughput. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
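One simple way to exploit several sockets per user is a round-robin index over a small pool. This is only an illustration of the idea; the configuration keys and plumbing that HADOOP-13144 actually added are not shown in this thread:

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative round-robin chooser for spreading one user's calls over several sockets. */
public class SocketIndexChooser {
  private final AtomicInteger next = new AtomicInteger();
  private final int maxSockets;

  public SocketIndexChooser(int maxSockets) {
    this.maxSockets = maxSockets;
  }

  /** Each call gets the next socket index, wrapping around the pool. */
  public int nextIndex() {
    return Math.floorMod(next.getAndIncrement(), maxSockets);
  }
}
{code}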
[jira] [Commented] (HDFS-16702) MiniDFSCluster should report cause of exception in assertion error
[ https://issues.apache.org/jira/browse/HDFS-16702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578649#comment-17578649 ] ASF GitHub Bot commented on HDFS-16702: --- snmvaughan commented on PR #4680: URL: https://github.com/apache/hadoop/pull/4680#issuecomment-1212454842 I looked into the 2 failed unit tests, and they don't appear to be related to the change. > MiniDFSCluster should report cause of exception in assertion error > -- > > Key: HDFS-16702 > URL: https://issues.apache.org/jira/browse/HDFS-16702 > Project: Hadoop HDFS > Issue Type: Improvement > Components: hdfs > Environment: Tests running in the Hadoop dev environment image. > Reporter: Steve Vaughan > Assignee: Steve Vaughan > Priority: Minor > Labels: pull-request-available > Time Spent: 3h 40m > Remaining Estimate: 0h > > When the MiniDFSCluster detects that an exception caused an exit, it should include that exception as the cause for the AssertionError that it throws. > The current AssertionError simply reports the message "Test resulted in an unexpected exit" and provides a stack trace to the location of the check for an exit exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices
[ https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578579#comment-17578579 ] ASF GitHub Bot commented on HDFS-16728: --- goiri commented on code in PR #4734: URL: https://github.com/apache/hadoop/pull/4734#discussion_r943742721

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRpc.java:
## @@ -1547,6 +1547,24 @@ public void testRenewLeaseWithMultiStream() throws Exception {
   }
 }
+  @Test
+  public void testMkdirWithDisableNameService() throws Exception {
+    MockResolver resolver = (MockResolver) router.getRouter().getSubclusterResolver();
+    String ns0 = cluster.getNameservices().get(0);
+    resolver.addLocation("/mnt", ns0, "/");
+    MockResolver activeNamenodeResolver = (MockResolver) router.getRouter().getNamenodeResolver();
+    activeNamenodeResolver.disableNamespace(ns0);
+
+    try {
+      FsPermission permission = new FsPermission("777");
+      LambdaTestUtils.intercept(NoLocationException.class,
+          () -> router.getRouter().getRpcServer()

Review Comment: Extract router.getRouter().getRpcServer() and we can make this fit in one line.

## hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcServer.java:
## @@ -1765,6 +1765,9 @@ protected List<RemoteLocation> getLocationsForPath(String path,
       locs.add(loc);
     }
   }
+  if (locs.isEmpty()) {
+    throw new NoLocationException(path, this.subclusterResolver.getClass().getSimpleName());

Review Comment: Cleaner to pass this.subclusterResolver.getClass() and do the getSimpleName() in the exception.

> RBF throw IndexOutOfBoundsException with disableNameServices > > > Key: HDFS-16728 > URL: https://issues.apache.org/jira/browse/HDFS-16728 > Project: Hadoop HDFS > Issue Type: Bug > Reporter: ZanderXu > Assignee: ZanderXu > Priority: Major > Labels: pull-request-available > > RBF will throw an IndexOutOfBoundsException when a namespace is disabled. > Suppose we have a mount point /a/b -> ns0 -> /a/b and we disable ns0. RBF will throw an IndexOutOfBoundsException while handling requests whose paths start with /a/b.
> {code:java}
> java.lang.IndexOutOfBoundsException: Index: 0, Size: 0
> at java.util.ArrayList.rangeCheck(ArrayList.java:657)
> at java.util.ArrayList.get(ArrayList.java:433)
> at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756)
> at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980)
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
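The second review suggestion, spelled out: move getSimpleName() into the exception so call sites stay short. A sketch of such a constructor — the message wording is an assumption, and the merged patch may differ:

{code:java}
import java.io.IOException;

public class NoLocationException extends IOException {
  private static final long serialVersionUID = 1L;

  // Pass the resolver Class and derive the simple name here, keeping call sites cleaner.
  public NoLocationException(String path, Class<?> resolverClass) {
    super("Cannot find locations for " + path + " in " + resolverClass.getSimpleName());
  }
}
{code}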
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578575#comment-17578575 ] ASF GitHub Bot commented on HDFS-13522: --- simbadzina commented on code in PR #4311: URL: https://github.com/apache/hadoop/pull/4311#discussion_r943730742 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FederatedNamespaceIds.java: ## @@ -0,0 +1,85 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hdfs; + +import java.util.Collections; +import java.util.Map; +import java.util.concurrent.ConcurrentHashMap; +import org.apache.hadoop.classification.VisibleForTesting; +import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos; +import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos.RouterFederatedStateProto; +import org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException; +import org.apache.hadoop.thirdparty.protobuf.ByteString; + + +/** Collection of last-seen namespace state Ids for a set of namespaces. */ Review Comment: Added a comment > A single NamespaceStateId is shared by all outgoing connections to a particular namespace. > Router clients share and query the entire collection. > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png, > observer_reads_in_rbf_proposal_simbadzina_v1.pdf, > observer_reads_in_rbf_proposal_simbadzina_v2.pdf > > Time Spent: 20h 50m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578573#comment-17578573 ] ASF GitHub Bot commented on HDFS-13522: --- simbadzina commented on code in PR #4311: URL: https://github.com/apache/hadoop/pull/4311#discussion_r943730742 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FederatedNamespaceIds.java: ## @@ -0,0 +1,85 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hdfs; + +import java.util.Collections; +import java.util.Map; +import java.util.concurrent.ConcurrentHashMap; +import org.apache.hadoop.classification.VisibleForTesting; +import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos; +import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos.RouterFederatedStateProto; +import org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException; +import org.apache.hadoop.thirdparty.protobuf.ByteString; + + +/** Collection of last-seen namespace state Ids for a set of namespaces. */ Review Comment: Added a comment > A single NamespaceStateId is shared by all outgoing connections to a particular namespace. > Router clients query the entire collection. > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png, > observer_reads_in_rbf_proposal_simbadzina_v1.pdf, > observer_reads_in_rbf_proposal_simbadzina_v2.pdf > > Time Spent: 20h 50m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578569#comment-17578569 ] ASF GitHub Bot commented on HDFS-13522: --- simbadzina commented on code in PR #4311: URL: https://github.com/apache/hadoop/pull/4311#discussion_r943725929 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientGSIContext.java: ## @@ -66,15 +79,23 @@ public void updateResponseState(RpcResponseHeaderProto.Builder header) { */ @Override public void receiveResponseState(RpcResponseHeaderProto header) { -lastSeenStateId.accumulate(header.getStateId()); +lastSeenStateId.update(header.getStateId()); Review Comment: I agree. Fixed. ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/ClientGSIContext.java: ## @@ -37,14 +37,27 @@ @InterfaceStability.Evolving public class ClientGSIContext implements AlignmentContext { - private final LongAccumulator lastSeenStateId = - new LongAccumulator(Math::max, Long.MIN_VALUE); + private final NamespaceStateId lastSeenStateId; + private ByteString routerFederatedState; + + public ClientGSIContext() { +this(new NamespaceStateId()); + } + + public ClientGSIContext(NamespaceStateId lastSeenStateId) { +this.lastSeenStateId = lastSeenStateId; +routerFederatedState = null; + } @Override public long getLastSeenStateId() { return lastSeenStateId.get(); } + public void updateLastSeenStateID(Long stateId) { Review Comment: Removed. > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png, > observer_reads_in_rbf_proposal_simbadzina_v1.pdf, > observer_reads_in_rbf_proposal_simbadzina_v2.pdf > > Time Spent: 20h 50m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-13522) RBF: Support observer node from Router-Based Federation
[ https://issues.apache.org/jira/browse/HDFS-13522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578570#comment-17578570 ] ASF GitHub Bot commented on HDFS-13522: --- simbadzina commented on code in PR #4311: URL: https://github.com/apache/hadoop/pull/4311#discussion_r943726586 ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FederatedNamespaceIds.java: ## @@ -0,0 +1,85 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hdfs; + +import java.util.Collections; +import java.util.Map; +import java.util.concurrent.ConcurrentHashMap; +import org.apache.hadoop.classification.VisibleForTesting; +import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos; +import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos.RouterFederatedStateProto; +import org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException; +import org.apache.hadoop.thirdparty.protobuf.ByteString; + + +/** Collection of last-seen namespace state Ids for a set of namespaces. */ +public class FederatedNamespaceIds { Review Comment: Added locking around iterations and modifications to the hashMap. ## hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/FederatedNamespaceIds.java: ## @@ -0,0 +1,85 @@ +/** + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.hadoop.hdfs; + +import java.util.Collections; +import java.util.Map; +import java.util.concurrent.ConcurrentHashMap; +import org.apache.hadoop.classification.VisibleForTesting; +import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos; +import org.apache.hadoop.ipc.protobuf.RpcHeaderProtos.RouterFederatedStateProto; +import org.apache.hadoop.thirdparty.protobuf.InvalidProtocolBufferException; +import org.apache.hadoop.thirdparty.protobuf.ByteString; + + +/** Collection of last-seen namespace state Ids for a set of namespaces. 
*/ +public class FederatedNamespaceIds { + private final Map<String, NamespaceStateId> namespaceIdMap = new ConcurrentHashMap<>(); + + public void updateStateUsingRequestHeader(RpcHeaderProtos.RpcRequestHeaderProto header) { + if (header.hasRouterFederatedState()) { + RouterFederatedStateProto federatedState = null; + try { + federatedState = RouterFederatedStateProto.parseFrom(header.getRouterFederatedState()); + } catch (InvalidProtocolBufferException e) { + throw new RuntimeException(e); + } + federatedState.getNamespaceStateIdsMap().forEach(this::updateNamespaceState); Review Comment: Added > RBF: Support observer node from Router-Based Federation > --- > > Key: HDFS-13522 > URL: https://issues.apache.org/jira/browse/HDFS-13522 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: federation, namenode >Reporter: Erik Krogen >Assignee: Simbarashe Dzinamarira >Priority: Major > Labels: pull-request-available > Attachments: HDFS-13522.001.patch, HDFS-13522.002.patch, > HDFS-13522_WIP.patch, RBF_ Observer support.pdf, Router+Observer RPC > clogging.png, ShortTerm-Routers+Observer.png, > observer_reads_in_rbf_proposal_simbadzina_v1.pdf, > observer_reads_in_rbf_proposal_simbadzina_v2.pdf > > Time Spent: 20h 50m > Remaining Estimate: 0h > > Changes will need to occur to the router to support the new observer node. > One such change will be to make the router understand the observer state, > e.g. {{FederationNamenodeServiceState}}. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
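From the diffs above, ClientGSIContext's bare LongAccumulator is being replaced by a NamespaceStateId that is shared by all outgoing connections to one namespace. A minimal sketch of what such a class could look like, assuming it keeps the same max-accumulating semantics as the LongAccumulator(Math::max, Long.MIN_VALUE) it replaces; method names mirror the update()/get() calls visible in the diff.

{code:java}
import java.util.concurrent.atomic.LongAccumulator;

// Sketch, not the patch: a monotonically increasing last-seen state id
// that can be shared safely across threads and connections.
public class NamespaceStateId {
  private final LongAccumulator id =
      new LongAccumulator(Math::max, Long.MIN_VALUE);

  /** Record a state id seen in an RPC response; the value only moves forward. */
  public void update(long seen) {
    id.accumulate(seen);
  }

  public long get() {
    return id.get();
  }
}
{code}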
[jira] [Commented] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices
[ https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578561#comment-17578561 ] ASF GitHub Bot commented on HDFS-16728: --- hadoop-yetus commented on PR #4734: URL: https://github.com/apache/hadoop/pull/4734#issuecomment-1212228190 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 41s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 38m 30s | | trunk passed | | +1 :green_heart: | compile | 1m 2s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 0m 57s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 0m 52s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 2s | | trunk passed | | +1 :green_heart: | javadoc | 1m 9s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 16s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 1m 48s | | trunk passed | | +1 :green_heart: | shadedclient | 21m 21s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 0m 42s | | the patch passed | | +1 :green_heart: | compile | 0m 43s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 0m 43s | | the patch passed | | +1 :green_heart: | compile | 0m 40s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 0m 40s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 0m 26s | | the patch passed | | +1 :green_heart: | mvnsite | 0m 43s | | the patch passed | | +1 :green_heart: | javadoc | 0m 42s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 0m 56s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 1m 26s | | the patch passed | | +1 :green_heart: | shadedclient | 20m 30s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 22m 17s | | hadoop-hdfs-rbf in the patch passed. | | +1 :green_heart: | asflicense | 0m 52s | | The patch does not generate ASF License warnings. 
| | | | 120m 33s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4734/1/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4734 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 730ae41fa654 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / e21fec18ea8c5b29fe7771272efb31fcbbfd22f9 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4734/1/testReport/ | | Max. process+thread count | 2805 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs-rbf U: hadoop-hdfs-project/hadoop-hdfs-rbf | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4734/1/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated
[jira] [Updated] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices
[ https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated HDFS-16728: -- Labels: pull-request-available (was: ) > RBF throw IndexOutOfBoundsException with disableNameServices > > > Key: HDFS-16728 > URL: https://issues.apache.org/jira/browse/HDFS-16728 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > > RBF will throw an IndexOutOfBoundsException when the namespace is disabled. > Suppose we have a mount point /a/b -> ns0 -> /a/b and ns0 is disabled. > RBF will then throw an IndexOutOfBoundsException while handling any request whose path > starts with /a/b. > {code:java} > java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at > java.util.ArrayList.rangeCheck(ArrayList.java:657) > at java.util.ArrayList.get(ArrayList.java:433) > at > org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices
[ https://issues.apache.org/jira/browse/HDFS-16728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578513#comment-17578513 ] ASF GitHub Bot commented on HDFS-16728: --- ZanderXu opened a new pull request, #4734: URL: https://github.com/apache/hadoop/pull/4734 ### Description of PR RBF will throw an IndexOutOfBoundsException when the namespace is disabled. Suppose we have a mount point /a/b -> ns0 -> /a/b and ns0 is disabled. RBF will then throw an IndexOutOfBoundsException while handling any request whose path starts with /a/b. ``` java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:657) at java.util.ArrayList.get(ArrayList.java:433) at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756) at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980) ``` > RBF throw IndexOutOfBoundsException with disableNameServices > > > Key: HDFS-16728 > URL: https://issues.apache.org/jira/browse/HDFS-16728 > Project: Hadoop HDFS > Issue Type: Bug >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > > RBF will throw an IndexOutOfBoundsException when the namespace is disabled. > Suppose we have a mount point /a/b -> ns0 -> /a/b and ns0 is disabled. > RBF will then throw an IndexOutOfBoundsException while handling any request whose path > starts with /a/b. > {code:java} > java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at > java.util.ArrayList.rangeCheck(ArrayList.java:657) > at java.util.ArrayList.get(ArrayList.java:433) > at > org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Created] (HDFS-16728) RBF throw IndexOutOfBoundsException with disableNameServices
ZanderXu created HDFS-16728: --- Summary: RBF throw IndexOutOfBoundsException with disableNameServices Key: HDFS-16728 URL: https://issues.apache.org/jira/browse/HDFS-16728 Project: Hadoop HDFS Issue Type: Bug Reporter: ZanderXu Assignee: ZanderXu RBF will throw an IndexOutOfBoundsException when the namespace is disabled. Suppose we have a mount point /a/b -> ns0 -> /a/b and ns0 is disabled. RBF will then throw an IndexOutOfBoundsException while handling any request whose path starts with /a/b. {code:java} java.lang.IndexOutOfBoundsException: Index: 0, Size: 0 at java.util.ArrayList.rangeCheck(ArrayList.java:657) at java.util.ArrayList.get(ArrayList.java:433) at org.apache.hadoop.hdfs.server.federation.router.RouterClientProtocol.mkdirs(RouterClientProtocol.java:756) at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.mkdirs(RouterRpcServer.java:980) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16688) Unresolved Hosts during startup are not synced by JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578498#comment-17578498 ] ASF GitHub Bot commented on HDFS-16688: --- snmvaughan commented on PR #4725: URL: https://github.com/apache/hadoop/pull/4725#issuecomment-1212049220 I was unable to replicate the test failure `TestQJMWithFaults.testUnresolvableHostName` locally. > Unresolved Hosts during startup are not synced by JournalNodes > -- > > Key: HDFS-16688 > URL: https://issues.apache.org/jira/browse/HDFS-16688 > Project: Hadoop HDFS > Issue Type: Improvement > Components: journal-node > Environment: Running in Kubernetes using Java 11, with an HA > configuration. >Reporter: Steve Vaughan >Assignee: Steve Vaughan >Priority: Major > Labels: pull-request-available > > During startup, the JournalNode builds the list of servers in the > JournalNode set, ignoring hostnames that cannot be resolved. In environments > with dynamic IP address allocation, this means that the JournalNodeSyncer > will never sync with hosts that weren't resolvable during startup. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
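The description boils down to a name-resolution lifetime problem: a lookup done once at startup pins the peer list forever. A hedged sketch of the usual remedy follows; this is an illustration of the idea, not necessarily what PR #4725 implements.

{code:java}
import java.net.InetSocketAddress;

// Sketch: constructing a new InetSocketAddress performs a fresh DNS lookup,
// so a JournalNode peer that was unresolvable at startup (e.g. its pod had
// not been scheduled yet) can still be picked up on a later sync attempt.
static InetSocketAddress resolvePeer(String host, int port) {
  InetSocketAddress addr = new InetSocketAddress(host, port);
  // addr.isUnresolved() == true only means this attempt failed; keep the
  // hostname and retry next sync cycle instead of dropping the peer.
  return addr;
}
{code}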
[jira] [Commented] (HDFS-16705) RBF: Support healthMonitor timeout configurable and cache NN and client proxy in NamenodeHeartbeatService
[ https://issues.apache.org/jira/browse/HDFS-16705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578469#comment-17578469 ] ASF GitHub Bot commented on HDFS-16705: --- ZanderXu commented on PR #4662: URL: https://github.com/apache/hadoop/pull/4662#issuecomment-1211954930 @slfan1989 Ping: could you review this patch? > RBF: Support healthMonitor timeout configurable and cache NN and client proxy > in NamenodeHeartbeatService > - > > Key: HDFS-16705 > URL: https://issues.apache.org/jira/browse/HDFS-16705 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 4.5h > Remaining Estimate: 0h > > When I read NamenodeHeartbeatService.class of RBF, I feel that there are > some things we can do for NamenodeHeartbeatService.class: > * Cache the NameNode Protocol and Client Protocol proxies to avoid creating a new proxy > every time > * Support healthMonitorTimeout configuration > * Format the code of getNamenodeStatusReport to make it clearer -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
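A hedged sketch of the first two changes named in the description; the configuration key, default, field names, and factory method here are all invented for illustration and are not the ticket's actual code.

{code:java}
import java.io.IOException;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;

// Sketch: cache the proxy instead of re-creating it on every heartbeat,
// and read the health-monitor timeout from configuration.
public class HeartbeatSketch {
  private Object cachedProxy;        // stands in for the cached NN/client proxy
  private long healthMonitorTimeoutMs;

  void init(Configuration conf) {
    // getTimeDuration returns the value in the unit given as third argument.
    this.healthMonitorTimeoutMs = conf.getTimeDuration(
        "dfs.federation.router.health.monitor.timeout",  // assumed key
        30000, TimeUnit.MILLISECONDS);
  }

  Object getProxy() throws IOException {
    if (cachedProxy == null) {
      cachedProxy = createProxy();   // hypothetical factory, called once
    }
    return cachedProxy;
  }

  private Object createProxy() throws IOException {
    return new Object();             // placeholder for proxy construction
  }
}
{code}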
[jira] [Commented] (HDFS-16713) Improve Code with Lambda in org.apache.hadoop.hdfs.server.namenode sub packages
[ https://issues.apache.org/jira/browse/HDFS-16713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578467#comment-17578467 ] ASF GitHub Bot commented on HDFS-16713: --- ZanderXu commented on PR #4674: URL: https://github.com/apache/hadoop/pull/4674#issuecomment-1211953731 @goiri Hi, could you help me merge this PR into trunk? Thank you very much. > Improve Code with Lambda in org.apache.hadoop.hdfs.server.namenode sub > packages > --- > > Key: HDFS-16713 > URL: https://issues.apache.org/jira/browse/HDFS-16713 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 50m > Remaining Estimate: 0h > > Improve Code with Lambda in org.apache.hadoop.hdfs.server.namenode sub > packages. > For example: > Current logic: > {code:java} > public ListenableFuture<GetJournaledEditsResponseProto> getJournaledEdits( > long fromTxnId, int maxTransactions) { > return parallelExecutor.submit( > new Callable<GetJournaledEditsResponseProto>() { > @Override > public GetJournaledEditsResponseProto call() throws IOException { > return getProxy().getJournaledEdits(journalId, nameServiceId, > fromTxnId, maxTransactions); > } > }); > } {code} > Improved Code with Lambda: > {code:java} > public ListenableFuture<GetJournaledEditsResponseProto> getJournaledEdits( > long fromTxnId, int maxTransactions) { > return parallelExecutor.submit(() -> getProxy().getJournaledEdits( > journalId, nameServiceId, fromTxnId, maxTransactions)); > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16710) Remove redundant throw exceptions in org.apache.hadoop.hdfs.server.namenode package
[ https://issues.apache.org/jira/browse/HDFS-16710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578466#comment-17578466 ] ASF GitHub Bot commented on HDFS-16710: --- ZanderXu commented on PR #4670: URL: https://github.com/apache/hadoop/pull/4670#issuecomment-1211952616 @goiri Hi, could you help me merge this PR into trunk? Thank you very much. > Remove redundant throw exceptions in org.apache.hadoop.hdfs.server.namenode > package > --- > > Key: HDFS-16710 > URL: https://issues.apache.org/jira/browse/HDFS-16710 > Project: Hadoop HDFS > Issue Type: Improvement >Reporter: ZanderXu >Assignee: ZanderXu >Priority: Major > Labels: pull-request-available > Time Spent: 1.5h > Remaining Estimate: 0h > > When I read some classes of the HDFS NameNode, I found many redundant > throws clauses in the org.apache.hadoop.hdfs.server.namenode package, such as: > {code:java} > public synchronized void transitionToObserver(StateChangeRequestInfo req) > throws ServiceFailedException, AccessControlException, IOException { > checkNNStartup(); > nn.checkHaStateChange(req); > nn.transitionToObserver(); > } {code} > Because ServiceFailedException and AccessControlException are subclasses of > IOException, they are redundant, and we can remove them to make the code clearer, such as: > {code:java} > public synchronized void transitionToObserver(StateChangeRequestInfo req) > throws IOException { > checkNNStartup(); > nn.checkHaStateChange(req); > nn.transitionToObserver(); > } {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer
[ https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578385#comment-17578385 ] ASF GitHub Bot commented on HDFS-16689: --- hadoop-yetus commented on PR #4628: URL: https://github.com/apache/hadoop/pull/4628#issuecomment-1211814838 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 24s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 1s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 1s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 42m 16s | | trunk passed | | +1 :green_heart: | compile | 1m 44s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 1m 36s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 22s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 45s | | trunk passed | | +1 :green_heart: | javadoc | 1m 25s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 45s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 4m 8s | | trunk passed | | +1 :green_heart: | shadedclient | 26m 21s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 27s | | the patch passed | | +1 :green_heart: | compile | 1m 43s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 1m 43s | | the patch passed | | +1 :green_heart: | compile | 1m 31s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 31s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 9s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 33s | | the patch passed | | +1 :green_heart: | javadoc | 1m 7s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 33s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 57s | | the patch passed | | +1 :green_heart: | shadedclient | 26m 4s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 375m 35s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/5/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 21s | | The patch does not generate ASF License warnings. 
| | | | 497m 27s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.server.blockmanagement.TestBlockTokenWithShortCircuitRead | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/5/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4628 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 0f0416ed0c69 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 5e6adb43e490330c7aa5fd71216308fb0915d3b9 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/5/testReport/ | | Max. process+thread count | 2109 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U:
[jira] [Commented] (HDFS-16689) Standby NameNode crashes when transitioning to Active with in-progress tailer
[ https://issues.apache.org/jira/browse/HDFS-16689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578386#comment-17578386 ] ASF GitHub Bot commented on HDFS-16689: --- hadoop-yetus commented on PR #4628: URL: https://github.com/apache/hadoop/pull/4628#issuecomment-1211815047 :confetti_ball: **+1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 1m 1s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 2 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 42m 24s | | trunk passed | | +1 :green_heart: | compile | 1m 50s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 1m 36s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 23s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 47s | | trunk passed | | +1 :green_heart: | javadoc | 1m 25s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 43s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 4m 11s | | trunk passed | | +1 :green_heart: | shadedclient | 26m 31s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 33s | | the patch passed | | +1 :green_heart: | compile | 1m 40s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 1m 40s | | the patch passed | | +1 :green_heart: | compile | 1m 28s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 28s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | +1 :green_heart: | checkstyle | 1m 9s | | the patch passed | | +1 :green_heart: | mvnsite | 1m 36s | | the patch passed | | +1 :green_heart: | javadoc | 1m 4s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 40s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 52s | | the patch passed | | +1 :green_heart: | shadedclient | 26m 10s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | +1 :green_heart: | unit | 374m 1s | | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 12s | | The patch does not generate ASF License warnings. 
| | | | 496m 15s | | | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/6/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4628 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets | | uname | Linux 6211aff2a338 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 5e6adb43e490330c7aa5fd71216308fb0915d3b9 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Test Results | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/6/testReport/ | | Max. process+thread count | 2195 (vs. ulimit of 5500) | | modules | C: hadoop-hdfs-project/hadoop-hdfs U: hadoop-hdfs-project/hadoop-hdfs | | Console output | https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4628/6/console | | versions | git=2.25.1 maven=3.6.3 spotbugs=4.2.2 | | Powered by | Apache Yetus 0.14.0 https://yetus.apache.org | This message was automatically generated.
[jira] [Commented] (HDFS-16688) Unresolved Hosts during startup are not synced by JournalNodes
[ https://issues.apache.org/jira/browse/HDFS-16688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578356#comment-17578356 ] ASF GitHub Bot commented on HDFS-16688: --- hadoop-yetus commented on PR #4725: URL: https://github.com/apache/hadoop/pull/4725#issuecomment-1211762044 :broken_heart: **-1 overall** | Vote | Subsystem | Runtime | Logfile | Comment | |::|--:|:|::|:---:| | +0 :ok: | reexec | 0m 50s | | Docker mode activated. | _ Prechecks _ | | +1 :green_heart: | dupname | 0m 0s | | No case conflicting files found. | | +0 :ok: | codespell | 0m 0s | | codespell was not available. | | +0 :ok: | detsecrets | 0m 0s | | detect-secrets was not available. | | +0 :ok: | xmllint | 0m 0s | | xmllint was not available. | | +1 :green_heart: | @author | 0m 0s | | The patch does not contain any @author tags. | | +1 :green_heart: | test4tests | 0m 0s | | The patch appears to include 3 new or modified test files. | _ trunk Compile Tests _ | | +1 :green_heart: | mvninstall | 41m 6s | | trunk passed | | +1 :green_heart: | compile | 1m 42s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | compile | 1m 31s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | checkstyle | 1m 20s | | trunk passed | | +1 :green_heart: | mvnsite | 1m 40s | | trunk passed | | +1 :green_heart: | javadoc | 1m 19s | | trunk passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 45s | | trunk passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 47s | | trunk passed | | +1 :green_heart: | shadedclient | 26m 7s | | branch has no errors when building and testing our client artifacts. | _ Patch Compile Tests _ | | +1 :green_heart: | mvninstall | 1m 22s | | the patch passed | | +1 :green_heart: | compile | 1m 30s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javac | 1m 30s | | the patch passed | | +1 :green_heart: | compile | 1m 21s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | javac | 1m 21s | | the patch passed | | +1 :green_heart: | blanks | 0m 0s | | The patch has no blanks issues. | | -0 :warning: | checkstyle | 1m 1s | [/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/2/artifact/out/results-checkstyle-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs-project/hadoop-hdfs: The patch generated 4 new + 251 unchanged - 0 fixed = 255 total (was 251) | | +1 :green_heart: | mvnsite | 1m 27s | | the patch passed | | +1 :green_heart: | javadoc | 0m 59s | | the patch passed with JDK Private Build-11.0.15+10-Ubuntu-0ubuntu0.20.04.1 | | +1 :green_heart: | javadoc | 1m 33s | | the patch passed with JDK Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | +1 :green_heart: | spotbugs | 3m 34s | | the patch passed | | +1 :green_heart: | shadedclient | 26m 4s | | patch has no errors when building and testing our client artifacts. | _ Other Tests _ | | -1 :x: | unit | 346m 11s | [/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt](https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/2/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt) | hadoop-hdfs in the patch passed. | | +1 :green_heart: | asflicense | 1m 4s | | The patch does not generate ASF License warnings. 
| | | | 464m 33s | | | | Reason | Tests | |---:|:--| | Failed junit tests | hadoop.hdfs.qjournal.client.TestQJMWithFaults | | Subsystem | Report/Notes | |--:|:-| | Docker | ClientAPI=1.41 ServerAPI=1.41 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-4725/2/artifact/out/Dockerfile | | GITHUB PR | https://github.com/apache/hadoop/pull/4725 | | Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets xmllint | | uname | Linux 56ccbb36f24e 4.15.0-175-generic #184-Ubuntu SMP Thu Mar 24 17:48:36 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux | | Build tool | maven | | Personality | dev-support/bin/hadoop.sh | | git revision | trunk / 38031b2cb60af2ae9c5ac5f8ccb95feae8235f04 | | Default Java | Private Build-1.8.0_312-8u312-b07-0ubuntu1~20.04-b07 | | Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Private Build-11.0.15
[jira] [Assigned] (HDFS-2139) Fast copy for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hui Fei reassigned HDFS-2139: - Assignee: ZanderXu (was: Rituraj) > Fast copy for HDFS. > --- > > Key: HDFS-2139 > URL: https://issues.apache.org/jira/browse/HDFS-2139 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Pritam Damania >Assignee: ZanderXu >Priority: Major > Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, > HDFS-2139.patch, image-2022-08-11-11-48-17-994.png > > Original Estimate: 168h > Remaining Estimate: 168h > > There is a need to perform fast file copy on HDFS. The fast copy mechanism > for a file works as > follows : > 1) Query metadata for all blocks of the source file. > 2) For each block 'b' of the file, find out its datanode locations. > 3) For each block of the file, add an empty block to the namesystem for > the destination file. > 4) For each location of the block, instruct the datanode to make a local > copy of that block. > 5) Once each datanode has copied over its respective blocks, they > report to the namenode about it. > 6) Wait for all blocks to be copied and exit. > This would speed up the copying process considerably by removing top of > the rack data transfers. > Note : An extra improvement, would be to instruct the datanode to create a > hardlink of the block file if we are copying a block on the same datanode -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
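The six steps in the quoted description translate into a small control loop. The following is a hypothetical sketch of that flow only; every API name is invented, and the real patches attached to the ticket are considerably more involved.

{code:java}
// Sketch of the fast-copy flow from the description; the namenode/datanode
// calls below are placeholders for the real client protocol operations.
void fastCopy(String src, String dst) throws IOException {
  // Steps 1-2: query metadata and datanode locations for all source blocks.
  List<BlockInfo> srcBlocks = namenode.getBlocksWithLocations(src);
  for (BlockInfo blk : srcBlocks) {
    // Step 3: add an empty block to the namesystem for the destination file.
    BlockInfo dstBlk = namenode.addBlock(dst);
    for (DatanodeInfo dn : blk.locations()) {
      // Step 4: instruct each datanode holding a replica to copy the block
      // locally (or hardlink it when source and destination are co-located).
      datanodeClient(dn).copyBlockLocally(blk, dstBlk);
    }
  }
  // Steps 5-6: datanodes report the new replicas to the namenode;
  // wait until every block has been copied, then exit.
  waitUntilAllBlocksReported(dst);
}
{code}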
[jira] [Commented] (HDFS-2139) Fast copy for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578340#comment-17578340 ] Hui Fei commented on HDFS-2139: --- Glad to receive positive feedbacks, thank you! [~xuzq_zander] is interested in this feature and will assign this ticket to him. We can help review. > Fast copy for HDFS. > --- > > Key: HDFS-2139 > URL: https://issues.apache.org/jira/browse/HDFS-2139 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Pritam Damania >Assignee: Rituraj >Priority: Major > Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, > HDFS-2139.patch, image-2022-08-11-11-48-17-994.png > > Original Estimate: 168h > Remaining Estimate: 168h > > There is a need to perform fast file copy on HDFS. The fast copy mechanism > for a file works as > follows : > 1) Query metadata for all blocks of the source file. > 2) For each block 'b' of the file, find out its datanode locations. > 3) For each block of the file, add an empty block to the namesystem for > the destination file. > 4) For each location of the block, instruct the datanode to make a local > copy of that block. > 5) Once each datanode has copied over its respective blocks, they > report to the namenode about it. > 6) Wait for all blocks to be copied and exit. > This would speed up the copying process considerably by removing top of > the rack data transfers. > Note : An extra improvement, would be to instruct the datanode to create a > hardlink of the block file if we are copying a block on the same datanode -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Comment Edited] (HDFS-2139) Fast copy for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578340#comment-17578340 ] Hui Fei edited comment on HDFS-2139 at 8/11/22 9:16 AM: [~weichiu] [~ayushtkn] [~pengbei] Glad to receive positive feedbacks, thank you! [~xuzq_zander] is interested in this feature and will assign this ticket to him. We can help review. was (Author: ferhui): Glad to receive positive feedbacks, thank you! [~xuzq_zander] is interested in this feature and will assign this ticket to him. We can help review. > Fast copy for HDFS. > --- > > Key: HDFS-2139 > URL: https://issues.apache.org/jira/browse/HDFS-2139 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Pritam Damania >Assignee: Rituraj >Priority: Major > Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, > HDFS-2139.patch, image-2022-08-11-11-48-17-994.png > > Original Estimate: 168h > Remaining Estimate: 168h > > There is a need to perform fast file copy on HDFS. The fast copy mechanism > for a file works as > follows : > 1) Query metadata for all blocks of the source file. > 2) For each block 'b' of the file, find out its datanode locations. > 3) For each block of the file, add an empty block to the namesystem for > the destination file. > 4) For each location of the block, instruct the datanode to make a local > copy of that block. > 5) Once each datanode has copied over its respective blocks, they > report to the namenode about it. > 6) Wait for all blocks to be copied and exit. > This would speed up the copying process considerably by removing top of > the rack data transfers. > Note : An extra improvement, would be to instruct the datanode to create a > hardlink of the block file if we are copying a block on the same datanode -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Updated] (HDFS-16726) There is a memory-related problem about HDFS namenode
[ https://issues.apache.org/jira/browse/HDFS-16726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] yuyanlei updated HDFS-16726: Description: In the cluster, the memory usage of Namenode exceeds the XMX setting (XMX =280GB). The actual memory usage of Namenode is 479GB
Output via pmap:
Address Perm Offset Device Inode Size Rss Pss Referenced Anonymous Swap Locked Mapping
2b42f000 rw-p 00:00 0 294174720 293756960 293756960 293756960 293756960 0 0
01e21000 rw-p 00:00 0 195245456 195240848 195240848 195240848 195240848 0 0 [heap]
2b897c00 rw-p 00:00 0 9246724 9246724 9246724 9246724 9246724 0 0
2b8bb0905000 rw-p 00:00 0 1781124 1754572 1754572 1754572 1754572 0 0
2b893600 rw-p 00:00 0 1146880 1002084 1002084 1002084 1002084 0 0
2b42db652000 rwxp 00:00 0 57792 55252 55252 55252 55252 0 0
2b42ec12a000 rw-p 00:00 0 25696 24700 24700 24700 24700 0 0
2b42ef25b000 rw-p 00:00 0 9988 8972 8972 8972 8972 0 0
2b8c1d467000 rw-p 00:00 0 9216 8204 8204 8204 8204 0 0
2b8d6f8db000 rw-p 00:00 0 7160 6228 6228 6228 6228 0 0
The first line corresponds to the heap configured by XMX, while [heap] is unusually large, so a memory leak is suspected!
* [heap] is associated with malloc
After enabling jcmd in the test environment, we found that the malloc part of Internal reported by jcmd increased significantly when the client was writing to a gz file (XMX =40g in the test environment, and the Internal area was 900MB before the client wrote):
Total: reserved=47276MB, committed=47070MB
- Java Heap (reserved=40960MB, committed=40960MB) (mmap: reserved=40960MB, committed=40960MB)
- Class (reserved=53MB, committed=52MB) (classes #7423) (malloc=1MB #17053) (mmap: reserved=52MB, committed=52MB)
- Thread (reserved=2145MB, committed=2145MB) (thread #2129) (stack: reserved=2136MB, committed=2136MB) (malloc=7MB #10673) (arena=2MB #4256)
- Code (reserved=251MB, committed=45MB) (malloc=7MB #10661) (mmap: reserved=244MB, committed=38MB)
- GC (reserved=2307MB, committed=2307MB) (malloc=755MB #525664) (mmap: reserved=1552MB, committed=1552MB)
- Compiler (reserved=8MB, committed=8MB) (malloc=8MB #8852)
- Internal (reserved=1524MB, committed=1524MB) (malloc=1524MB #323482)
- Symbol (reserved=12MB, committed=12MB) (malloc=10MB #91715) (arena=2MB #1)
- Native Memory Tracking (reserved=16MB, committed=16MB) (tracking overhead=15MB)
It is clear that the Internal malloc increases significantly when the client writes, and does not decrease after the client stops writing. Through perf, I found some more instances when writing on the client side:
Children Self Comm Shared Object Symbol
0.05% 0.00% java libzip.so [.] Java_java_util_zip_ZipFile_getEntry
0.02% 0.00% java libzip.so [.] Java_java_util_zip_Inflater_inflateBytes
Therefore, it is suspected that the compressed write operation of the client may have a memory leak problem. Using jcmd to locate the call chain to Java_java_util_zip_Inflater_inflateBytes:
"ExtensionRefresher" #59 daemon prio=5 os_prio=0 tid=0x2419d000 nid=0x69df runnable [0x2b319d7a]
java.lang.Thread.State: RUNNABLE
at java.util.zip.Inflater.inflateBytes(Native Method)
at java.util.zip.Inflater.inflate(Inflater.java:259)
- locked <0x2b278f7b9da8> (a java.util.zip.ZStreamRef)
at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:152)
at java.io.FilterInputStream.read(FilterInputStream.java:133)
at org.apache.xerces.impl.XMLEntityManager$RewindableInputStream.read(Unknown Source)
at org.apache.xerces.impl.io.UTF8Reader.read(Unknown Source)
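The stack above ends in java.util.zip.Inflater, which is a well-known source of exactly this NMT signature: each Inflater/Deflater holds zlib state allocated with malloc (reported under Internal), and that native memory is only returned by end(). Stated here as general Java behavior rather than a confirmed diagnosis of this ticket, the defensive pattern is to close the stream deterministically:

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// General pattern, not the namenode's code: GZIPInputStream.close() calls
// Inflater.end(), which frees the native zlib buffers immediately. Relying
// on finalization/GC instead lets the Internal (malloc) area grow while
// clients keep reading or writing gz data.
static void drainGzip(InputStream rawStream) throws IOException {
  try (InputStream in = new GZIPInputStream(rawStream)) {
    byte[] buf = new byte[8192];
    while (in.read(buf) != -1) {
      // consume decompressed bytes
    }
  } // close() -> Inflater.end() releases the malloc'ed state promptly
}
{code}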
[jira] [Commented] (HDFS-2139) Fast copy for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578322#comment-17578322 ] ZanderXu commented on HDFS-2139: Thanks [~weichiu] [~ayushtkn] [~ferhui] [~pengbei] for your comments. I will prepare a detailed design this weekend; please help me review it once it is complete. > Fast copy for HDFS. > --- > > Key: HDFS-2139 > URL: https://issues.apache.org/jira/browse/HDFS-2139 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Pritam Damania >Assignee: Rituraj >Priority: Major > Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, > HDFS-2139.patch, image-2022-08-11-11-48-17-994.png > > Original Estimate: 168h > Remaining Estimate: 168h > > There is a need to perform fast file copy on HDFS. The fast copy mechanism > for a file works as > follows : > 1) Query metadata for all blocks of the source file. > 2) For each block 'b' of the file, find out its datanode locations. > 3) For each block of the file, add an empty block to the namesystem for > the destination file. > 4) For each location of the block, instruct the datanode to make a local > copy of that block. > 5) Once each datanode has copied over its respective blocks, they > report to the namenode about it. > 6) Wait for all blocks to be copied and exit. > This would speed up the copying process considerably by removing top of > the rack data transfers. > Note : An extra improvement, would be to instruct the datanode to create a > hardlink of the block file if we are copying a block on the same datanode -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
[jira] [Commented] (HDFS-2139) Fast copy for HDFS.
[ https://issues.apache.org/jira/browse/HDFS-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17578296#comment-17578296 ] Bei Peng commented on HDFS-2139: {quote}Many companies backport it into their internal branches and use it. * DistCp supports fastcopy * Implement block based strategy{quote} me too. > Fast copy for HDFS. > --- > > Key: HDFS-2139 > URL: https://issues.apache.org/jira/browse/HDFS-2139 > Project: Hadoop HDFS > Issue Type: New Feature >Reporter: Pritam Damania >Assignee: Rituraj >Priority: Major > Attachments: HDFS-2139-For-2.7.1.patch, HDFS-2139.patch, > HDFS-2139.patch, image-2022-08-11-11-48-17-994.png > > Original Estimate: 168h > Remaining Estimate: 168h > > There is a need to perform fast file copy on HDFS. The fast copy mechanism > for a file works as > follows : > 1) Query metadata for all blocks of the source file. > 2) For each block 'b' of the file, find out its datanode locations. > 3) For each block of the file, add an empty block to the namesystem for > the destination file. > 4) For each location of the block, instruct the datanode to make a local > copy of that block. > 5) Once each datanode has copied over its respective blocks, they > report to the namenode about it. > 6) Wait for all blocks to be copied and exit. > This would speed up the copying process considerably by removing top of > the rack data transfers. > Note : An extra improvement, would be to instruct the datanode to create a > hardlink of the block file if we are copying a block on the same datanode -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org