[ https://issues.apache.org/jira/browse/HDFS-14230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16871597#comment-16871597 ]
Hudson commented on HDFS-14230: ------------------------------- FAILURE: Integrated in Jenkins build Hadoop-trunk-Commit #16813 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/16813/]) HDFS-14230. RBF: Throw RetriableException instead of IOException when no (brahma: rev 7e63e37dc5cbe330082a6a42598ffb76e0770fc1) * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/FederationTestUtils.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcMonitor.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/FederationRPCMetrics.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/FederationRPCPerformanceMonitor.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRPCClientRetries.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/FederationRPCMBean.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterClientRejectOverload.java * (add) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/NoNamenodesAvailableException.java * (edit) hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RouterRpcClient.java > RBF: Throw RetriableException instead of IOException when no namenodes > available > -------------------------------------------------------------------------------- > > Key: HDFS-14230 > URL: https://issues.apache.org/jira/browse/HDFS-14230 > Project: Hadoop HDFS > Issue Type: Sub-task > Affects Versions: 3.2.0, 3.1.1, 2.9.2, 3.0.3 > Reporter: Fei Hui > Assignee: Fei Hui > Priority: Major > Fix For: HDFS-13891 > > Attachments: HDFS-14230-HDFS-13891.001.patch, > HDFS-14230-HDFS-13891.002.patch, HDFS-14230-HDFS-13891.003.patch, > HDFS-14230-HDFS-13891.004.patch, HDFS-14230-HDFS-13891.005.patch, > HDFS-14230-HDFS-13891.006.patch > > > Failover usually happens when upgrading namenodes. And there are no active > namenodes within some seconds, Accessing HDFS through router fails at this > moment. This could make jobs failure or hang. Some hive jobs logs are as > follow > {code:java} > 2019-01-03 16:12:08,337 Stage-1 map = 100%, reduce = 100%, Cumulative CPU > 133.33 sec > MapReduce Total cumulative CPU time: 2 minutes 13 seconds 330 msec > Ended Job = job_1542178952162_24411913 > Launching Job 4 out of 6 > Exception in thread "Thread-86" java.lang.RuntimeException: > org.apache.hadoop.ipc.RemoteException(java.io.IOException): No namenode > available under nameservice Cluster3 > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.shouldRetry(RouterRpcClient.java:328) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:488) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:495) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:385) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:760) > at > org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > Caused by: > org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): > Operation category READ is not supported in state standby > at > org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87) > at > org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1804) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1338) > at > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3925) > at > org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1014) > at > org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849) > at > org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java) > at > org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:422) > at > org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130) > {code} > Deep into the code. Maybe we can throw StandbyException when no namenodes > available. Client will fail after some retries -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org