Fei Hui created HDFS-14230:
------------------------------
Summary: RBF: Throw StandbyException instead of IOException when no namenodes available
Key: HDFS-14230
URL: https://issues.apache.org/jira/browse/HDFS-14230
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 3.0.3, 2.9.2, 3.1.1, 3.2.0
Reporter: Fei Hui
Failover usually happens when namenodes are being upgraded, and for a few seconds there may be no active namenode. Accessing HDFS through the Router fails at that moment, which can make jobs fail or hang. Logs from some Hive jobs look as follows:
{code:java}
2019-01-03 16:12:08,337 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 133.33 sec
MapReduce Total cumulative CPU time: 2 minutes 13 seconds 330 msec
Ended Job = job_1542178952162_24411913
Launching Job 4 out of 6
Exception in thread "Thread-86" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): No namenode available under nameservice Cluster3
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.shouldRetry(RouterRpcClient.java:328)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:488)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invoke(RouterRpcClient.java:495)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeMethod(RouterRpcClient.java:385)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcClient.invokeSequential(RouterRpcClient.java:760)
    at org.apache.hadoop.hdfs.server.federation.router.RouterRpcServer.getFileInfo(RouterRpcServer.java:1152)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1804)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1338)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3925)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1014)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:849)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2134)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2130)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1867)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2130)
{code}
Digging into the code, maybe we can throw a StandbyException instead of an IOException when no namenodes are available. The client would then keep retrying and only fail once its retries are exhausted.
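A minimal sketch of the idea, assuming the check sits near the point in RouterRpcClient where the "No namenode available" IOException in the trace above originates; the helper class, method name and parameters here are hypothetical, not the actual patch:
{code:java}
import java.util.List;

import org.apache.hadoop.ipc.StandbyException;

/**
 * Illustrative-only sketch: the real change would go where RouterRpcClient
 * currently builds the "No namenode available under nameservice ..."
 * IOException (RouterRpcClient.java:328 in the stack trace above).
 */
public final class NoNamenodeAvailableSketch {

  private NoNamenodeAvailableSketch() {
  }

  /**
   * Throw StandbyException instead of IOException when the resolved namenode
   * list for a nameservice is empty, so the caller's failover/retry policy
   * retries instead of failing fast.
   */
  static void checkNamenodesAvailable(String nsId, List<?> namenodes)
      throws StandbyException {
    if (namenodes == null || namenodes.isEmpty()) {
      // Before (per the stack trace):
      //   throw new IOException("No namenode available under nameservice " + nsId);
      throw new StandbyException(
          "No namenode available under nameservice " + nsId);
    }
  }
}
{code}
Since StandbyException is a subclass of IOException and is already recognized by the HDFS client's failover retry policy, existing clients should keep retrying until a namenode becomes active again, instead of the job failing immediately.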