Harsh J created HADOOP-13515:
--------------------------------

             Summary: Redundant transitionToActive call can cause a NameNode to 
crash
                 Key: HADOOP-13515
                 URL: https://issues.apache.org/jira/browse/HADOOP-13515
             Project: Hadoop Common
          Issue Type: Bug
          Components: ha
    Affects Versions: 2.5.0
            Reporter: Harsh J
            Priority: Minor


The situation in parts is similar to HADOOP-8217, but the cause is different 
and so is the result.

Consider this situation:

- At the beginning NN1 is Active, NN2 is Standby
- ZKFC1 faces a ZK disconnect (not a session timeout, just a socket disconnect) 
and thereby reconnects

{code}
2016-08-11 07:00:46,068 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 4000ms for sessionid 
0x4566f0c97500bd9, closing socket connection and attempting reconnect
2016-08-11 07:00:46,169 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session 
disconnected. Entering neutral mode...
…
2016-08-11 07:00:46,610 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session 
connected.
{code}

- The reconnection on the ZKFC1 triggers the elector code, and the elector 
re-run finds that NN1 should be the new active (a redundant decision cause NN1 
is already active)

{code}
2016-08-11 07:00:46,615 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
Checking for any old active which needs to be fenced...
2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old 
node exists: …
2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: But old 
node has our own data, so don't need to fence it.
{code}

- The ZKFC1 sets the new ZK data, and fires a NN1 RPC call of transitionToActive

{code}
2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing 
znode /hadoop-ha/nameservice1/ActiveBreadCrumb to indicate that the local node 
is the most recent active...
2016-08-11 07:00:46,649 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 175: 
Call -> nn01/10.10.10.10:8022: transitionToActive {reqInfo { reqSource: 
REQUEST_BY_ZKFC }}
{code}

- At the same time as the transitionToActive call is in progress at NN1, but 
not complete yet, the ZK session of ZKFC1 is timed out by ZK Quorum, and a 
watch notification is sent to ZKFC2

{code}
2016-08-11 07:01:00,003 DEBUG org.apache.zookeeper.ClientCnxn: Got notification 
sessionid:0x4566f0c97500bde
2016-08-11 07:01:00,004 DEBUG org.apache.zookeeper.ClientCnxn: Got WatchedEvent 
state:SyncConnected type:NodeDeleted 
path:/hadoop-ha/nameservice1/ActiveStandbyElectorLock for sessionid 
0x4566f0c97500bde
{code}

- ZKFC2 responds by marking NN2 as standby, which succeeds (NN hasn't handled 
transitionToActive call yet due to busy status, but has handled 
transitionToStandby before it)

{code}
2016-08-11 07:01:00,013 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
Checking for any old active which needs to be fenced...
2016-08-11 07:01:00,018 INFO org.apache.hadoop.ha.ZKFailoverController: Should 
fence: NameNode at nn01/10.10.10.10:8022
2016-08-11 07:01:00,020 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 412: 
Call -> nn01/10.10.10.10:8022: transitionToStandby {reqInfo { reqSource: 
REQUEST_BY_ZKFC }}
2016-08-11 07:01:03,880 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: 
transitionToStandby took 3860ms
{code}

- ZKFC2 then marks NN2 as active, and NN2 begins its transition (is in midst of 
it, not done yet at this point)

{code}
2016-08-11 07:01:03,894 INFO org.apache.hadoop.ha.ZKFailoverController: Trying 
to make NameNode at nn02/11.11.11.11:8022 active...
2016-08-11 07:01:03,895 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 412: 
Call -> nn02/11.11.11.11:8022: transitionToActive {reqInfo { reqSource: 
REQUEST_BY_ZKFC }}
…
{code}

{code}
2016-08-11 07:01:09,558 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required 
for active state
…
2016-08-11 07:01:19,968 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
edit logs at txnid 5635
{code}

- At the same time in parallel NN1 processes the transitionToActive requests 
finally, and becomes active

{code}
2016-08-11 07:01:13,281 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required 
for active state
…
2016-08-11 07:01:19,599 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
edit logs at txnid 5635
…
2016-08-11 07:01:19,602 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Starting log segment at 5635
{code}

- NN2's active transition fails as a result of this parallel active transition 
on NN1 which has completed right before it tries to take over

{code}
2016-08-11 07:01:19,968 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
edit logs at txnid 5635
2016-08-11 07:01:22,799 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: 
Error encountered requiring NN shutdown. Shutting down immediately.
java.lang.IllegalStateException: Cannot start writing at txid 5635 when there 
is a stream available for read: 
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@53feee4f
        at 
org.apache.hadoop.hdfs.server.namenode.FSEditLog.openForWrite(FSEditLog.java:312)
        at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startActiveServices(FSNamesystem.java:1174)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.startActiveServices(NameNode.java:1654)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.ActiveState.enterState(ActiveState.java:61)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:64)
        at 
org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1527)
        at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.transitionToActive(NameNodeRpcServer.java:1302)
        at 
org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.transitionToActive(HAServiceProtocolServerSideTranslatorPB.java:107)
        at 
org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4460)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:587)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1026)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1642)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
{code}

(Reflects as a failure on the ZKFC2 too):
{code}
2016-08-11 07:01:31,598 DEBUG org.apache.hadoop.ipc.Client: closing ipc 
connection to nn02/11.11.11.11:8022: null
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1055)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:950)
2016-08-11 07:01:31,626 FATAL org.apache.hadoop.ha.ZKFailoverController: 
Couldn't make NameNode at nn02/11.11.11.11:8022 active
java.io.IOException: Failed on local exception: java.io.EOFException; Host 
Details : local host is: "nn02/11.11.11.11"; destination host is: "nn02":8022;
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
        at org.apache.hadoop.ipc.Client.call(Client.java:1415)
        at org.apache.hadoop.ipc.Client.call(Client.java:1364)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
        at com.sun.proxy.$Proxy15.transitionToActive(Unknown Source)
        at 
org.apache.hadoop.ha.protocolPB.HAServiceProtocolClientSideTranslatorPB.transitionToActive(HAServiceProtocolClientSideTranslatorPB.java:100)
        at 
org.apache.hadoop.ha.HAServiceProtocolHelper.transitionToActive(HAServiceProtocolHelper.java:48)
        at 
org.apache.hadoop.ha.ZKFailoverController.becomeActive(ZKFailoverController.java:377)
        at 
org.apache.hadoop.ha.ZKFailoverController.access$900(ZKFailoverController.java:60)
        at 
org.apache.hadoop.ha.ZKFailoverController$ElectorCallbacks.becomeActive(ZKFailoverController.java:868)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:804)
        at 
org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:415)
        at 
org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:599)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:498)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at 
org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1055)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:950)
2016-08-11 07:01:31,665 WARN org.apache.hadoop.ha.ActiveStandbyElector: 
Exception handling the winning of election
org.apache.hadoop.ha.ServiceFailedException: Couldn't transition to active
{code}

We should perhaps avoid sending an active RPC if the NN is already active, by 
checking the service status before firing the transition RPC inside 
ZKFailoverController.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org

Reply via email to