Haoze Wu created ZOOKEEPER-4203:
-----------------------------------

             Summary: Leader swallows the ZooKeeperServer.State.ERROR from 
Leader.LearnerCnxAcceptor in some concurrency condition.
                 Key: ZOOKEEPER-4203
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4203
             Project: ZooKeeper
          Issue Type: Bug
          Components: server
    Affects Versions: 3.6.2
            Reporter: Haoze Wu


We found a bug similar to https://issues.apache.org/jira/browse/ZOOKEEPER-2029. 
Leader.LearnerCnxAcceptor is a ZooKeeperCriticalThread, and its exceptions are 
handled by ZooKeeperCriticalThread#handleException, which is supposed to 
shut down the process or make the server rejoin the quorum. However, under a 
certain concurrency condition, this handling does not take effect, so the 
leader is left without a LearnerCnxAcceptor and will not accept any follower, 
which is the same symptom as 
https://issues.apache.org/jira/browse/ZOOKEEPER-2029. After reproducing it, we 
confirmed that this bug exists in both the 3.6.2 release and the master 
branch.

 

This concurrency condition can be constructed as follows: start a ZooKeeper 
cluster of 3 nodes, and make the ServerSocket#accept invocation throw an 
exception in the LearnerCnxAcceptorHandler for the second follower that tries 
to join the quorum. If these 3 nodes are started almost simultaneously, e.g., 
within 2 seconds, then the aforementioned symptom may occur. In the log, we can 
observe that one of the followers keeps trying to join the quorum and keeps 
failing, while the other 2 servers always receive the request from this 
follower but never let it join the quorum.

 

We prepared the reproduction scripts in a gist. The exception thrown from 
ServerSocket#accept is injected by the Byteman script 
`serverSocketAccept-exception.btm`.

 

The root cause of this bug is that when the Leader.LearnerCnxAcceptor thread 
handles an exception in ZooKeeperCriticalThread#handleException, the exception 
is not really "handled": the handler does nothing but set the ZooKeeperServer 
state to ERROR, through the following call chain: 
ZooKeeperCriticalThread#handleException -> 
ZooKeeperServerListenerImpl#notifyStopping -> LeaderZooKeeperServer#setState.

 

LeaderZooKeeperServer#setState is implemented in 
QuorumZooKeeperServer#setState, basically setting the state as 
ZooKeeperServer.State.ERROR in this scenario.

 

Under normal circumstances, this ERROR state is detected by another thread 
through the following call chain: QuorumPeer#run -> Leader#lead -> 
Leader#isRunning -> ZooKeeperServer#isRunning. In the code, this check sits in 
the infinite while loop at the end of Leader#lead 
([https://github.com/apache/zookeeper/blob/release-3.6.2/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L758]).
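The detection path described above can be illustrated with a minimal, single-threaded sketch. It is not the ZooKeeper code: `PollingSketch`, `SketchServer`, `handleException`, and `leadLoop` are hypothetical stand-ins, and the failure is simulated at a fixed loop iteration rather than on a separate thread. It only shows the shape of the mechanism: the handler flips a state flag, and the polling loop notices it on its next check.

```java
// Simplified stand-ins; none of these classes are the real ZooKeeper types.
class PollingSketch {
    enum State { INITIAL, RUNNING, SHUTDOWN, ERROR }

    static class SketchServer {
        private volatile State state = State.INITIAL;

        void setState(State s) { state = s; }

        boolean isRunning() { return state == State.RUNNING; }
    }

    // Plays the role of the critical thread's handler: it only flips the flag.
    static void handleException(SketchServer server) {
        server.setState(State.ERROR);
    }

    // Plays the role of the loop at the end of Leader#lead: keep going while
    // the server reports RUNNING. The failure is injected at failAtTick.
    static int leadLoop(SketchServer server, int maxTicks, int failAtTick) {
        int tick = 0;
        while (tick < maxTicks) {
            if (tick == failAtTick) {
                handleException(server);
            }
            if (!server.isRunning()) {
                break; // the ERROR state is noticed here
            }
            tick++;
        }
        return tick;
    }

    public static void main(String[] args) {
        SketchServer server = new SketchServer();
        server.setState(State.RUNNING);
        int stoppedAt = leadLoop(server, 10, 3);
        System.out.println("loop exited at tick " + stoppedAt);
    }
}
```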

 

If the ERROR state from Leader.LearnerCnxAcceptor or any other 
ZooKeeperCriticalThread could be detected in this way, then the leader would be 
aware of the ERROR state and turn to 
[https://github.com/apache/zookeeper/blob/release-3.6.2/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L779],
 which is correct.

 

However, under a certain interleaving, the ERROR state cannot be detected in 
this way, because in Leader#lead, 
[https://github.com/apache/zookeeper/blob/release-3.6.2/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L685]
 can overwrite this ERROR state with a RUNNING state through the following call 
chain: Leader#lead -> Leader#startZkServer -> LeaderZooKeeperServer#startup -> 
ZooKeeperServer#startup -> ZooKeeperServer#setState. Therefore, if the ERROR 
state is set before the invocation of Leader#startZkServer in Leader#lead, 
this ERROR state is lost, because ZooKeeperServer#setState neither records nor 
checks the old state.
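The ordering problem can be shown with a small sketch. The classes below are hypothetical simplifications, not the ZooKeeper sources: `SketchServer#setState` assigns unconditionally, and `startup` stands in for the startup path that ends by marking the server RUNNING. The two orderings are replayed sequentially so the outcome is deterministic.

```java
// Simplified stand-ins; none of these classes are the real ZooKeeper types.
class StateOverwriteSketch {
    enum State { INITIAL, RUNNING, SHUTDOWN, ERROR }

    static class SketchServer {
        private volatile State state = State.INITIAL;

        // Unconditional assignment: the old state is neither checked nor recorded.
        void setState(State s) { state = s; }

        State getState() { return state; }

        // Stand-in for the startup path, which ends by marking the server RUNNING.
        void startup() { setState(State.RUNNING); }
    }

    // Replays one of the two orderings and reports whether ERROR is still visible.
    static boolean errorSurvives(boolean errorBeforeStartup) {
        SketchServer server = new SketchServer();
        if (errorBeforeStartup) {
            server.setState(State.ERROR); // acceptor thread fails first
            server.startup();             // startup then overwrites ERROR
        } else {
            server.startup();
            server.setState(State.ERROR); // failure after startup stays visible
        }
        return server.getState() == State.ERROR;
    }

    public static void main(String[] args) {
        System.out.println("error before startup survives: " + errorSurvives(true));
        System.out.println("error after startup survives:  " + errorSurvives(false));
    }
}
```

In this sketch, only the ordering in which the failure precedes startup loses the ERROR state, which matches the interleaving described above.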

 

The reproduction script we provided can construct this concurrency condition, 
because, usually, after the server startup, it takes some time for the 
QuorumPeer thread to reach 
[https://github.com/apache/zookeeper/blob/release-3.6.2/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Leader.java#L685].
 Thus the exception in LearnerCnxAcceptorHandler, if any, usually occurs 
earlier than that, and the aforementioned symptom follows.

 

As for the fix, we will send a pull request. It adds a sanity check that 
prevents the RUNNING/INITIAL state set during ZK startup from overwriting a 
possible ERROR state.
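One possible shape for such a sanity check is sketched below. This is an assumption for illustration only and not necessarily what the pull request does: `GuardedStartupSketch`, `SketchServer`, and `setStateUnlessError` are hypothetical names, and the sketch uses an atomic read-modify-write so that the check and the assignment cannot be separated by a concurrent ERROR update.

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified stand-ins; none of these classes are the real ZooKeeper types.
class GuardedStartupSketch {
    enum State { INITIAL, RUNNING, SHUTDOWN, ERROR }

    static class SketchServer {
        private final AtomicReference<State> state =
                new AtomicReference<>(State.INITIAL);

        // Unconditional setter, as used by the error-handling path.
        void setState(State s) { state.set(s); }

        State getState() { return state.get(); }

        // Hypothetical guard: apply the new state unless the server is already
        // in ERROR. Returns false when the ERROR state was preserved.
        boolean setStateUnlessError(State next) {
            State prev = state.getAndUpdate(cur -> cur == State.ERROR ? cur : next);
            return prev != State.ERROR;
        }

        // Startup reports failure instead of silently hiding an earlier error.
        boolean startup() {
            return setStateUnlessError(State.RUNNING);
        }
    }

    public static void main(String[] args) {
        SketchServer server = new SketchServer();
        server.setState(State.ERROR); // failure arrives before startup
        boolean started = server.startup();
        System.out.println("started=" + started + ", state=" + server.getState());
    }
}
```

With a guard of this kind, the caller of `startup` can see that the server did not reach RUNNING and take the normal error path instead of entering its main loop.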



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
