Sirius created ZOOKEEPER-4685:
---------------------------------
Summary: Unnecessary system unavailability due to Leader shutdown when follower sent ACK of PROPOSAL before sending ACK of NEWLEADER in log recovery
Key: ZOOKEEPER-4685
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4685
Project: ZooKeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.8.1, 3.7.1, 3.8.0, 3.7.0, 3.6.3
Reporter: Sirius
When a follower processes the NEWLEADER message in the SYNC phase, it calls {{logRequest(..)}} to submit the txn persistence task to the SyncRequestProcessor thread. The SyncRequestProcessor thread may persist the txns and reply with the ACK of a txn before the follower replies with ACK-LD (i.e. the ACK of NEWLEADER) to the leader. As a consequence, the leader may fail to collect enough ACK-LDs, which leads to the leader's shutdown and a new round of election. This triggers unnecessary recovery procedures, adds extra time before the servers reach the BROADCAST phase, and significantly reduces the service's availability.
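To make the ordering problem concrete, here is a minimal Java sketch of the follower side. It is not ZooKeeper code: the names {{FollowerAckRaceSketch}}, {{send}} and {{run}} are hypothetical, a single-thread executor stands in for the SyncRequestProcessor thread, and the NEWLEADER zxid assumes the re-elected leader is in epoch 2. Both threads write to the same outbound stream, so nothing forces ACK-LD to reach the leader before the txn ACK.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical model of the follower-side race; not ZooKeeper's actual classes.
public class FollowerAckRaceSketch {
    // zxid layout: high 32 bits = epoch, low 32 bits = counter, so <1, 1> = 0x100000001.
    static long zxid(long epoch, long counter) {
        return (epoch << 32) | counter;
    }

    // Packets "sent" to the leader, in arrival order. Both threads append here.
    static final List<String> outbound = new CopyOnWriteArrayList<>();

    static void send(String type, long zxid) {
        outbound.add(type + " 0x" + Long.toHexString(zxid));
    }

    public static List<String> run() throws InterruptedException, ExecutionException {
        outbound.clear();
        // Stands in for the SyncRequestProcessor thread.
        ExecutorService syncThread = Executors.newSingleThreadExecutor();
        long txn = zxid(1, 1);          // txn <1, 1> received as a PROPOSAL during SYNC
        long newLeader = zxid(2, 0);    // assumed NEWLEADER zxid of the re-elected leader
        try {
            // QuorumPeer thread, on NEWLEADER: hand the txn to the sync thread (cf. logRequest(..)).
            Future<?> persisted = syncThread.submit(() -> {
                // Persisting the txn is elided here; the sync thread then ACKs it.
                send("ACK", txn);
            });
            // Nothing in the modeled follower orders the two sends; this get() only forces
            // the unlucky interleaving from the trace so that the output is deterministic.
            persisted.get();
            send("ACK-LD", newLeader);  // QuorumPeer thread replies to NEWLEADER
        } finally {
            syncThread.shutdown();
        }
        return outbound;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // prints [ACK 0x100000001, ACK-LD 0x200000000]
    }
}
```

The {{persisted.get()}} call exists only to make the demo repeatable; in the scenario described above, the same order can arise purely from thread scheduling.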
The following trace can be reproduced on the latest version.
h2. Trace
Start the ensemble with three nodes: +S0+, +S1+ and +S2+.
- +S2+ is elected leader.
- +S2+ logs a new txn <1, 1> and broadcasts it.
- +S0+ restarts and +S1+ crashes before either of them receives the proposal of <1, 1>.
- +S2+ is elected leader again.
- +S2+ syncs with +S0+ using DIFF and sends the proposal of <1, 1> during SYNC.
- After +S0+ receives NEWLEADER, {+}S0{+}'s sync thread may persist the txn <1, 1> and send the corresponding ACK to the leader +S2+ before {+}S0{+}'s QuorumPeer thread replies with ACK-LD to +S2+. (This is possible because txn logging is processed asynchronously by the sync thread.)
- The corresponding LearnerHandler on +S2+ does not recognize an ACK of a proposal that arrives before ACK-LD, and blocks at _waitForStartup()_ until the leader changes its state to {_}state.RUNNING{_}.
- However, the QuorumPeer thread of the leader +S2+ cannot collect enough ACK-LDs and then throws _InterruptedException_ in {_}waitForNewLeaderAck(..){_}.
- After that, the leader shuts down and a new round of election starts, which costs extra time for re-establishing the quorum and significantly reduces availability.
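The leader-side consequence can be modeled in the same way. This sketch assumes, as one reading of "does not recognize" in the trace, that the handler treats the first ACK it reads after NEWLEADER as ACK-LD and counts it only when its zxid equals the NEWLEADER zxid. {{NewLeaderQuorumSketch}}, {{LeaderModel}}, {{onFirstAckAfterNewLeader}} and the timeout message are hypothetical, and {{waitForNewLeaderAck}} is borrowed from the trace only as a label, not as the real method's signature. With three servers a quorum needs two ACK-LDs: the leader counts itself, +S1+ is down, and the first ACK from +S0+ carries the zxid of <1, 1>, so the wait times out.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the leader's NEWLEADER quorum wait; not ZooKeeper's actual classes.
public class NewLeaderQuorumSketch {
    static long zxid(long epoch, long counter) {
        return (epoch << 32) | counter;
    }

    static final class LeaderModel {
        private final int ensembleSize;
        private final long newLeaderZxid;
        private final Set<Long> ackLdSids = new HashSet<>();

        LeaderModel(int ensembleSize, long newLeaderZxid, long selfSid) {
            this.ensembleSize = ensembleSize;
            this.newLeaderZxid = newLeaderZxid;
            ackLdSids.add(selfSid); // the leader counts its own ACK-LD
        }

        // Called by a learner handler with the FIRST ACK it reads after sending NEWLEADER.
        synchronized void onFirstAckAfterNewLeader(long sid, long ackZxid) {
            if (ackZxid != newLeaderZxid) {
                // An ACK of an earlier proposal is not accepted as ACK-LD; in the trace
                // the handler then moves on and blocks in waitForStartup().
                return;
            }
            ackLdSids.add(sid);
            notifyAll();
        }

        private boolean hasQuorum() {
            return ackLdSids.size() > ensembleSize / 2; // strict majority
        }

        // Models the QuorumPeer thread waiting, with a deadline, for a quorum of ACK-LDs.
        synchronized void waitForNewLeaderAck(long timeoutMs) throws InterruptedException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!hasQuorum()) { // loop also guards against spurious wakeups
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    throw new InterruptedException("timed out waiting for a quorum of ACK-LDs");
                }
                wait(remaining);
            }
        }
    }

    public static void main(String[] args) {
        LeaderModel s2 = new LeaderModel(3, zxid(2, 0), 2);
        // S1 is down; the first ACK from S0 after NEWLEADER is the ACK of proposal <1, 1>.
        s2.onFirstAckAfterNewLeader(0, zxid(1, 1));
        try {
            s2.waitForNewLeaderAck(100);
            System.out.println("quorum reached, entering BROADCAST");
        } catch (InterruptedException e) {
            System.out.println("leader shuts down: " + e.getMessage());
        }
    }
}
```

Running {{main}} takes the shutdown branch. If the same call instead passed {{zxid(2, 0)}} for +S0+, the count would reach two and the wait would return immediately, which is the outcome the ensemble would see if ACK-LD had arrived first.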