Sirius created ZOOKEEPER-4712:
---------------------------------
Summary: Follower.shutdown() and Observer.shutdown() do not
correctly shutdown the syncProcessor, which may lead to potential data
inconsistency
Key: ZOOKEEPER-4712
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4712
Project: ZooKeeper
Issue Type: Bug
Components: quorum, server
Affects Versions: 3.8.1, 3.7.1, 3.8.0, 3.7.0, 3.6.3, 3.5.10, 3.6.4
Reporter: Sirius
Follower.shutdown() and Observer.shutdown() do not correctly shutdown the
syncProcessor. It may lead to potential data inconsistency (see Potential Risk).
A follower / observer will invoke syncProcessor.shutdown() in
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown(),
respectively.
However, after the
[fix|https://github.com/apache/zookeeper/commit/efbd660e1c4b90a8f538f2cccb5dcb7094cf9a22]
of ZOOLEEPER-3642, Follower.shutdown() / Observer.shutdown() will not invoke
LearnerZooKeeperServer.shutdown() / ObserverZooKeeperServer.shutdown() anymore.
h4. Call stack
h5. Version 3.8.1 / 3.8.0 / 3.7.1 / 3.7.0 / 3.6.4 / 3.6.3 / 3.5.10 ...
* *(Buggy)*
Observer.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)
* *(Buggy)*
Follower.shutdown()->Learner.shutdown()->ZooKeeperServer.shutdown(boolean)
* (For comparison)
Leader.shutdown(String)->LeaderZooKeeper.shutdown()->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)
h5. For comparison, in version 3.4.X,
*
Observer.shutdown()->Learner.shutdown()->*ObserverZooKeeperServer.shutdown()*->ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)
*
Follower.shutdown()->Learner.shutdown()->*FollowerZooKeeperServer.shutdown()*>ZooKeeperServer.shutdown()->ZooKeeperServer.shutdown(boolean)
h4. Code Details
Take version 3.8.0 as an example. In Follower.shutdown() :
public void shutdown() {
LOG.info("shutdown Follower");
+ // invoke Learner.shutdown()
super.shutdown();
}
In Learner.java:
public void shutdown() {
...
// shutdown previous zookeeper
if (zk != null) {
// If we haven't finished SNAP sync, force fully shutdown
// to avoid potential inconsistency
+ // This will invoke ZooKeeperServer.shutdown(boolean),
+ // which will not shutdown syncProcessor
+ // Before the fix of ZOOLEEPER-3642,
+ // FollowerZooKeeperServer.shutdown() will be invoked here
zk.shutdown(self.getSyncMode().equals(QuorumPeer.SyncMode.SNAP));
}
}
In ZooKeeperServer.java:
public synchronized void shutdown(boolean fullyShutDown) {
...
if (firstProcessor != null) {
+ // For a follower, this will not shutdown its syncProcessor.
firstProcessor.shutdown();
}
...
}
In expectation, Follower.shutdown() should invoke
LearnerZooKeeperServer.shutdown() to shutdown the syncProcessor:
public synchronized void shutdown() {
...
try {
+ // shutdown the syncProcessor here
if (syncProcessor != null) {
syncProcessor.shutdown();
}
} ...
}
Observer.shutdown() has the similar problem.
h4. Potential Risk
When Follower.shutdown() is called, the follower's QuorumPeer thread may update
its lastProcessedZxid for the election before its syncThread drains the pending
requests and flushes them to disk.
In consequence, this lastProcessedZxid is not the latest zxid in its log,
leading to log inconsistency after the SYNC phase. (Similar to the symptoms of
ZOOKEEPER-2845.)
--
This message was sent by Atlassian Jira
(v8.20.10#820010)