Sirius created ZOOKEEPER-4823:
---------------------------------
Summary: Proposal: Update the wiki of Zab 1.0 (Phase 2) to make it
more precise and conform to the implementation
Key: ZOOKEEPER-4823
URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4823
Project: ZooKeeper
Issue Type: Improvement
Reporter: Sirius
As ZooKeeper evolves these years, its code implementation deviates the design
of [Zab 1.0|https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0] in
several aspects.
One critical deviation lies in the _atomic actions_ upon a follower receives
NEWLEADER (see 2.*f* in Phase 2).
The protocol requires that the follower " _*atomically*_ applies the new state
and sets *f*.currentEpoch = _e_". However, the atomicity is not guaranteed with
the current code implementation. Asynchronous logging and committing by
multi-threads with node crash can interrupt this process and lead to possible
data loss (see -ZOOKEEPER-3911-, ZOOKEEPER-4643, ZOOKEEPER-4646,
-ZOOKEEPER-4785-).
On the other hand, to implement atomicity is expensive and affecting
performance. It is reasonable to adopt an implementation without requiring
atomic updates in this step. It is highly recommended to update the design of
Zab without requiring atomicity in Step 2.*f* to better guide the code
implementation.
h3. Update Step 2.*f* by removing the requirement of atomicity
Here provides a possible design of Step 2.*f* in Phase 2 with the removal of
atomicity requirement.
h4. Phase 2: Sync with followers
# *l* ...
# *f* The follower syncs with the leader, but doesn't modify its state until
it receives the NEWLEADER(_e_) packet. Once it receives NEWLEADER(_e_), -_it
atomically applies the new state, and then sets f.currentEpoch = e. It then
sends ACK(e << 32)._-
it executes the following actions sequentially:
*2.1. applies the new state;*
*2.2. sets f.currentEpoch = e;*
*2.3. sends ACK(e << 32).*
# *l* ...
Note:
* To ensure the correctness without requiring atomicity, the follower must
persist and sync the data before it updates its currentEpoch and replies
NEWLEADER ack (See the analysis in ZOOKEEPER-4643 & ZOOKEEPER-4785)
* This new design conforms to the code implementation in current latest code
version (ZooKeeper v3.9.2). This code version has fixed the known data loss
issues that stay unresolved for a long time due to non-atomic executions in
Step 2.*f* , including -ZOOKEEPER-3911-, ZOOKEEPER-4643, ZOOKEEPER-4646 &
-ZOOKEEPER-4785-. (see the code fixes in
[PR-2111|https://github.com/apache/zookeeper/pull/2111] &
[PR-2152|https://github.com/apache/zookeeper/pull/2152]).
* The correctness of this new design has been verified with the TLA+
specifications of Zab at different abstraction levels, including
** [High-level protocol
specification|https://github.com/AlphaCanisMajoris/zookeeper-tla-spec/blob/main/Zab_new.tla]
(developed based on the original [protocol
spec|https://github.com/apache/zookeeper/blob/master/zookeeper-specifications/protocol-spec/Zab.tla])
** [Multi-threading-level
specification|https://github.com/AlphaCanisMajoris/zookeeper-tla-spec/blob/main/zk_pr_2152.tla]
(developed based on the original [system
spec.|https://github.com/apache/zookeeper/blob/master/zookeeper-specifications/system-spec/zk-3.7/ZkV3_7_0.tla]
This spec is corresponding to
[PR-2152|https://github.com/apache/zookeeper/pull/2152], an effort to fix more
known issues in Phase 2.)
In the verification, the TLC model checker checks whether the new design
satisfies the properties given by the Zab paper. No violation is found during
the checking with various configurations.
We sincerely hope that the above update of the protocol design can be presented
at the wiki page, and make it guide the future code implementation better!
About us:
We are a research team using TLA+ to verify the correctness of distributed
systems.
Looking forward to receiving feedback from the ZooKeeper community!
--
This message was sent by Atlassian Jira
(v8.20.10#820010)