AlphaCanisMajoris commented on PR #2152: URL: https://github.com/apache/zookeeper/pull/2152#issuecomment-2028702977
> One curiosity: > > > We have leveraged the TLA+ specifications of ZooKeeper and verified the correctness of this fix. > > Who is 'we' ? We are a research team dedicated to the verification of distributed systems using TLA+. We participated in the development of the TLA+ [specifications](https://github.com/apache/zookeeper/tree/master/zookeeper-specifications) for Zab & ZooKeeper. Based on ZooKeeper's [existing specification](https://github.com/apache/zookeeper/blob/master/zookeeper-specifications/system-spec/zk-3.7/ZkV3_7_0.tla), we refine the actions according to the code implementation to explore the possible interleaving of multi-node and multi-threading events with failures. Then we check the specification to see whether it violates some invariants (derived from Zab paper and code) with the model checker. For example, when we specify Learner.syncWithLeader(..) in version 3.9.1, the model checking found the issue traces of [ZOOKEEPER-4643](https://issues.apache.org/jira/browse/ZOOKEEPER-4643), [ZOOKEEPER-4646](https://issues.apache.org/jira/browse/ZOOKEEPER-4646), [ZOOKEEPER-4394](https://issues.apache.org/jira/browse/ZOOKEEPER-4394), [ZOOKEEPER-4685](https://issues.apache.org/jira/browse/ZOOKEEPER-4685) and [ZOOKEEPER-3023](https://issues.apache.org/jira/browse/ZOOKEEPER-3023). When specifying Learner.syncWithLeader(..) in version 3.9.2 (fixed by #2111), the issue traces of [ZOOKEEPER-4394](https://issues.apache.org/jira/browse/ZOOKEEPER-4394) and [ZOOKEEPER-3023](https://issues.apache.org/jira/browse/ZOOKEEPER-3023) can be detected. More details of the specifications can be found in this [repo](https://github.com/AlphaCanisMajoris/zookeeper-tla-spec). > Can you share your tests? We verify this fix with the TLA+ specification [zk_pr_2152](https://github.com/AlphaCanisMajoris/zookeeper-tla-spec/blob/main/zk_pr_2152.tla). Some details in this specification: according to the fix in this PR, the logic when a follower receives NEWLEADER is specified into three actions: - FollowerProcessNEWLEADER_pr2152_1: Take snapshot if needed; persist and commit requests according to "packetsCommitted". - FollowerProcessNEWLEADER_pr2152_2: Update currentEpoch. - FollowerProcessNEWLEADER_pr2152_3: Reply NEWLEADER ack. The verification statistics are provided [here](https://github.com/AlphaCanisMajoris/zookeeper-tla-spec/blob/main/statistics/zk_pr_2152/results.csv). No violation is found during the checking with various configurations. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
