AlphaCanisMajoris commented on PR #2152:
URL: https://github.com/apache/zookeeper/pull/2152#issuecomment-2028702977

   > One curiosity:
   > 
   > > We have leveraged the TLA+ specifications of ZooKeeper and verified the 
correctness of this fix.
   > 
   > Who is 'we' ? 
   
   We are a research team dedicated to the verification of distributed systems 
using TLA+.
   We participated in the development of the TLA+ 
[specifications](https://github.com/apache/zookeeper/tree/master/zookeeper-specifications)
 for Zab & ZooKeeper.
   
   Based on ZooKeeper's [existing 
specification](https://github.com/apache/zookeeper/blob/master/zookeeper-specifications/system-spec/zk-3.7/ZkV3_7_0.tla),
 we refine the actions according to the code implementation to explore the 
possible interleaving of multi-node and multi-threading events with failures. 
Then we check the specification to see whether it violates some invariants 
(derived from Zab paper and code) with the model checker. 
   
   For example, when we specify Learner.syncWithLeader(..) in version 3.9.1, 
the model checking found the issue traces 
   of [ZOOKEEPER-4643](https://issues.apache.org/jira/browse/ZOOKEEPER-4643), 
[ZOOKEEPER-4646](https://issues.apache.org/jira/browse/ZOOKEEPER-4646), 
[ZOOKEEPER-4394](https://issues.apache.org/jira/browse/ZOOKEEPER-4394), 
[ZOOKEEPER-4685](https://issues.apache.org/jira/browse/ZOOKEEPER-4685) and 
[ZOOKEEPER-3023](https://issues.apache.org/jira/browse/ZOOKEEPER-3023).
   
   When specifying Learner.syncWithLeader(..) in version 3.9.2 (fixed by 
#2111), the issue traces of 
[ZOOKEEPER-4394](https://issues.apache.org/jira/browse/ZOOKEEPER-4394) and 
[ZOOKEEPER-3023](https://issues.apache.org/jira/browse/ZOOKEEPER-3023) can be 
detected. 
   
   More details of the specifications can be found in this 
[repo](https://github.com/AlphaCanisMajoris/zookeeper-tla-spec).
   
   > Can you share your tests?
   
   We verify this fix with the TLA+ specification 
[zk_pr_2152](https://github.com/AlphaCanisMajoris/zookeeper-tla-spec/blob/main/zk_pr_2152.tla).
  
   
   Some details in this specification: according to the fix in this PR, the 
logic when a follower receives NEWLEADER is specified into three actions:
   
   - FollowerProcessNEWLEADER_pr2152_1: Take snapshot if needed; persist and 
commit requests according to "packetsCommitted".
   - FollowerProcessNEWLEADER_pr2152_2: Update currentEpoch.
   - FollowerProcessNEWLEADER_pr2152_3: Reply NEWLEADER ack.
   
   The verification statistics are provided 
[here](https://github.com/AlphaCanisMajoris/zookeeper-tla-spec/blob/main/statistics/zk_pr_2152/results.csv).
 No violation is found during the checking with various configurations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to