[ https://issues.apache.org/jira/browse/ZOOKEEPER-4681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
krystal he updated ZOOKEEPER-4681:
----------------------------------
    Attachment: zookeeper-no-message-loss.patch

> Uncommitted requests have been executed
> ----------------------------------------
>
>                 Key: ZOOKEEPER-4681
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-4681
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.5.8
>            Reporter: krystal he
>            Priority: Critical
>         Attachments: zookeeper-no-message-loss.patch,
> zookeeper-scenario1.patch, zookeeper-scenario10.patch,
> zookeeper-scenario11.patch, zookeeper-scenario12.patch,
> zookeeper-scenario2.patch, zookeeper-scenario3.patch,
> zookeeper-scenario4.patch, zookeeper-scenario5.patch,
> zookeeper-scenario6.patch, zookeeper-scenario7.patch,
> zookeeper-scenario8.patch, zookeeper-scenario9.patch
>
>
> Using a [tool|https://github.com/kry4tall/CC-ZOO358] that I modified from
> [Filip Niksic's zootester|https://github.com/fniksic/zootester] for testing
> ZooKeeper, I discovered the following scenario, which causes uncommitted
> requests to be executed.
> The Zab protocol has three rounds: PROPOSE, ACK, and COMMIT. By adding code
> to the ZooKeeper source, my tool can drop PROPOSAL, ACK, and COMMIT messages
> and collect the values of some variables of each server instance at the end
> of each round. Apart from affecting message reception, my code does not
> change any other ZooKeeper behavior.
>
> Setup:
> Ubuntu 22.04.2, Maven 3.9.0, Ant 1.10.13.
> Replace the directory called "zookeeper-server" in ZooKeeper 3.5.8 with the
> "zookeeper-server" in [my github repo|https://github.com/kry4tall/CC-ZOO358].
> Build the modified ZooKeeper 3.5.8 with Ant to get zookeeper-3.5.8.jar, and
> use it to replace the zookeeper-3.5.8.jar downloaded by Maven.
> Create a directory called "states" and a file called
> "[scenarios|https://github.com/kry4tall/CC-ZOO358/blob/krystal/zoo-tester/test/scenarios]".
> Write the path to test.properties in zoo-tester's resource directory.
> Use "-s scenario-X" (X = 1,2,3,4,5,6) as the startup parameter to run the
> main method of ZooTester.
>
> Base scenario:
> Initially, start an ensemble with 3 servers called A, B, and C, create 2
> znodes called /key0 and /key1, and set them to 0 and 1 respectively.
> # Request to set /key0 to 1000 on 3 servers.
> # *(Optional) Isolate the proposal messages that the leader sends to the 2
> followers.*
> # *(Optional) Isolate the ack messages that the 2 followers send to the
> leader.*
> # (Optional) Stop all servers and then restart them.
> # (Optional) Read /key0 and /key1 on all servers.
> # Request to set /key1 to 1001 on 3 servers.
> # (Optional) Stop all servers and then restart them.
> # Read /key0 and /key1 on all servers.
> Mark the execution step list [1,2,5,6,8] as {*}scenario1{*}, [1,2,4,5,6,8] as
> {*}scenario2{*}, [1,2,5,6,7,8] as {*}scenario3{*}, [1,2,4,5,6,7,8] as
> {*}scenario4{*}, [1,2,6,8] as {*}scenario5{*}, [1,2,6,7,8] as {*}scenario6{*},
> [1,3,5,6,8] as {*}scenario7{*}, [1,3,4,5,6,8] as {*}scenario8{*},
> [1,3,5,6,7,8] as {*}scenario9{*}, [1,3,4,5,6,7,8] as {*}scenario10{*},
> [1,3,6,8] as {*}scenario11{*}, and [1,3,6,7,8] as {*}scenario12{*}.
> The output of these 12 scenarios is in the attachments. For comparison, I
> have also attached the results of scenario [1,6,8], where {*}no message loss
> action was performed{*}. Its results show no problems.
> The typical case of a bug caused by dropping proposal messages is scenario6.
> Of the optional steps, scenario6 selects step2 and step7. After performing
> these two operations, we finally obtained the following result, which
> violates data consistency:
> @ 0: /key0 -> 0, /key1 -> 1001;
> @ 1: /key0 -> 0, /key1 -> 1001;
> @ 2: /key0 -> 1000, /key1 -> 1001.
> The typical case of a bug caused by dropping ack messages is scenario7. Of
> the optional steps, scenario7 selects step3 and step5.
> In this scenario, we obtained the following result after step5, which
> violates data consistency:
> @ 0: /key0 -> 0, /key1 -> 1;
> @ 1: /key0 -> 1000, /key1 -> 1;
> @ 2: /key0 -> 1000, /key1 -> 1.
> In addition, comparing scenario2 and scenario3 shows that restarting the
> cluster affects the results. Comparing scenario1 and scenario5 shows that
> step5, the operation of reading the content of the znodes, also affects the
> results.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
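
For readers who want to check the reported inconsistency mechanically, here is a minimal Java sketch that compares the per-server reads quoted in the report for scenario6 and for scenario7 after step5. It is not part of the attached patches or of zootester: the class and method names (ReplicaDivergence, view, divergentKeys, scenario6, scenario7AfterStep5) are hypothetical, and the values are copied from the results quoted in the issue description.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of ZooKeeper or zootester.
final class ReplicaDivergence {

    // One server's view of the two znodes used in the report.
    static Map<String, Integer> view(int key0, int key1) {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("/key0", key0);
        m.put("/key1", key1);
        return m;
    }

    // Returns the znode paths for which the servers do not all report the same value.
    static Set<String> divergentKeys(List<Map<String, Integer>> views) {
        Set<String> allKeys = new TreeSet<>();
        for (Map<String, Integer> v : views) {
            allKeys.addAll(v.keySet());
        }
        Set<String> divergent = new TreeSet<>();
        for (String key : allKeys) {
            Set<Integer> distinct = new HashSet<>();
            for (Map<String, Integer> v : views) {
                distinct.add(v.get(key)); // a missing key counts as a distinct (null) value
            }
            if (distinct.size() > 1) {
                divergent.add(key);
            }
        }
        return divergent;
    }

    // Final reads reported for scenario6 (servers @ 0, @ 1, @ 2).
    static List<Map<String, Integer>> scenario6() {
        List<Map<String, Integer>> views = new ArrayList<>();
        views.add(view(0, 1001));
        views.add(view(0, 1001));
        views.add(view(1000, 1001));
        return views;
    }

    // Reads reported for scenario7 after step5 (servers @ 0, @ 1, @ 2).
    static List<Map<String, Integer>> scenario7AfterStep5() {
        List<Map<String, Integer>> views = new ArrayList<>();
        views.add(view(0, 1));
        views.add(view(1000, 1));
        views.add(view(1000, 1));
        return views;
    }

    public static void main(String[] args) {
        System.out.println("scenario6 diverges on " + divergentKeys(scenario6()));
        System.out.println("scenario7 (after step5) diverges on "
                + divergentKeys(scenario7AfterStep5()));
    }
}
```

In both cases the servers agree on /key1 and disagree only on /key0, the znode targeted by the request whose proposal or ack messages were isolated.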