[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-3607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972105#comment-16972105
 ] 

Jiafu Jiang commented on ZOOKEEPER-3607:
----------------------------------------

I can reproduce this problem with the following steps:
1. Clear all data of zk1, zk2 and zk3, including everything under the data 
dir and the log dir.
2. Start zk1 and zk3. zk3 will become the leader because it has the larger ID.
3. Start zk2.
4. Create "/a". Now "/a" exists on zk1, zk2, and zk3.
5. Stop zk2: bin/zkServer.sh stop.
6. Create "/b". Now "/a" and "/b" exist on both zk1 and zk3.
7. Stop zk1 and zk3, and delete all data of zk1.
8. Start zk1 and zk2. Because zk2 has more recent data than the emptied zk1, 
zk2 will become the leader. Now zk1 and zk2 both have "/a".
9. Create "/c". Now zk1 and zk2 both have "/a" and "/c".
10. Back up all the data of zk3 to /tmp.
11. Start zk3. zk3 will synchronize with the leader (zk2): "/b" will be 
truncated, and zk3 will receive "/c" from zk2. zk3 applies "/c" to its 
datatree, but not to ZKDatabase.committedLog.
12. Stop zk2 (the leader). Because zk1 and zk3 have the same max zxid, zk3 
will become the leader since it has the larger ID.
13. Replace zk2's data with the backup saved in step 10. Remember to 
change the myid to 2.
14. Start zk2. zk2 will become a follower and try to synchronize with the 
leader (zk3). zk3 will send a TRUNC to zk2 to truncate "/b", but it will 
not send "/c" to zk2, because "/c" does not exist in zk3's 
ZKDatabase.committedLog.
15. Now if you run "bin/zkCli.sh ls /" on zk2, you cannot see 
"/c", but you can see it on both zk1 and zk3.
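The sync path in these reproduction steps can be traced with a small Python model. It is a sketch under stated assumptions, not ZooKeeper code: `Server`, `commit`, `truncate`, `apply_synced`, `leader_sync` and `server` are hypothetical names, zxids are written as (epoch, counter) pairs, and the leader's real choice among DIFF, TRUNC and SNAP is reduced to one assumed rule (truncate the follower to the newest zxid in the leader's committedLog that is not above the follower's last zxid, then replay the rest of that log). The only behaviour taken from this report is that txns received during sync reach the datatree but not committedLog.

```python
from dataclasses import dataclass, field

# Assumption: a zxid modelled as an (epoch, counter) pair; tuples compare
# lexicographically, matching the ordering of ZooKeeper's epoch/counter zxid.
Zxid = tuple[int, int]


@dataclass
class Server:
    name: str
    data_tree: dict = field(default_factory=dict)      # path -> zxid
    committed_log: list = field(default_factory=list)  # [(zxid, path)], ascending

    def last_zxid(self) -> Zxid:
        return max(self.data_tree.values(), default=(0, 0))

    def commit(self, zxid: Zxid, path: str) -> None:
        # Normal broadcast path: the txn reaches the datatree AND committedLog.
        self.data_tree[path] = zxid
        self.committed_log.append((zxid, path))

    def truncate(self, zxid: Zxid) -> None:
        # TRUNC: drop every txn newer than zxid from both structures.
        self.data_tree = {p: z for p, z in self.data_tree.items() if z <= zxid}
        self.committed_log = [(z, p) for z, p in self.committed_log if z <= zxid]

    def apply_synced(self, zxid: Zxid, path: str) -> None:
        # Sync-phase txn: as the report describes, only the datatree is updated.
        self.data_tree[path] = zxid


def leader_sync(leader: Server, follower: Server) -> None:
    # Simplified (hypothetical) sync rule: truncate to the newest zxid in the
    # leader's committedLog not above the follower's last zxid, then replay the
    # rest of that committedLog. Txns only in the leader's datatree are never sent.
    peer_last = follower.last_zxid()
    trunc_to = max((z for z, _ in leader.committed_log if z <= peer_last), default=(0, 0))
    follower.truncate(trunc_to)
    for zxid, path in leader.committed_log:
        if zxid > trunc_to:
            follower.apply_synced(zxid, path)


def server(name: str, *txns) -> Server:
    # Build a server whose txns were all committed normally.
    s = Server(name)
    for zxid, path in txns:
        s.commit(zxid, path)
    return s


# State after "/c" is created: "/a" and "/b" in epoch 1, "/c" in epoch 2.
zk1 = server("zk1", ((1, 1), "/a"), ((2, 1), "/c"))
zk2 = server("zk2", ((1, 1), "/a"), ((2, 1), "/c"))  # current leader
zk3 = server("zk3", ((1, 1), "/a"), ((1, 2), "/b"))  # offline; this is what gets backed up

leader_sync(zk2, zk3)  # zk3 restarts: "/b" truncated, "/c" lands in its datatree only
zk2 = server("zk2", ((1, 1), "/a"), ((1, 2), "/b"))  # zk2 stopped, restored from zk3's backup
leader_sync(zk3, zk2)  # zk3 now leads: TRUNC "/b", but "/c" is not in its committedLog

for s in (zk1, zk2, zk3):  # equivalent of "ls /" on each server
    print(s.name, sorted(s.data_tree))
```

Running it prints `['/a', '/c']` for zk1 and zk3 but only `['/a']` for zk2, which matches the inconsistency observed with zkCli.sh. The model deliberately leaves out elections, SNAP, and the on-disk txn log.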

> Potential data inconsistency due to the inconsistency between 
> ZKDatabase.committedLog and dataTree in Trunc sync.
> -----------------------------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-3607
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3607
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.14
>            Reporter: Jiafu Jiang
>            Priority: Critical
>
> I will describe the problem with a detailed example.
> 1. Suppose we have three zk servers: zk1, zk2, and zk3. zk1 and zk2 are 
> online, zk3 is offline, zk1 is the leader.
> 2. In TRUNC sync, zk1 sends a TRUNC request to zk2, then sends the remaining 
> proposals in the committedLog. *When the follower zk2 receives the proposals, 
> it applies them directly to the datatree, but not to the committedLog.*
> 3. After the data sync phase, zk1 may continue to send zk2 more committed 
> proposals, and they will be applied to both the datatree and the committedLog 
> of zk2.
> 4. Then zk1 fails, zk3 restarts successfully, and zk2 becomes the leader.
> 5. The leader zk2 sends a TRUNC request to zk3, then the remaining proposals 
> from its committedLog. But since some proposals, which came from the leader 
> zk1 during the TRUNC sync (as described above), are not in zk2's committedLog, 
> they will not be sent to zk3.
> 6. Now the data on zk2 and zk3 is inconsistent: some data may exist in zk2's 
> datatree but not in zk3's.
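The six-step scenario in the description can be condensed into a short Python model. This is an illustration under assumptions, not ZooKeeper code: `trunc_sync`, `commit` and `run` are hypothetical names, integer zxids stand in for epoch/counter zxids (3 is assumed to belong to a newer epoch than 2), the later proposal "/d" is invented for the example, and zk3 is assumed to have been offline holding the same "/b" as zk2. The `record_in_log=True` branch models one possible fix (also appending synced txns to committedLog); it is not necessarily the change adopted upstream.

```python
def trunc_sync(leader_log, follower, trunc_to, record_in_log):
    """TRUNC the follower to trunc_to, then replay the leader's committedLog.

    follower is a dict with 'tree' (path -> zxid) and 'log' ([(zxid, path)]).
    record_in_log=False mirrors the reported behaviour: synced txns reach
    only the datatree. True models a hypothetical fix that also logs them.
    """
    follower["tree"] = {p: z for p, z in follower["tree"].items() if z <= trunc_to}
    follower["log"] = [(z, p) for z, p in follower["log"] if z <= trunc_to]
    for zxid, path in leader_log:
        if zxid > trunc_to:
            follower["tree"][path] = zxid
            if record_in_log:
                follower["log"].append((zxid, path))


def commit(server, zxid, path):
    # Normal broadcast after sync: both structures are updated.
    server["tree"][path] = zxid
    server["log"].append((zxid, path))


def run(record_in_log):
    zk1_log = [(1, "/a"), (3, "/c")]  # leader zk1's committedLog
    zk2 = {"tree": {"/a": 1, "/b": 2}, "log": [(1, "/a"), (2, "/b")]}
    zk3 = {"tree": {"/a": 1, "/b": 2}, "log": [(1, "/a"), (2, "/b")]}  # offline

    # zk1 TRUNCs zk2 back to zxid 1, then sends "/c" from its committedLog.
    trunc_sync(zk1_log, zk2, 1, record_in_log)
    commit(zk2, 4, "/d")  # a later proposal from zk1, committed normally

    # zk1 fails, zk3 restarts, and zk2 leads, syncing zk3 from zk2's committedLog.
    trunc_sync(zk2["log"], zk3, 1, record_in_log)
    return sorted(zk2["tree"]), sorted(zk3["tree"])


print(run(record_in_log=False))  # reported behaviour
print(run(record_in_log=True))   # hypothetical fix
```

With `record_in_log=False` the call returns `(['/a', '/c', '/d'], ['/a', '/d'])`: "/c" sits in zk2's datatree but never reaches zk3. With `record_in_log=True` both servers end with `['/a', '/c', '/d']`.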



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
