[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Beom Heyn Kim updated ZOOKEEPER-2946:
-------------------------------------
    Summary: The truncate() function in FileTxnLog.java may fail to properly 
remove an uncommitted write  (was: The truncate() function in FileTxnLog.java 
may fail to properly remove the uncommitted write)

> The truncate() function in FileTxnLog.java may fail to properly remove an 
> uncommitted write
> -------------------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2946
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2946
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.11, 3.4.12
>            Reporter: Beom Heyn Kim
>
> The truncate() function in FileTxnLog.java may fail to properly remove the 
> uncommitted write. It can happen when the follower that has the uncommitted 
> writes tries to resync with the leader after a few epochs went past. So, 
> inconsistency can occur in the in-memory data tree across nodes. Here is one 
> procedure to reproduce the inconsistency. 
> Initially:
> # Start the ensemble with three nodes: node 0, 1 and 2 (the node 2 is the 
> leader)
> # Create 5 znodes with initial values as follow (key = value)
> {noformat}
> /testDivergenceResync0 = 0
> /testDivergenceResync1 = 1
> /testDivergenceResync2 = 2
> /testDivergenceResync3 = 3
> /testDivergenceResync4 = 4
> {noformat}
> To Reproduce:
> # Diverge the node 2
> a. Shutdown the node 0 and 1
> b. Async setData to the node 2 writing 1000 to the key 
> ‘/testDivergenceResync0’
> c. Shutdown the node 2
> # Restart the node 0 and 1 (let them finish with resync)
> # Diverge the node 1
> a. Shutdown the node 0
> b. Async setData to the node 1 writing 1001 to the key 
> ‘/testDivergenceResync1’
> c. Shutdown the node 1
> # Restart the node 0, 1 and 2 (let them finish with resync)
> # Diverge the node 2
> a. Shutdown the node 0 and 1
> b. Async setData to the node 2 writing 1002 to the key 
> ‘/testDivergenceResync2’
> c. Shutdown the node 2
> # Restart the node 0 and 2 (let them finish with resync)
> # Diverge the node 2
> a. Shutdown the node 0
> b. Async setData to the node 2 writing 1003 to the key 
> ‘/testDivergenceResync3’
> c. Shutdown the node 2
> # Restart the node 0 and 1 (let them finish with resync)
> # Diverge the node 1
> a. Shutdown the node 0
> b. Async setData to the node 1 writing 1004 to the key 
> ‘/testDivergenceResync4’
> c. Shutdown the node 1
> # Restart the node 0 and 2 (let them finish with resync)
> # Restart the node 1 (let it finish with resync)
> Reading each key from each node directly will give us the output:
> {noformat}
> /testDivergenceResync0 on the node 0 = 0
> /testDivergenceResync0 on the node 1 = 0
> /testDivergenceResync0 on the node 2 = 0
> /testDivergenceResync1 on the node 0 = 1001
> /testDivergenceResync1 on the node 1 = 1001
> /testDivergenceResync1 on the node 2 = 1001
> /testDivergenceResync2 on the node 0 = 2
> /testDivergenceResync2 on the node 1 = 2
> /testDivergenceResync2 on the node 2 = 2
> /testDivergenceResync3 on the node 0 = 3
> /testDivergenceResync3 on the node 1 = 3
> /testDivergenceResync3 on the node 2 = 1003
> /testDivergenceResync4 on the node 0 = 1004
> /testDivergenceResync4 on the node 1 = 1004
> /testDivergenceResync4 on the node 2 = 1004
> {noformat}
> Thus, the value of key /testDivergenceResync3 is inconsistent across nodes.
> What seems to happen:
> # At the step 7, setData (at zxid 0x400000001) writing value 1003 is 
> committed on the node 2.
> {panel:title=Log from the node 2:}
> ...
> 2017-11-16 03:08:14,123 [myid:2] - DEBUG [ProcessThread(sid:2 
> cport:-1)::CommitProcessor@174] - Processing request:: 
> sessionid:0x2000117327c0000 type:setData cxid:0x4 zxid:0x400000001 txntype:5 
> reqpath:n/a
> 2017-11-16 03:08:14,124 [myid:2] - DEBUG [ProcessThread(sid:2 
> cport:-1)::Leader@787] - Proposing:: sessionid:0x2000117327c0000 type:setData 
> cxid:0x4 zxid:0x400000001 txntype:5 reqpath:n/a
> 2017-11-16 03:08:14,124 [myid:2] - INFO  [SyncThread:2:FileTxnLog@209] - 
> Creating new log file: log.400000001
> 2017-11-16 03:08:14,188 [myid:2] - DEBUG [SyncThread:2:Leader@600] - Count 
> for zxid: 0x400000001 is 1
> 2017-11-16 03:08:15,752 [myid:2] - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@512] - Shutting down
> 2017-11-16 03:08:15,753 [myid:2] - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Leader@518] - Shutdown called
> java.lang.Exception: shutdown Leader! reason: Not sufficient followers 
> synced, only synced with sids: [ 2 ]
> ...
> {panel}
> # At the step 10, the node 2 is restarted and supposed to be properly 
> resync’ed with the node 0 which is the leader.
>     a. The node 0 sends TRUNC msg so that the node 2 can truncate the setData 
> at zxid 0x400000001.. 
>     b. On the other hand, the node 2 tries to truncate log to get in sync 
> with the leader 0x200000001. However, the node 2 failed to properly truncate 
> the setData at zxid 0x400000001. So, even if resync was finished, the value 
> 1003 is still remained intact on the node 2 while other nodes have value 3 
> for the same key.
>     c. It seems on the node 2, there is only log.100000001 and log.400000001 
> but no log.200000001. This seems to cause failing to delete log.400000001 
> during truncate(). It looks like we will be considering log.400000001 by the 
> time returning from the init() of FileTxnLog.java so that we will never 
> execute ‘itr.logFile.delete()’ for the log.400000001. 
>    d. Then, after returning from the truncate(), loadDatabase() will be 
> invoked and log.400000001 will be read and the setData at zxid 0x400000001 
> gets loaded into the in-memory data tree.
> {panel:title=Log from the node 2:}
> ...
> 2017-11-16 03:08:59,051 [myid:2] - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Follower@65] - FOLLOWING - LEADER 
> ELECTION TOOK - 215
> 2017-11-16 03:08:59,052 [myid:2] - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:QuorumPeer$QuorumServer@184] - 
> Resolved hostname: 127.0.0.1 to address: /127.0.0.1
> 2017-11-16 03:08:59,125 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@349] - Truncating log to 
> get in sync with the leader 0x200000001
> 2017-11-16 03:08:59,125 [myid:2] - DEBUG 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - 
> Created new input stream 
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
> 2017-11-16 03:08:59,125 [myid:2] - DEBUG 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - 
> Created new input archive 
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
> 2017-11-16 03:08:59,126 [myid:2] - DEBUG 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - 
> EOF excepton java.io.EOFException
> 2017-11-16 03:08:59,126 [myid:2] - DEBUG 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - 
> Created new input stream 
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
> 2017-11-16 03:08:59,126 [myid:2] - DEBUG 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - 
> Created new input archive 
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
> 2017-11-16 03:08:59,126 [myid:2] - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileSnap@83] - Reading snapshot 
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.200000001
> 2017-11-16 03:08:59,127 [myid:2] - DEBUG 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - 
> Created new input stream 
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
> 2017-11-16 03:08:59,127 [myid:2] - DEBUG 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - 
> Created new input archive 
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.100000001
> 2017-11-16 03:08:59,128 [myid:2] - DEBUG 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - 
> EOF excepton java.io.EOFException
> 2017-11-16 03:08:59,128 [myid:2] - DEBUG 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@606] - 
> Created new input stream 
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
> 2017-11-16 03:08:59,128 [myid:2] - DEBUG 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@609] - 
> Created new input archive 
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/log.400000001
> 2017-11-16 03:08:59,128 [myid:2] - DEBUG 
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnLog$FileTxnIterator@647] - 
> EOF excepton java.io.EOFException
> 2017-11-16 03:08:59,131 [myid:2] - WARN  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:Learner@387] - Got zxid 0x500000001 
> expected 0x1
> 2017-11-16 03:08:59,132 [myid:2] - INFO  
> [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:11227:FileTxnSnapLog@248] - Snapshotting: 
> 0x500000004 to 
> /home/ben/project/strata/test-5-3-ZooKeeper-3.4.11-strata-0.1/data/2/version-2/snapshot.500000004
> ...
> {panel}
> {panel:title=Log from the node 0:}
> ...
> 2017-11-16 03:08:59,050 [myid:0] - INFO  
> [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@372] - LEADING - LEADER 
> ELECTION TOOK - 222
> 2017-11-16 03:08:59,055 [myid:0] - INFO  
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@346] - Follower sid: 2 : info 
> : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@266e422
> 2017-11-16 03:08:59,124 [myid:0] - INFO  
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@401] - Synchronizing with 
> Follower sid: 2 maxCommittedLog=0x500000004 minCommittedLog=0x100000001 
> peerLastZxid=0x400000001
> 2017-11-16 03:08:59,124 [myid:0] - DEBUG 
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@415] - proposal size is 14
> 2017-11-16 03:08:59,124 [myid:0] - DEBUG 
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@418] - Sending proposals to 
> follower
> 2017-11-16 03:08:59,124 [myid:0] - INFO  
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@475] - Sending TRUNC
> 2017-11-16 03:08:59,147 [myid:0] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxnFactory@215] - 
> Accepted socket connection from /127.0.0.1:55118
> 2017-11-16 03:08:59,184 [myid:0] - WARN  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@383] - Exception 
> causing close of session 0x0: ZooKeeperServer not running
> 2017-11-16 03:08:59,184 [myid:0] - DEBUG 
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@386] - IOException 
> stack trace
> java.io.IOException: ZooKeeperServer not running
>         at 
> org.apache.zookeeper.server.NIOServerCnxn.readLength(NIOServerCnxn.java:977)
>         at 
> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:257)
>         at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:226)
>         at java.lang.Thread.run(Thread.java:745)
> 2017-11-16 03:08:59,184 [myid:0] - INFO  
> [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:11221:NIOServerCnxn@1040] - Closed 
> socket connection for client /127.0.0.1:55118 (no session established for 
> client)
> 2017-11-16 03:08:59,224 [myid:0] - INFO  
> [LearnerHandler-/127.0.0.1:54482:LearnerHandler@535] - Received NEWLEADER-ACK 
> message from 2
> 2017-11-16 03:08:59,224 [myid:0] - INFO  
> [QuorumPeer[myid=0]/0:0:0:0:0:0:0:0:11221:Leader@962] - Have quorum of 
> supporters, sids: [ 0,2 ]; starting up and setting last processed zxid: 
> 0x600000000
> ...
> {panel}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to