[ https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554525#comment-13554525 ]

Thawan Kooburat commented on ZOOKEEPER-1549:
--------------------------------------------

Flavio: The patch on the learner side looks good at the moment. I will post 
more updates as I work on the leader side.

Here is my current implementation plan, along with some thoughts:

1. The server now initializes its DB by loading from the snapshot only. In 
order to get getLastLoggedZxid(), I have to walk through the entire txnlog, not 
just the last file, so that we don't reintroduce the bug found in ZOOKEEPER-596.

2. When the leader starts, it will have an empty committedLog (since the txnlog 
is not replayed). However, it will walk through the txnlog directly, using the 
same logic it uses to walk the committedLog.

3. When the leader has sufficient votes for NEWLEADER, it applies all the txns 
to the DataTree and also adds them to the committedLog, so any follower that 
tries to synchronize after the quorum is formed can use the committedLog for 
synchronization.

4. On the learner side, the learner can apply the transactions from its txnlog, 
plus what it gets from the leader, after it receives UPTODATE. Potentially, this 
can be up to 100K txns with the default configuration. I am worried about the 
time it takes to perform this, since the leader has already switched to using 
syncLimit at this point and may tear down the quorum. We could add an extra 
stage for this phase by using the currently unused ack of UPTODATE, but that is 
not part of the protocol.

5. There is logic that kills dead sessions in the current 
ZooKeeperServer.loadData(). I am not entirely sure why we need that cleanup 
logic.
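Items 1 and 2 can be illustrated with a minimal Java sketch. It is not ZooKeeper code: the txnlog is modeled as a hypothetical list of log files, each holding zxids in the two-digit form used in the issue description below, and `TxnLogSyncSketch`, `lastLoggedZxid` and `chooseSync` are illustrative names only. The sync rule is a simplified range check (DIFF when the peer's zxid lies between the oldest and newest logged zxid, TRUNC above that range, SNAP below it), applied to the on-disk log instead of the committedLog.

```java
import java.util.ArrayList;
import java.util.List;

class TxnLogSyncSketch {

    enum SyncType { DIFF, TRUNC, SNAP }

    // Item 1: scan every (hypothetical) log file, not just the newest one.
    // E.g. if the newest file is empty, it alone yields no zxid at all.
    static long lastLoggedZxid(List<List<Long>> logFiles) {
        long last = -1; // -1 means "nothing logged"
        for (List<Long> file : logFiles) {
            for (long zxid : file) {
                last = Math.max(last, zxid);
            }
        }
        return last;
    }

    // Concatenate all files; files are assumed to be ordered, and each
    // file is assumed to be in zxid order.
    static List<Long> allTxns(List<List<Long>> logFiles) {
        List<Long> txns = new ArrayList<>();
        for (List<Long> file : logFiles) {
            txns.addAll(file);
        }
        return txns;
    }

    // Item 2 (simplified): the committedLog-style range check, applied
    // directly to the on-disk log.
    static SyncType chooseSync(List<List<Long>> logFiles, long peerZxid) {
        List<Long> txns = allTxns(logFiles);
        if (txns.isEmpty()) {
            return SyncType.SNAP;
        }
        long min = txns.get(0);
        long max = txns.get(txns.size() - 1);
        if (peerZxid > max) {
            return SyncType.TRUNC;
        }
        return peerZxid >= min ? SyncType.DIFF : SyncType.SNAP;
    }

    public static void main(String[] args) {
        // Leader B from the scenario: 71..73 in epoch 7, 81 in epoch 8,
        // plus an empty newest file.
        List<List<Long>> log = List.of(List.of(71L, 72L, 73L), List.of(81L), List.of());
        System.out.println(lastLoggedZxid(log)); // 81
        System.out.println(chooseSync(log, 74)); // DIFF
        System.out.println(chooseSync(log, 90)); // TRUNC
        System.out.println(chooseSync(log, 60)); // SNAP
    }
}
```

Note that `chooseSync(log, 74)` answers DIFF even though this leader never logged 74. A range check alone cannot notice that; the sketch omits the extra TRUNC handling that the description attributes to ZOOKEEPER-1154.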

> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1549
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3
>            Reporter: Jacky007
>            Assignee: Thawan Kooburat
>            Priority: Blocker
>             Fix For: 3.4.6
>
>         Attachments: case.patch, ZOOKEEPER-1549-learner.patch
>
>
> The trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot 
> is not correct.
> Here is the scenario (similar to ZOOKEEPER-1154):
> Initial Condition
> 1.    Let's say there are three nodes in the ensemble, A, B and C, with A 
> being the leader.
> 2.    The current epoch is 7. 
> 3.    For simplicity of the example, let's say the zxid is a two-digit 
> number, with the epoch being the first digit.
> 4.    The zxid is 73.
> 5.    All the nodes have seen change 73 and have persistently logged it.
> Step 1
> Request with zxid 74 is issued. The leader A writes it to its log, but the 
> entire ensemble crashes and B and C never write change 74 to their logs.
> Step 2
> A and B restart, and A is elected as the new leader. A loads its data and 
> takes a clean snapshot (change 74 is in it), then sends a diff to B, but B 
> dies before it finishes syncing with A. A dies later.
> Step 3
> B and C restart; A is still down.
> B and C form the quorum.
> B is the new leader. Let's say B's minCommitLog is 71 and its maxCommitLog 
> is 73.
> The epoch is now 8, and the zxid is 80.
> Request with zxid 81 is successful. On B, minCommitLog is now 71 and 
> maxCommitLog is 81.
> Step 4
> A starts up. It applies the change in the request with zxid 74 to its 
> in-memory data tree.
> A contacts B to registerAsFollower and provides 74 as its zxid.
> Since 71 <= 74 <= 81, B decides to send A the diff. 
> Problem:
> The problem with the above sequence is that after truncating its log, A 
> loads the snapshot again, which is not correct because that snapshot still 
> contains change 74.
> In the 3.3 branch, FileTxnSnapLog.restore does not call the listener 
> (ZOOKEEPER-874), so the leader sends a snapshot to the follower, and this is 
> not a problem.
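The Step 4 problem can be reproduced with a toy model. This is a hypothetical sketch, not ZooKeeper's FileTxnSnapLog: a data tree is modeled as the set of zxids applied to it, `restore` loads a snapshot and then replays only log entries newer than the snapshot, and the truncation point 73 is an assumption (the last zxid that A and B share).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class DirtySnapshotSketch {

    // Toy restore: start from the snapshot, then replay only log entries
    // newer than the highest zxid already in the snapshot.
    static TreeSet<Integer> restore(TreeSet<Integer> snapshot, List<Integer> log) {
        TreeSet<Integer> tree = new TreeSet<>(snapshot);
        int snapZxid = snapshot.isEmpty() ? 0 : snapshot.last();
        for (int zxid : log) {
            if (zxid > snapZxid) {
                tree.add(zxid);
            }
        }
        return tree;
    }

    // Toy TRUNC: keep only logged txns at or below the truncation point.
    static List<Integer> truncate(List<Integer> log, int truncZxid) {
        List<Integer> kept = new ArrayList<>();
        for (int zxid : log) {
            if (zxid <= truncZxid) {
                kept.add(zxid);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Step 2: A's log and its snapshot both contain change 74.
        List<Integer> logA = List.of(71, 72, 73, 74);
        TreeSet<Integer> snapshotA = new TreeSet<>(logA);

        // Leader B never saw 74; 81 was committed in epoch 8.
        TreeSet<Integer> treeB = new TreeSet<>(List.of(71, 72, 73, 81));

        // Step 4: A truncates its log to 73, reloads, then applies the diff (81).
        TreeSet<Integer> treeA = restore(snapshotA, truncate(logA, 73));
        treeA.add(81);

        System.out.println("A: " + treeA);       // A: [71, 72, 73, 74, 81]
        System.out.println("B: " + treeB);       // B: [71, 72, 73, 81]
        System.out.println(treeA.equals(treeB)); // false
    }
}
```

Truncating the log does not help, because the reload starts from the snapshot, which already holds 74. A therefore ends up with a change that B never committed, which is the data inconsistency in the issue title.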
