[ https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13540043#comment-13540043 ]
Flavio Junqueira commented on ZOOKEEPER-1549:
---------------------------------------------

[~thawan] OK, I have a better idea of what you're referring to with the syncLimit comment, but I'm not there yet. syncLimit has always been a parameter that limits the amount of time a follower can take to catch up, so, just to make it clear, I'm not proposing any change to the semantics of syncLimit.

About having it for 3.5.0, I suggest we make it a blocker for 3.5.0. If necessary, I also suggest we delay the release to have it in, although that is certainly not ideal. Given that we will be creating a new branch (3.5), I suppose that we don't need some of the backward-compatibility code that we currently have to make sure that 3.3 servers talk to 3.4 servers, yes? Perhaps this is a question for [~phunt].

It would be awesome if you could work on the leader part. I think the only tricky part on the follower side is making sure that everything is persisted at the right time, but I don't think that we will need major code changes. If you can work on both, I'm happy to be the reviewer of your code; otherwise I can work on the follower part and we review each other's patches.

> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1549
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3
>            Reporter: Jacky007
>            Priority: Blocker
>         Attachments: case.patch
>
>
> The trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot is
> not correct.
> Here is the scenario (similar to 1154):
> Initial Condition
> 1. Let's say there are three nodes in the ensemble A,B,C with A being the
> leader
> 2. The current epoch is 7.
> 3. For simplicity of the example, let's say zxid is a two digit number,
> with epoch being the first digit.
> 4. The zxid is 73
> 5. All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> Request with zxid 74 is issued. The leader A writes it to the log but there
> is a crash of the entire ensemble and B,C never write the change 74 to their
> log.
> Step 2
> A,B restart, A is elected as the new leader, and A will load data and take a
> clean snapshot (change 74 is in it), then send diff to B, but B died before
> syncing with A. A died later.
> Step 3
> B,C restart, A is still down
> B,C form the quorum
> B is the new leader. Let's say B minCommitLog is 71 and maxCommitLog is 73
> epoch is now 8, zxid is 80
> Request with zxid 81 is successful. On B, minCommitLog is now 71,
> maxCommitLog is 81
> Step 4
> A starts up. It applies the change in request with zxid 74 to its in-memory
> data tree
> A contacts B to registerAsFollower and provides 74 as its ZxId
> Since 71<=74<=81, B decides to send A the diff.
> Problem:
> The problem with the above sequence is that after truncating the log, A will
> load the snapshot again, which is not correct.
> In the 3.3 branch, FileTxnSnapLog.restore does not call the listener
> (ZOOKEEPER-874), so the leader will send a snapshot to the follower and it
> will not be a problem.
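
The range check in Step 4 of the quoted scenario can be illustrated with a toy sketch. This is not ZooKeeper's actual code: the class SyncModeSketch, its methods, and the Mode enum are hypothetical names, the zxid uses the report's two-digit encoding (first digit is the epoch), and B's log contents (71, 72, 73, 81) are an assumption read off Step 3. It only shows why comparing 74 against the two ends of B's committed log says nothing about whether B ever logged 74.

```java
import java.util.Arrays;
import java.util.List;

// Toy model of the Step 4 decision. Hypothetical names; not ZooKeeper's code.
class SyncModeSketch {
    enum Mode { DIFF, TRUNC, SNAP }

    // Toy encoding from the report: two-digit zxid, first digit is the epoch.
    static int epochOf(int zxid) {
        return zxid / 10;
    }

    // Simplified range check: looks only at the two ends of the leader's
    // committed log, as in "Since 71<=74<=81, B decides to send A the diff".
    static Mode chooseByRange(int minCommitLog, int maxCommitLog, int peerZxid) {
        if (peerZxid > maxCommitLog) {
            return Mode.TRUNC;   // follower is ahead of the leader
        }
        if (peerZxid >= minCommitLog) {
            return Mode.DIFF;    // follower appears to be inside the log window
        }
        return Mode.SNAP;        // follower is too far behind for a diff
    }

    // True when the follower's last zxid is a transaction the leader never logged.
    static boolean peerHasUnknownTxn(List<Integer> leaderLog, int peerZxid) {
        return !leaderLog.contains(peerZxid);
    }

    public static void main(String[] args) {
        // Assumed contents of B's committed log after Step 3.
        List<Integer> leaderLog = Arrays.asList(71, 72, 73, 81);
        int peerZxid = 74; // what A reports in Step 4

        System.out.println("mode by range check: " + chooseByRange(71, 81, peerZxid));
        System.out.println("74 unknown to leader: " + peerHasUnknownTxn(leaderLog, peerZxid));
        System.out.println("epoch of 74: " + epochOf(peerZxid));
    }
}
```

With the values from the scenario, the range check picks DIFF even though 74 is not in B's log. Under the same assumptions, the report's point is that truncating A's log is not enough if A's snapshot already contains change 74.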