[ https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13454682#comment-13454682 ]
Jacky007 commented on ZOOKEEPER-1549:
-------------------------------------

Thanks, Flavio, for the comment. I agree with Flavio: deleting a snapshot is too dangerous to be an option. There should be another solution.

Step 1
74 is in A's transaction logs.

Step 2
A is the new leader, and it will execute the following code.
{noformat}
    void lead() throws IOException, InterruptedException {
        self.end_fle = System.currentTimeMillis();
        LOG.info("LEADING - LEADER ELECTION TOOK - " +
              (self.end_fle - self.start_fle));
        self.start_fle = 0;
        self.end_fle = 0;

        zk.registerJMX(new LeaderBean(this, zk), self.jmxLocalPeerBean);

        try {
            self.tick = 0;
            zk.loadData();
{noformat}
Then A will load its snapshot and committed log.
{noformat}
    public void loadData() throws IOException, InterruptedException {
        setZxid(zkDb.loadDataBase());
        // Clean up dead sessions
        LinkedList<Long> deadSessions = new LinkedList<Long>();
        for (Long session : zkDb.getSessions()) {
            if (zkDb.getSessionWithTimeOuts().get(session) == null) {
                deadSessions.add(session);
            }
        }
        zkDb.setDataTreeInit(true);
        for (long session : deadSessions) {
            // XXX: Is lastProcessedZxid really the best thing to use?
            killSession(session, zkDb.getDataTreeLastProcessedZxid());
        }

        // Make a clean snapshot
        takeSnapshot();
    }
{noformat}
When A calls takeSnapshot(), 74 is included in the snapshot (if A dies after that, B will never know about it).

When A loads the database,
{noformat}
    public void loadData() throws IOException, InterruptedException {
        setZxid(zkDb.loadDataBase());
{noformat}
it restores the database from the snapshots and the transaction logs,
{noformat}
        long zxid = snapLog.restore(dataTree,sessionsWithTimeouts,listener);
{noformat}
{noformat}
            try {
                processTransaction(hdr,dt,sessions, itr.getTxn());
            } catch(KeeperException.NoNodeException e) {
               throw new IOException("Failed to process transaction type: " +
                     hdr.getType() + " error: " + e.getMessage(), e);
            }
            listener.onTxnLoaded(hdr, itr.getTxn());
{noformat}
but 74 is still in A's transaction logs, so the restore applies it.
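
To make the sequence in this comment concrete, here is a minimal toy model in Java. It is only a sketch under stated assumptions: the class DirtySnapshotDemo and its methods are hypothetical and are not ZooKeeper's API, and the snapshot and log are reduced to collections of zxids. It shows why truncating the log alone may not remove 74 once the snapshot already contains it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model only: this class is hypothetical and is not ZooKeeper's API.
class DirtySnapshotDemo {
    /** Zxids whose changes are already baked into the on-disk snapshot. */
    final TreeSet<Long> snapshot = new TreeSet<>();
    /** Zxids recorded in the on-disk transaction log. */
    final List<Long> txnLog = new ArrayList<>();

    /** Stands in for loadData(): replay the log, then write a "clean" snapshot. */
    void loadAndSnapshot() {
        snapshot.addAll(txnLog);
    }

    /** Stands in for a log truncation: drop log entries newer than zxid. */
    void truncateLog(long zxid) {
        txnLog.removeIf(z -> z > zxid); // the snapshot is left untouched
    }

    /** Stands in for a restore: start from the snapshot, then replay the log. */
    TreeSet<Long> restore() {
        TreeSet<Long> tree = new TreeSet<>(snapshot);
        tree.addAll(txnLog);
        return tree;
    }

    static boolean uncommittedSurvivesTruncate() {
        DirtySnapshotDemo a = new DirtySnapshotDemo();
        a.txnLog.add(73L);
        a.txnLog.add(74L);   // logged by A only, never seen by B or C
        a.loadAndSnapshot(); // A leads and snapshots, so 74 is now in the snapshot
        a.truncateLog(73L);  // later A is told to truncate its log back to 73
        return a.restore().contains(74L);
    }

    public static void main(String[] args) {
        System.out.println("74 survives truncate: " + uncommittedSurvivesTruncate());
    }
}
```

In this model the truncation removes 74 from the log, but the restore still starts from a snapshot that contains it, which is the inconsistency described in the issue.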
> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1549
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3
>            Reporter: Jacky007
>            Priority: Critical
>
> The trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot
> is not correct.
> Here is the scenario (similar to 1154):
> Initial Condition
> 1. Let's say there are three nodes in the ensemble, A, B, C, with A being
>    the leader.
> 2. The current epoch is 7.
> 3. For simplicity of the example, let's say zxid is a two-digit number,
>    with epoch being the first digit.
> 4. The zxid is 73.
> 5. All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> A request with zxid 74 is issued. The leader A writes it to the log, but
> the entire ensemble crashes and B, C never write the change 74 to their
> logs.
> Step 2
> A, B restart. A is elected as the new leader, and A will load data and take
> a clean snapshot (change 74 is in it), then send a diff to B, but B dies
> before syncing with A. A dies later.
> Step 3
> B, C restart; A is still down.
> B, C form the quorum.
> B is the new leader. Let's say B's minCommitLog is 71 and maxCommitLog is 73.
> The epoch is now 8, zxid is 80.
> A request with zxid 81 is successful. On B, minCommitLog is now 71 and
> maxCommitLog is 81.
> Step 4
> A starts up. It applies the change in the request with zxid 74 to its
> in-memory data tree.
> A contacts B to registerAsFollower and provides 74 as its zxid.
> Since 71<=74<=81, B decides to send A the diff.
> Problem:
> The problem with the above sequence is that after truncating the log, A
> will load the snapshot again, and that snapshot is not correct.
> In the 3.3 branch, FileTxnSnapLog.restore does not call the listener
> (ZOOKEEPER-874), so the leader sends a snapshot to the follower and this is
> not a problem.
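
The range check in Step 4 can be sketched as follows. This is a simplified illustration, not the actual leader code: SyncChoice, Mode, zxid and choose are hypothetical names, the zxid uses the two-digit simplification from the description, and only the "in range means diff" branch comes from the text. Every other outcome is lumped into OTHER.

```java
// Simplified sketch; SyncChoice and its members are hypothetical, not ZooKeeper's API.
class SyncChoice {
    enum Mode { DIFF, OTHER }

    /** Two-digit zxid from the example: first digit is the epoch, second the counter. */
    static int zxid(int epoch, int counter) {
        return epoch * 10 + counter;
    }

    /** Range check as described in Step 4: in range means the leader sends a diff. */
    static Mode choose(int minCommitLog, int maxCommitLog, int peerZxid) {
        boolean inRange = minCommitLog <= peerZxid && peerZxid <= maxCommitLog;
        return inRange ? Mode.DIFF : Mode.OTHER; // other sync modes are not modeled here
    }

    public static void main(String[] args) {
        // B's committed log spans 71..81; A reports 74, which B never logged.
        System.out.println(choose(zxid(7, 1), zxid(8, 1), zxid(7, 4)));
    }
}
```

The check compares zxids only, so in this sketch A's 74 falls inside B's range even though B never logged 74.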