[ 
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306472#comment-15306472
 ] 

Ed Rowe commented on ZOOKEEPER-1549:
------------------------------------

[~fpj] I think the approach of deleting snapshots is a good one. Further, I 
think you hit the nail on the head when you say "Snapshots are simply compacted 
versions of the txn log history." The previous idea (further up the comment 
history) in which the log, but not snapshots, would be allowed to contain 
uncommitted transactions seems fragile. A pedantic update to your definition 
would be "Snapshots are simply compacted versions of the txn log history, as 
applied to the DataTree." to recognize that snapshots only ever contain 
information that has been in a DataTree (though not necessarily a DataTree that 
was ever visible outside a given Node) while logs can contain information that 
has never been in a DataTree.
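
To make the refined definition concrete, here is a minimal sketch in Python (chosen for brevity; this is not ZooKeeper code). It assumes a DataTree can be modelled as a plain dict and a txn log as a list of (zxid, path, value) tuples. The names `apply_txns`, `take_snapshot`, and `restore` are hypothetical. It illustrates that a snapshot plus the log entries after it gives the same tree as replaying the whole log, and that a log can hold an entry no tree has applied yet.

```python
from typing import Dict, List, Tuple

Txn = Tuple[int, str, str]  # (zxid, path, value); a toy stand-in for a real txn


def apply_txns(tree: Dict[str, str], txns: List[Txn]) -> Dict[str, str]:
    """Return a copy of `tree` with `txns` applied in log order."""
    result = dict(tree)
    for _zxid, path, value in txns:
        result[path] = value
    return result


def take_snapshot(log: List[Txn], upto_zxid: int) -> Tuple[int, Dict[str, str]]:
    """Compact every txn with zxid <= upto_zxid into one tree image."""
    applied = [t for t in log if t[0] <= upto_zxid]
    return upto_zxid, apply_txns({}, applied)


def restore(snapshot: Tuple[int, Dict[str, str]], log: List[Txn]) -> Dict[str, str]:
    """Rebuild a tree from a snapshot plus the log entries that follow it."""
    snap_zxid, image = snapshot
    return apply_txns(image, [t for t in log if t[0] > snap_zxid])


log = [(71, "/a", "1"), (72, "/b", "2"), (73, "/a", "3")]
snapshot = take_snapshot(log, 72)

# Snapshot + tail of the log equals a full replay of the log.
assert restore(snapshot, log) == apply_txns({}, log)

# A txn that is logged after the snapshot was taken is in the log only;
# the snapshot image never saw it.
log.append((74, "/c", "logged-only"))
assert "/c" not in snapshot[1]
```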

One issue to account for in the fix is the case where there is no earlier 
snapshot to rebuild from. This could occur if an operator has deleted older 
snapshots. I think we'd still want to delete the snapshot being truncated and 
arrange for the learner node to start over with a blank database.
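
A rough sketch of that decision, in Python for brevity and not taken from the ZooKeeper code base: it assumes each snapshot is identified by the last zxid it contains, and `plan_truncation` is a hypothetical helper. It returns which snapshots would be deleted and which one, if any, is left to rebuild from.

```python
from typing import List, Optional, Tuple


def plan_truncation(snapshot_zxids: List[int],
                    trunc_zxid: int) -> Tuple[List[int], Optional[int]]:
    """Split snapshots around the truncation point.

    Returns (snapshots_to_delete, base_snapshot). base_snapshot is None when
    no snapshot at or below trunc_zxid survives, meaning the learner would
    start over from a blank database.
    """
    to_delete = sorted(z for z in snapshot_zxids if z > trunc_zxid)
    keep = [z for z in snapshot_zxids if z <= trunc_zxid]
    base = max(keep) if keep else None
    return to_delete, base


# An older snapshot exists: delete 74, rebuild from 70.
with_older = plan_truncation([70, 74], 73)

# The operator removed older snapshots: delete 74, start blank.
without_older = plan_truncation([74], 73)
```

In the second call the base is `None`, which corresponds to the blank-database case described above.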

Another consideration is that, as I've written in ZOOKEEPER-2436, the learner 
might receive from the leader a SNAP rather than a TRUNC. In this situation, 
the snapshot on the learner node that a TRUNC would have deleted will still be 
present on the learner node, but it will no longer be the newest snapshot. I 
don't think this will cause any problems but I did want to bring it up.

Finally, there were some concerns raised early in the comment thread that 
deleting snapshots might be too aggressive. If this is still an issue, we could 
(perhaps optionally) rename logs and snapshots that we are truncating rather 
than delete them. The snapshot/log purge code might then clean these up if they 
are old enough. I'm not convinced that this is worth implementing now.
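
For illustration only, the rename-then-purge idea could look roughly like the Python sketch below. The `.truncated` suffix, the helper names, and the age threshold are all assumptions made for the example; none of them come from ZooKeeper's purge code.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import List


def set_aside(path: Path, suffix: str = ".truncated") -> Path:
    """Rename a file instead of deleting it; returns the new path."""
    target = path.with_name(path.name + suffix)
    path.rename(target)
    return target


def purge_set_aside(directory: Path, now: float, max_age_seconds: float,
                    suffix: str = ".truncated") -> List[str]:
    """Delete set-aside files whose mtime is older than max_age_seconds."""
    removed = []
    for candidate in sorted(directory.glob("*" + suffix)):
        if now - candidate.stat().st_mtime > max_age_seconds:
            candidate.unlink()
            removed.append(candidate.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)
    snap = data_dir / "snapshot.4a"
    snap.write_text("dirty")

    renamed_name = set_aside(snap).name
    original_gone = not snap.exists()

    # Pretend the file is two days old, then purge anything older than one day.
    old = time.time() - 2 * 86400
    os.utime(data_dir / renamed_name, (old, old))
    purged = purge_set_aside(data_dir, time.time(), 86400)
```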


> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-1549
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.3
>            Reporter: Jacky007
>            Assignee: Flavio Junqueira
>            Priority: Blocker
>             Fix For: 3.5.2, 3.6.0
>
>         Attachments: ZOOKEEPER-1549-3.4.patch, ZOOKEEPER-1549-learner.patch, 
> case.patch
>
>
> The trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot is 
> not correct.
> Here is a scenario (similar to 1154):
> Initial Condition
> 1.    Let's say there are three nodes in the ensemble, A, B and C, with A being the 
> leader.
> 2.    The current epoch is 7. 
> 3.    For simplicity of the example, let's say a zxid is a two-digit number, 
> with the epoch being the first digit.
> 4.    The zxid is 73.
> 5.    All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> A request with zxid 74 is issued. The leader A writes it to its log, but the 
> entire ensemble crashes and B and C never write change 74 to their 
> logs.
> Step 2
> A and B restart. A is elected as the new leader; it loads its data and takes a 
> clean snapshot (change 74 is in it), then sends a diff to B, but B dies before 
> syncing with A. A dies later.
> Step 3
> B and C restart; A is still down.
> B and C form the quorum.
> B is the new leader. Let's say B's minCommitLog is 71 and maxCommitLog is 73.
> The epoch is now 8 and the zxid is 80.
> A request with zxid 81 is successful. On B, minCommitLog is now 71 and 
> maxCommitLog is 81.
> Step 4
> A starts up. It applies the change in request with zxid 74 to its in-memory 
> data tree
> A contacts B to registerAsFollower and provides 74 as its ZxId
> Since 71<=74<=81, B decides to send A the diff. 
> Problem:
> The problem with the above sequence is that after truncating the log, A will 
> load the snapshot again, which is not correct.
> In the 3.3 branch, FileTxnSnapLog.restore does not call the listener (ZOOKEEPER-874), 
> so the leader will send a snapshot to the follower and this will not be a problem.
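
The range check in Step 4 of the quoted scenario can be sketched as follows (Python for brevity, not ZooKeeper code). The sketch uses the scenario's simplified two-digit zxids and assumes that B's committed log holds 71, 72, 73 and 81 after Step 3. `toy_zxid` and `in_committed_range` are hypothetical helpers, and the sketch models only the comparison stated in the description, not the full sync logic.

```python
def toy_zxid(epoch: int, counter: int) -> int:
    """Two-digit zxid from the scenario: first digit epoch, second digit counter."""
    return epoch * 10 + counter


def in_committed_range(peer_last_zxid: int, min_committed: int,
                       max_committed: int) -> bool:
    """The range check from Step 4: min <= peer's last zxid <= max."""
    return min_committed <= peer_last_zxid <= max_committed


# Assumed contents of B's committed log after Step 3.
b_committed = [toy_zxid(7, 1), toy_zxid(7, 2), toy_zxid(7, 3), toy_zxid(8, 1)]
a_last_zxid = toy_zxid(7, 4)  # only A ever logged 74

range_check_passes = in_committed_range(
    a_last_zxid, min(b_committed), max(b_committed))
b_knows_txn = a_last_zxid in b_committed
```

Here `range_check_passes` is true while `b_knows_txn` is false: 74 falls inside B's committed range even though B never saw that transaction, which is why A's snapshot containing 74 is the dirty one.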


