[
https://issues.apache.org/jira/browse/ZOOKEEPER-1549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15306472#comment-15306472
]
Ed Rowe commented on ZOOKEEPER-1549:
------------------------------------
[~fpj] I think the approach of deleting snapshots is a good one. Further, I
think you hit the nail on the head when you say "Snapshots are simply compacted
versions of the txn log history." The previous idea (further up the comment
history) in which the log, but not snapshots, would be allowed to contain
uncommitted transactions seems fragile. A pedantic update to your definition
would be "Snapshots are simply compacted versions of the txn log history, as
applied to the DataTree." to recognize that snapshots only ever contain
information that has been in a DataTree (though not necessarily a DataTree that
was ever visible outside a given Node) while logs can contain information that
has never been in a DataTree.
One issue to account for in the fix is the case where there is no earlier
snapshot to rebuild from. This could occur if an operator has deleted older
snapshots. I think we'd still want to delete the snapshot being truncated and
arrange for the learner node to start over with a blank database.
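To make the proposal concrete, here is a minimal sketch of the deletion rule. `snapshotsToDelete` and the in-memory filename list are hypothetical helpers, not existing ZooKeeper API, though the `snapshot.<hex zxid>` naming matches the on-disk convention:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TruncSketch {
    // Parse the zxid from a snapshot filename like "snapshot.4a" (hex),
    // matching ZooKeeper's on-disk naming convention.
    static long zxidOf(String name) {
        return Long.parseLong(name.substring(name.indexOf('.') + 1), 16);
    }

    // Hypothetical helper: given the snapshot filenames on disk and the zxid
    // we are truncating to, return the snapshots that must be deleted because
    // they may contain transactions beyond truncZxid that were never
    // committed. If every snapshot ends up doomed, the learner would have to
    // start over with a blank database.
    static List<String> snapshotsToDelete(List<String> snaps, long truncZxid) {
        List<String> doomed = new ArrayList<>();
        for (String s : snaps) {
            if (zxidOf(s) > truncZxid) {
                doomed.add(s);
            }
        }
        return doomed;
    }

    public static void main(String[] args) {
        // Truncating to zxid 0x49 (73): snapshot.4a (74) is dirty and must go.
        System.out.println(snapshotsToDelete(
                Arrays.asList("snapshot.49", "snapshot.4a"), 0x49L));
        // prints [snapshot.4a]
    }
}
```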
Another consideration is that, as I've written in ZOOKEEPER-2436, the learner
might receive a SNAP from the leader rather than a TRUNC. In this situation,
the snapshot that a TRUNC would have deleted will still be present on the
learner, but it will no longer be the newest snapshot. I don't think this will
cause any problems, but I did want to bring it up.
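A sketch of why the stale snapshot should be harmless, assuming restore always starts from the snapshot with the highest zxid (`newestSnapshot` and `zxidOf` are illustrative helpers, not ZooKeeper API):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class NewestSnap {
    // Hex zxid from a filename like "snapshot.4a".
    static long zxidOf(String name) {
        return Long.parseLong(name.substring(name.indexOf('.') + 1), 16);
    }

    // If restore only ever reads the snapshot with the highest zxid, a stale
    // dirty snapshot left behind after a SNAP is never loaded while a newer,
    // clean one exists.
    static String newestSnapshot(List<String> snaps) {
        return snaps.stream()
                .max(Comparator.comparingLong(NewestSnap::zxidOf))
                .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(newestSnapshot(
                Arrays.asList("snapshot.4a", "snapshot.55")));
        // prints snapshot.55
    }
}
```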
Finally, there were some concerns raised early in the comment thread that
deleting snapshots might be too aggressive. If this is still an issue, we could
(perhaps optionally) rename logs and snapshots that we are truncating rather
than delete them. The snapshot/log purge code might then clean these up if they
are old enough. I'm not convinced that this is worth implementing now.
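If we did want the safer behavior, the rename approach might look something like this sketch (the `.truncated` suffix and `markForPurge` are hypothetical; the existing purge task would be taught to delete files carrying the suffix once they are old enough):

```java
import java.io.File;

public class SafeTrunc {
    // Hypothetical suffix marking a truncated-away snapshot or log.
    static final String TRUNC_SUFFIX = ".truncated";

    // Pure name mapping, so the purge code can recognize marked files.
    static String purgeName(String name) {
        return name + TRUNC_SUFFIX;
    }

    // Instead of deleting a dirty snapshot or log outright, rename it so an
    // operator can still inspect or recover it before the purge runs.
    static File markForPurge(File f) {
        File renamed = new File(f.getParentFile(), purgeName(f.getName()));
        if (!f.renameTo(renamed)) {
            // A real implementation would handle this, e.g. by falling
            // back to deletion.
            throw new IllegalStateException("could not rename " + f);
        }
        return renamed;
    }
}
```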
> Data inconsistency when follower is receiving a DIFF with a dirty snapshot
> --------------------------------------------------------------------------
>
> Key: ZOOKEEPER-1549
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1549
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.4.3
> Reporter: Jacky007
> Assignee: Flavio Junqueira
> Priority: Blocker
> Fix For: 3.5.2, 3.6.0
>
> Attachments: ZOOKEEPER-1549-3.4.patch, ZOOKEEPER-1549-learner.patch,
> case.patch
>
>
> The trunc code (from ZOOKEEPER-1154?) cannot work correctly if the snapshot
> itself is not correct (i.e., it contains an uncommitted transaction).
> Here is the scenario (similar to ZOOKEEPER-1154):
> Initial Condition
> 1. Let's say there are three nodes in the ensemble, A, B, and C, with A
> being the leader.
> 2. The current epoch is 7.
> 3. For simplicity of the example, let's say a zxid is a two-digit number,
> with the epoch being the first digit.
> 4. The zxid is 73.
> 5. All the nodes have seen the change 73 and have persistently logged it.
> Step 1
> A request with zxid 74 is issued. The leader A writes it to its log, but the
> entire ensemble crashes before B and C write the change 74 to their logs.
> Step 2
> A and B restart, and A is elected as the new leader. A loads its data and
> takes a clean snapshot (change 74 is in it), then sends a diff to B, but B
> dies before syncing with A. A dies later.
> Step 3
> B and C restart while A is still down.
> B and C form a quorum, and B is the new leader. Let's say B's minCommitLog
> is 71 and its maxCommitLog is 73.
> The epoch is now 8, and the zxid is 80.
> A request with zxid 81 succeeds. On B, minCommitLog is now 71 and
> maxCommitLog is 81.
> Step 4
> A starts up. It applies the change in the request with zxid 74 to its
> in-memory data tree.
> A contacts B to registerAsFollower and provides 74 as its zxid.
> Since 71 <= 74 <= 81, B decides to send A the diff.
> Problem:
> The problem with the above sequence is that after truncating its log, A will
> load the snapshot again, and that snapshot is not correct (it still contains
> the uncommitted change 74).
> In the 3.3 branch, FileTxnSnapLog.restore does not call the listener
> (ZOOKEEPER-874), so the leader will send a snapshot to the follower, and it
> will not be a problem.
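As a side note on the simplified two-digit zxids in the scenario above: a real zxid packs the epoch into the high 32 bits and a per-epoch counter into the low 32 bits (cf. ZooKeeper's ZxidUtils); the sketch below illustrates this:

```java
public class ZxidDemo {
    // A real zxid packs the leader epoch into the high 32 bits and a
    // per-epoch counter into the low 32 bits (cf. ZooKeeper's ZxidUtils).
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32L) | (counter & 0xffffffffL);
    }

    static long epochOf(long zxid) {
        return zxid >> 32L;
    }

    public static void main(String[] args) {
        long z = makeZxid(7, 3); // the scenario's simplified "73"
        System.out.println(Long.toHexString(z)); // prints 700000003
        System.out.println(epochOf(z));          // prints 7
    }
}
```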