[
https://issues.apache.org/jira/browse/ZOOKEEPER-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13141235#comment-13141235
]
Camille Fournier commented on ZOOKEEPER-1264:
---------------------------------------------
From a comment I added to the tracker that this change was attached to:
ZOOKEEPER-1136 causes a concurrency bug. Specifically:
1. Follower rejoins, gets snap from leader
2. Follower gets NEWLEADER message and takes a snapshot
3. Follower gets some additional transactions forwarded from leader, applies
these directly to data tree
4. Follower gets an UPTODATE message, does not take a snapshot
5. Follower starts following, writes some new transactions to its log, and is
killed before it takes another snapshot
6. Follower restarts and gets a DIFF from the leader
The transactions that came in between NEWLEADER and UPTODATE are lost because
they never go anywhere but the internal data tree, and if that tree isn't
snapshotted and the follower restarts with only a DIFF, the follower will lose
these transactions.
I think the proper thing to do is snapshot after UPTODATE, but I'm not sure why
we changed this to snapshot after NEWLEADER instead. The wiki doesn't seem to
explain that clearly.
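
To make the sequence above concrete, here is a minimal toy model of it in Java. It is only a sketch under stated assumptions: the class and method names (ResyncSketch, applyDirectly, logAndApply, lostAfterRestart) are hypothetical and are not ZooKeeper classes, zxids are modeled as plain longs, and the DIFF from the leader is assumed to start after the highest zxid in the follower's log. It compares snapshotting only after NEWLEADER with also snapshotting after UPTODATE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model only: these names are illustrative, not ZooKeeper APIs.
class ResyncSketch {
    // In-memory state, standing in for the data tree (set of applied zxids).
    private final Set<Long> dataTree = new TreeSet<>();
    // Durable state: the last snapshot plus the transaction log.
    private Set<Long> snapshot = new TreeSet<>();
    private final List<Long> txnLog = new ArrayList<>();

    void takeSnapshot() { snapshot = new TreeSet<>(dataTree); }

    // Applied to memory only, as in step 3.
    void applyDirectly(long zxid) { dataTree.add(zxid); }

    // Written to the log and applied, as in step 5.
    void logAndApply(long zxid) { txnLog.add(zxid); dataTree.add(zxid); }

    // State rebuilt on restart: last snapshot plus replayed log.
    // Assumption: the DIFF in step 6 starts after the highest logged zxid,
    // so it cannot bring back anything older than that.
    Set<Long> restart() {
        Set<Long> recovered = new TreeSet<>(snapshot);
        recovered.addAll(txnLog);
        return recovered;
    }

    // Runs steps 1-6 and returns the zxids missing after the restart.
    static Set<Long> lostAfterRestart(boolean snapshotAfterUpToDate) {
        ResyncSketch f = new ResyncSketch();
        for (long z = 1; z <= 3; z++) f.applyDirectly(z); // 1. snap from leader
        f.takeSnapshot();                                 // 2. NEWLEADER
        f.applyDirectly(4);                               // 3. forwarded txns,
        f.applyDirectly(5);                               //    memory only
        if (snapshotAfterUpToDate) f.takeSnapshot();      // 4. UPTODATE
        f.logAndApply(6);                                 // 5. following
        f.logAndApply(7);
        Set<Long> lost = new TreeSet<>(f.dataTree);       // state when killed
        lost.removeAll(f.restart());                      // 6. restart
        return lost;
    }

    public static void main(String[] args) {
        System.out.println("snapshot after NEWLEADER only: lost " + lostAfterRestart(false));
        System.out.println("snapshot after UPTODATE too:   lost " + lostAfterRestart(true));
    }
}
```

In this model, zxids 4 and 5 exist only in memory when the follower is killed, so they are missing after the restart unless a snapshot is taken after UPTODATE. Whether that is the right fix for the real code is the open question raised above.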
> FollowerResyncConcurrencyTest failing intermittently
> ----------------------------------------------------
>
> Key: ZOOKEEPER-1264
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1264
> Project: ZooKeeper
> Issue Type: Bug
> Components: tests
> Affects Versions: 3.3.3, 3.4.0, 3.5.0
> Reporter: Patrick Hunt
> Assignee: Camille Fournier
> Priority: Blocker
> Fix For: 3.3.4, 3.4.0, 3.5.0
>
> Attachments: ZOOKEEPER-1264.patch, ZOOKEEPER-1264_branch33.patch,
> ZOOKEEPER-1264_branch34.patch, followerresyncfailure_log.txt.gz, logs.zip,
> tmp.zip
>
>
> The FollowerResyncConcurrencyTest test is failing intermittently.
> Saw the following on 3.4:
> {noformat}
> junit.framework.AssertionFailedError: Should have same number of ephemerals in both followers expected:<11741> but was:<14001>
>     at org.apache.zookeeper.test.FollowerResyncConcurrencyTest.verifyState(FollowerResyncConcurrencyTest.java:400)
>     at org.apache.zookeeper.test.FollowerResyncConcurrencyTest.testResyncBySnapThenDiffAfterFollowerCrashes(FollowerResyncConcurrencyTest.java:196)
>     at org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
> {noformat}