[
https://issues.apache.org/jira/browse/ZOOKEEPER-2845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16367962#comment-16367962
]
ASF GitHub Bot commented on ZOOKEEPER-2845:
-------------------------------------------
Github user afine commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/453#discussion_r168886064
--- Diff: src/java/test/org/apache/zookeeper/server/quorum/QuorumPeerMainTest.java ---
@@ -888,4 +923,103 @@ public void testWithOnlyMinSessionTimeout() throws Exception {
maxSessionTimeOut, quorumPeer.getMaxSessionTimeout());
}
+ @Test
+ public void testFailedTxnAsPartOfQuorumLoss() throws Exception {
+ // 1. start up server and wait for leader election to finish
+ ClientBase.setupTestEnv();
+ final int SERVER_COUNT = 3;
+ servers = LaunchServers(SERVER_COUNT);
+
+ waitForAll(servers, States.CONNECTED);
+
+ // we need to shut down and start back up to make sure that the create session isn't the first transaction, since
+ // that is rather innocuous.
+ servers.shutDownAllServers();
+ waitForAll(servers, States.CONNECTING);
+ servers.restartAllServersAndClients(this);
+ waitForAll(servers, States.CONNECTED);
+
+ // 2. kill all followers
+ int leader = servers.findLeader();
+ Map<Long, Proposal> outstanding = servers.mt[leader].main.quorumPeer.leader.outstandingProposals;
+ // increase the tick time to delay the leader going to looking
+ servers.mt[leader].main.quorumPeer.tickTime = 10000;
+ LOG.warn("LEADER {}", leader);
+
+ for (int i = 0; i < SERVER_COUNT; i++) {
+ if (i != leader) {
+ servers.mt[i].shutdown();
+ }
+ }
+
+ // 3. start up the followers to form a new quorum
+ for (int i = 0; i < SERVER_COUNT; i++) {
+ if (i != leader) {
+ servers.mt[i].start();
+ }
+ }
+
+ // 4. wait one of the follower to be the new leader
+ for (int i = 0; i < SERVER_COUNT; i++) {
+ if (i != leader) {
+ // Recreate a client session since the previous session was not persisted.
+ servers.restartClient(i, this);
+ waitForOne(servers.zk[i], States.CONNECTED);
+ }
+ }
+
+ // 5. send a create request to the old leader and make sure it's synced to disk,
+ // which means it was acked by itself
+ try {
+ servers.zk[leader].create("/zk" + leader, "zk".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
+ Assert.fail("create /zk" + leader + " should have failed");
+ } catch (KeeperException e) {
+ }
+
+ // just make sure that we actually did get it in process at the
+ // leader
+ Assert.assertEquals(1, outstanding.size());
+ Proposal p = outstanding.values().iterator().next();
+ Assert.assertEquals(OpCode.create, p.request.getHdr().getType());
+
+ // make sure it has a chance to write it to disk
+ int sleepTime = 0;
+ Long longLeader = Long.valueOf(leader);
+ while (!p.qvAcksetPairs.get(0).getAckset().contains(longLeader)) {
+ if (sleepTime > 2000) {
+ Assert.fail("Transaction not synced to disk within 2 seconds " + p.qvAcksetPairs.get(0).getAckset()
+ + " expected " + leader);
+ }
+ Thread.sleep(100);
+ sleepTime += 100;
+ }
+
+ // 6. wait for the leader to quit due to not enough followers and come back up as a part of the new quorum
+ sleepTime = 0;
+ Follower f = servers.mt[leader].main.quorumPeer.follower;
+ while (f == null || !f.isRunning()) {
+ if (sleepTime > 10_000) {
--- End diff --
nitpick: can we reuse the ticktime here to make the relationship more obvious?
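A minimal sketch of the reviewer's suggestion, with hypothetical names (the test above hard-codes both `quorumPeer.tickTime = 10000` and the `10_000` wait bound separately): derive the wait bound from the same tick-time constant so the relationship between the two is explicit.

```java
import java.util.function.BooleanSupplier;

public class TickTimeWaitSketch {
    // Mirrors the tick time the test assigns to the old leader's quorumPeer.
    static final int TICK_TIME = 10000;

    // Poll a condition in 100 ms slices, giving up after one tick time
    // instead of after a separately hard-coded 10_000 ms.
    static boolean waitUpToOneTick(BooleanSupplier cond) throws InterruptedException {
        int slept = 0;
        while (!cond.getAsBoolean()) {
            if (slept > TICK_TIME) {
                return false; // a test would Assert.fail() here
            }
            Thread.sleep(100);
            slept += 100;
        }
        return true;
    }

    public static void main(String[] args) throws InterruptedException {
        // Condition already satisfied, so no sleeping is needed.
        System.out.println(waitUpToOneTick(() -> true));
    }
}
```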
> Data inconsistency issue due to retain database in leader election
> ------------------------------------------------------------------
>
> Key: ZOOKEEPER-2845
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2845
> Project: ZooKeeper
> Issue Type: Bug
> Components: quorum
> Affects Versions: 3.4.10, 3.5.3, 3.6.0
> Reporter: Fangmin Lv
> Assignee: Robert Joseph Evans
> Priority: Critical
>
> In ZOOKEEPER-2678, the ZKDatabase is retained to reduce the unavailable time
> during leader election. In a ZooKeeper ensemble, it's possible that the
> snapshot is ahead of the txn file (due to a slow disk on the server, etc.), or
> that the txn file is ahead of the snapshot because no commit message has been
> received yet.
> If the snapshot is ahead of the txn file, this is not an issue: the
> SyncRequestProcessor queue is drained during shutdown, so the snapshot and txn
> file will be consistent before leader election happens.
> But if the txn file is ahead of the snapshot, the ensemble can end up with
> inconsistent data. Here is a simplified scenario that shows the issue.
> Let's say we have 3 servers in the ensemble: servers A and B are followers,
> C is the leader, and all snapshots and txns are up to T0:
> 1. A new request reaches leader C to create node N, and it is converted to
> txn T1
> 2. Txn T1 is synced to disk on C, but A and B restart just before the
> proposal reaches them, so T1 does not exist on A or B
> 3. A and B form a new quorum after restarting; let's say B is the leader
> 4. C changes to LOOKING state because it no longer has enough followers; it
> then syncs with leader B at last zxid T0, which results in an empty DIFF sync
> 5. C restarts before taking a snapshot and replays the txns on disk, which
> include T1; now C has node N, but A and B don't
> I also included a test case that reproduces this issue consistently.
> We have a quite different RetainDB version that avoids this issue by
> reconciling the snapshot and txn files before leader election; we will
> submit it for review.
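The core of the dangerous condition described above can be sketched as follows (hypothetical names, not the actual ZooKeeper patch): before reusing a retained database for leader election, compare the highest zxid recovered from the snapshot with the highest zxid in the txn log; a txn-log tail beyond the snapshot (like T1 in step 2) is exactly the case that must not silently survive an empty DIFF sync from the new leader.

```java
public class RetainDbCheckSketch {
    // Returns true when snapshot and txn log agree, i.e. replaying the
    // txn log cannot resurrect a proposal the new quorum never saw.
    static boolean safeToRetain(long snapshotZxid, long lastLoggedZxid) {
        // txn log ahead of snapshot -> an uncommitted txn (T1 above)
        // would be replayed on restart and create node N only on C
        return lastLoggedZxid <= snapshotZxid;
    }

    public static void main(String[] args) {
        System.out.println(safeToRetain(0x100L, 0x100L)); // consistent at T0
        System.out.println(safeToRetain(0x100L, 0x101L)); // txn T1 ahead, as in step 2
    }
}
```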
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)