[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user revans2 closed the pull request at: https://github.com/apache/zookeeper/pull/157 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r99347551 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java --- @@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, OutputArchive oa, Assert.assertEquals(1, f.self.getAcceptedEpoch()); Assert.assertEquals(1, f.self.getCurrentEpoch()); +//Wait for the transactions to be written out. The thread that writes them out +// does not send anything back when it is done. +long start = System.currentTimeMillis(); +while (createSessionZxid != f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) { --- End diff -- OK Will see what I can do --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user afine commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r99242422 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java --- @@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, OutputArchive oa, Assert.assertEquals(1, f.self.getAcceptedEpoch()); Assert.assertEquals(1, f.self.getCurrentEpoch()); +//Wait for the transactions to be written out. The thread that writes them out +// does not send anything back when it is done. +long start = System.currentTimeMillis(); +while (createSessionZxid != f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) { --- End diff -- it would be nice, a good way of actually validating everything is behaving as expected --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r98744573 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java --- @@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, OutputArchive oa, Assert.assertEquals(1, f.self.getAcceptedEpoch()); Assert.assertEquals(1, f.self.getCurrentEpoch()); +//Wait for the transactions to be written out. The thread that writes them out +// does not send anything back when it is done. +long start = System.currentTimeMillis(); +while (createSessionZxid != f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) { --- End diff -- It does seem a bit beyond the scope of this. But if you really want me to I can look into it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user hanm commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r98734336 --- Diff: src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java --- @@ -507,9 +507,12 @@ public synchronized void shutdown() { if (firstProcessor != null) { firstProcessor.shutdown(); } -if (zkDb != null) { -zkDb.clear(); -} + +// There is no need to clear the database +// * When a new quorum is established we can still apply the diff +//on top of the same zkDb data +// * If we fetch a new snapshot from leader, the zkDb will be +//cleared anyway before loading the snapshot --- End diff -- There is one case we may still want to clear db here - when one of the ZooKeeper critical threads (such as * processors, session trackers) fail, ZooKeeper server will shutdown (see runFromConfig) and consequently invoke ZooKeeper#shutdown. In this case, I don't see a particular reason not to clear the db, though not doing it might be fine (as one could argue the server will be dead anyway), but I tend to lean towards the safe side on cleaning the db. One way to conditionally do that is to add a Boolean parameter to ZooKeeper#shutdown so we can have fine grained control over when to clear db in what code path. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user afine commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r98574493 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java --- @@ -839,12 +839,25 @@ public void converseWithFollower(InputArchive ia, OutputArchive oa, Assert.assertEquals(1, f.self.getAcceptedEpoch()); Assert.assertEquals(1, f.self.getCurrentEpoch()); +//Wait for the transactions to be written out. The thread that writes them out +// does not send anything back when it is done. +long start = System.currentTimeMillis(); +while (createSessionZxid != f.fzk.getLastProcessedZxid() && (System.currentTimeMillis() - start) < 50) { --- End diff -- just an idea, not sure if it is worth the effort and it may be outside the scope of this patch. we could play with the test infrastructure here a little bit and do some dependency injection in `createFollower` that can let us track if db clearing and snapshotting occurs when expected. this may help prevent hard to track race conditions with tests like this in the future. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user afine commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r98574533 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/Learner.java --- @@ -321,13 +321,16 @@ protected void syncWithLeader(long newLeaderZxid) throws IOException, Interrupte QuorumPacket ack = new QuorumPacket(Leader.ACK, 0, null, null); QuorumPacket qp = new QuorumPacket(); long newEpoch = ZxidUtils.getEpochFromZxid(newLeaderZxid); - -readPacket(qp); +//In the DIFF case we don't need to do a snapshot because the transactions will sync on top of any existing snapshot --- End diff -- nit: I think we generally put spaces after the // --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user revans2 commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r98449817 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/Learner.java --- @@ -364,10 +367,12 @@ else if (qp.getType() == Leader.SNAP) { long lastQueued = 0; -// in V1.0 we take a snapshot when we get the NEWLEADER message, but in pre V1.0 +// in Zab V1.0 (ZK 3.4+) we might take a snapshot when we get the NEWLEADER message, but in pre V1.0 // we take the snapshot at the UPDATE, since V1.0 also gets the UPDATE (after the NEWLEADER) // we need to make sure that we don't take the snapshot twice. -boolean snapshotTaken = false; +boolean isPreZAB1_0 = true; +//If we are not going to take the snapshot be sure the edits are not applied in memory +boolean writeToEditLog = !snapshotNeeded; --- End diff -- Sorry about that I am still a bit new at the internal terminology. I will update it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user fpj commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r98337403 --- Diff: src/java/test/org/apache/zookeeper/server/quorum/Zab1_0Test.java --- @@ -839,6 +839,13 @@ public void converseWithFollower(InputArchive ia, OutputArchive oa, Assert.assertEquals(1, f.self.getAcceptedEpoch()); Assert.assertEquals(1, f.self.getCurrentEpoch()); +//Wait for the edits to be written out --- End diff -- I need to think some more whether it makes any sense to add test cases for this. The test cases we already have probably cover this enough given that there is no real change of behavior. This change here is necessary, though. We don't really care about time in general in our tests because we can never be sure of the timing we will get across runs and with different settings. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
Github user fpj commented on a diff in the pull request: https://github.com/apache/zookeeper/pull/157#discussion_r98336253 --- Diff: src/java/main/org/apache/zookeeper/server/quorum/Learner.java --- @@ -364,10 +367,12 @@ else if (qp.getType() == Leader.SNAP) { long lastQueued = 0; -// in V1.0 we take a snapshot when we get the NEWLEADER message, but in pre V1.0 +// in Zab V1.0 (ZK 3.4+) we might take a snapshot when we get the NEWLEADER message, but in pre V1.0 // we take the snapshot at the UPDATE, since V1.0 also gets the UPDATE (after the NEWLEADER) // we need to make sure that we don't take the snapshot twice. -boolean snapshotTaken = false; +boolean isPreZAB1_0 = true; +//If we are not going to take the snapshot be sure the edits are not applied in memory +boolean writeToEditLog = !snapshotNeeded; --- End diff -- The changes here are using `edit` to refer to `txns`. I'd rather use `txn` to be consistent across the project. Specifically here, you're using `EditLog` to refer to the `TxnLog`, please change accordingly to have it consistent across the project. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] zookeeper pull request #157: ZOOKEEPER-2678: Discovery and Sync can take a v...
GitHub user revans2 opened a pull request: https://github.com/apache/zookeeper/pull/157 ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DB This patch addresses recovery time when a leader is lost on a large DB. It does this by not clearing the DB before leader election begins, and by avoiding taking a snapshot as part of the SYNC phase, specifically for a DIFF sync. It does this by buffering the proposals and commits just like the code currently does for proposals/commits sent after the NEWLEADER and before the UPTODATE messages. If a SNAP is sent we cannot avoid writing out the full snapshot because there is no other way to make sure the disk DB is in sync with what is in memory. So any edits to the edit log before a background snapshot happened could possibly be applied on top of an incorrect snapshot. This same optimization should work for TRUNC too, but I opted not to do it for TRUNC because TRUNC is rare and TRUNC by its very nature already forces the DB to be reread after the edit logs are modified. So it would still not be fast. In practice this makes it so instead of taking 5+ mins for the cluster to recover from losing a leader it now takes about 3 seconds. I am happy to port this to 3.5. if it looks good. You can merge this pull request into a Git repository by running: $ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/zookeeper/pull/157.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #157 commit 5aa25620e0189b28d7040305272be2fda28126fb Author: Robert (Bobby) Evans Date: 2017-01-19T19:50:32Z ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---