[ https://issues.apache.org/jira/browse/ZOOKEEPER-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861379#comment-15861379 ]
ASF GitHub Bot commented on ZOOKEEPER-2678:
-------------------------------------------
Github user revans2 commented on a diff in the pull request:
https://github.com/apache/zookeeper/pull/159#discussion_r100552192
--- Diff: src/java/main/org/apache/zookeeper/server/quorum/Learner.java ---
@@ -498,14 +504,19 @@ else if (qp.getType() == Leader.SNAP) {
throw new Exception("changes proposed in reconfig");
}
}
- if (!snapshotTaken) { // true for the pre v1.0 case
- zk.takeSnapshot();
+ if (isPreZAB1_0) {
+ zk.takeSnapshot();
self.setCurrentEpoch(newEpoch);
}
self.setZooKeeperServer(zk);
self.adminServer.setZooKeeperServer(zk);
break outerLoop;
- case Leader.NEWLEADER: // it will be NEWLEADER in v1.0
+ case Leader.NEWLEADER: // Getting NEWLEADER here instead of in discovery
+ // means this is Zab 1.0
+ // Create updatingEpoch file and remove it after current
--- End diff --
You are right, will fix that.
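The diff distinguishes two paths: a learner that receives NEWLEADER during sync (rather than during discovery) is talking to a Zab 1.0 leader, while on the pre-1.0 path the snapshot is taken and the epoch is set when UPTODATE arrives. The following is a minimal, hypothetical Java sketch of that branching, not the patch itself. The class name, the `currentEpoch` file layout, and the counter standing in for `zk.takeSnapshot()` are assumptions; the `updatingEpoch` marker follows the diff comment (create the file, update the epoch, remove the file). Whatever else the truncated comment says happens at NEWLEADER, such as whether a snapshot is also taken there, is deliberately left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch (not ZooKeeper's Learner): where a learner persists the
 * new epoch depending on whether the leader speaks Zab 1.0 or pre-1.0.
 */
public class EpochUpdateSketch {
    enum PacketType { NEWLEADER, UPTODATE }

    private final Path currentEpoch;   // assumed file holding the epoch
    private final Path updatingEpoch;  // marker file named after the diff comment
    private boolean isPreZab10 = true; // assume pre-1.0 until NEWLEADER arrives
    private int snapshots = 0;         // counts simulated snapshots

    EpochUpdateSketch(Path dataDir) {
        this.currentEpoch = dataDir.resolve("currentEpoch");
        this.updatingEpoch = dataDir.resolve("updatingEpoch");
    }

    /** Handles one sync packet; returns true once the learner is up to date. */
    boolean onPacket(PacketType type, long newEpoch) throws IOException {
        switch (type) {
            case NEWLEADER:
                // NEWLEADER during sync implies a Zab 1.0 leader. The marker
                // brackets the epoch write so a crash in between is detectable.
                if (!Files.exists(updatingEpoch)) {
                    Files.createFile(updatingEpoch);
                }
                writeEpoch(newEpoch);
                Files.delete(updatingEpoch);
                isPreZab10 = false;
                return false;
            case UPTODATE:
                if (isPreZab10) {
                    snapshots++;          // stands in for zk.takeSnapshot()
                    writeEpoch(newEpoch); // stands in for self.setCurrentEpoch()
                }
                return true;
            default:
                throw new IllegalArgumentException("unexpected packet " + type);
        }
    }

    /** After a restart, a leftover marker means the epoch write may be incomplete. */
    boolean epochUpdateInterrupted() {
        return Files.exists(updatingEpoch);
    }

    long readEpoch() throws IOException {
        String s = new String(Files.readAllBytes(currentEpoch), StandardCharsets.UTF_8);
        return Long.parseLong(s.trim());
    }

    int snapshots() {
        return snapshots;
    }

    private void writeEpoch(long epoch) throws IOException {
        Files.write(currentEpoch, Long.toString(epoch).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Zab 1.0 leader: NEWLEADER then UPTODATE, no snapshot at UPTODATE.
        EpochUpdateSketch zab10 = new EpochUpdateSketch(Files.createTempDirectory("zab10"));
        zab10.onPacket(PacketType.NEWLEADER, 7L);
        zab10.onPacket(PacketType.UPTODATE, 7L);
        System.out.println("zab1.0 epoch=" + zab10.readEpoch() + " snapshots=" + zab10.snapshots());

        // Pre-1.0 leader: only UPTODATE, so the snapshot and epoch update happen there.
        EpochUpdateSketch pre = new EpochUpdateSketch(Files.createTempDirectory("pre10"));
        pre.onPacket(PacketType.UPTODATE, 7L);
        System.out.println("pre-1.0 epoch=" + pre.readEpoch() + " snapshots=" + pre.snapshots());
    }
}
```

A leftover `updatingEpoch` file after a restart only signals that the process died between creating the marker and finishing the epoch write; how a real server should recover from that state is outside the scope of this sketch.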
> Large databases take a long time to regain a quorum
> ---------------------------------------------------
>
> Key: ZOOKEEPER-2678
> URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2678
> Project: ZooKeeper
> Issue Type: Bug
> Components: server
> Affects Versions: 3.4.9, 3.5.2
> Reporter: Robert Joseph Evans
> Assignee: Robert Joseph Evans
>
> I know this is long, but please hear me out.
> I recently inherited a massive ZooKeeper ensemble. The snapshot is 3.4 GB on
> disk. Because of its size we have been running into a number of issues.
> There are lots of problems that we hope to fix with GC tuning, etc., but the
> big one right now, which is blocking us from making progress on the rest of
> them, is that when we lose a quorum because the leader left, for whatever
> reason, it can take well over 5 minutes for a new quorum to be established.
> So we cannot tune the leader without risking downtime.
> We tracked down where the time was being spent and found that each server was
> clearing its in-memory database, so that it would be read back in from disk,
> before leader election even started. Then, as part of the sync phase, each
> server writes out a snapshot to checkpoint the progress it made during the sync.
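The costly step described above is discarding an in-memory database that is still valid and re-reading it from disk before every election. A minimal sketch of the obvious guard, assuming the in-memory copy is known to match what is on disk, is to load only when nothing is loaded yet. `ReloadGuardSketch` and its loader callback are hypothetical stand-ins, not ZooKeeper classes; the loader simply returns a last-processed zxid in place of reading a multi-gigabyte snapshot.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch: skip re-reading the database if memory is already valid. */
public class ReloadGuardSketch {
    private final LongSupplier loader; // stands in for reading snapshot + txn log
    private boolean initialized = false;
    private long lastZxid = -1;
    private int loads = 0;             // how many times we hit "disk"

    ReloadGuardSketch(LongSupplier loader) {
        this.loader = loader;
    }

    /** Behavior described in the issue: drop in-memory state, forcing a re-read. */
    void clear() {
        initialized = false;
        lastZxid = -1;
    }

    /** Loads from "disk" only when the in-memory copy is not already initialized. */
    long ensureLoaded() {
        if (!initialized) {
            lastZxid = loader.getAsLong();
            loads++;
            initialized = true;
        }
        return lastZxid;
    }

    int loads() {
        return loads;
    }

    public static void main(String[] args) {
        ReloadGuardSketch db = new ReloadGuardSketch(() -> 0x500000042L);
        db.ensureLoaded(); // startup: one load
        db.ensureLoaded(); // new election without clear(): no reload
        System.out.println("loads without clear: " + db.loads()); // 1
        db.clear();        // old behavior before each election
        db.ensureLoaded();
        System.out.println("loads with clear: " + db.loads());    // 2
    }
}
```

The guard is only safe if nothing can leave the in-memory tree ahead of or behind the on-disk state when the server changes roles; the sketch assumes that invariant rather than enforcing it.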
> I will be putting up a patch shortly with some proposed changes in it.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)