[ https://issues.apache.org/jira/browse/ZOOKEEPER-2678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15854358#comment-15854358 ]

Hadoop QA commented on ZOOKEEPER-2678:
--------------------------------------

+1 overall.  GitHub Pull Request Build

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    +1 javadoc.  The javadoc tool did not generate any warning messages.

    +1 javac.  The applied patch does not increase the total number of javac compiler warnings.

    +1 findbugs.  The patch does not introduce any new Findbugs (version 3.0.1) warnings.

    +1 release audit.  The applied patch does not increase the total number of release audit warnings.

    +1 core tests.  The patch passed core unit tests.

    +1 contrib tests.  The patch passed contrib unit tests.

Test results: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/281//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/281//artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-github-pr-build/281//console

This message is automatically generated.

> Large databases take a long time to regain a quorum
> ---------------------------------------------------
>
>                 Key: ZOOKEEPER-2678
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2678
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.9, 3.5.2
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>
> I know this is long but please hear me out.
> I recently inherited a massive zookeeper ensemble.  The snapshot is 3.4 GB on
> disk.  Because of its massive size we have been running into a number of
> issues.  There are lots of problems that we hope to fix with tuning GC etc,
> but the big one right now that is blocking us from making a lot of progress
> on the rest of them is that when we lose a quorum because the leader left,
> for whatever reason, it can take well over 5 minutes for a new quorum to be
> established.
> So we cannot tune the leader without risking downtime.
> We traced down where the time was being spent and found that each server was
> clearing the database so it would be read back in again before leader
> election even started.  Then as part of the sync phase each server will write
> out a snapshot to checkpoint the progress it made as part of the sync.
> I will be putting up a patch shortly with some proposed changes in it.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)