GitHub user revans2 opened a pull request:
https://github.com/apache/zookeeper/pull/157
ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DB
This patch reduces recovery time when a leader is lost on a large DB.
It does so in two ways: the DB is no longer cleared before leader election
begins, and a DIFF sync no longer takes a snapshot as part of the SYNC phase.
Instead, the proposals and commits are buffered, just as the code currently
does for proposals/commits sent after the NEWLEADER and before the UPTODATE
messages.
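As a rough illustration of the buffering idea (not the code in this patch), a
learner can queue proposals and commits during the sync and apply them only
once the sync is complete. The class and method names below are hypothetical,
zxids stand in for full transactions, and draining at UPTODATE is an
assumption made for the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical, simplified model of a learner buffering DIFF-sync traffic. */
class BufferedSyncSketch {
    // Proposals received but not yet committed, in arrival (zxid) order.
    private final Deque<Long> pendingProposals = new ArrayDeque<>();
    // Zxids whose COMMIT arrived during the sync; held back until sync ends.
    private final List<Long> bufferedCommits = new ArrayList<>();
    // Stand-in for the in-memory DB: zxids that have actually been applied.
    private final List<Long> applied = new ArrayList<>();

    /** Buffer a proposal instead of acting on it immediately. */
    void onProposal(long zxid) {
        pendingProposals.addLast(zxid);
    }

    /** A commit must match the oldest pending proposal; buffer it as well. */
    void onCommit(long zxid) {
        Long head = pendingProposals.peekFirst();
        if (head == null || head != zxid) {
            throw new IllegalStateException("commit " + zxid + " does not match pending proposal " + head);
        }
        pendingProposals.removeFirst();
        bufferedCommits.add(zxid);
    }

    /** Sync finished (assumed here to be UPTODATE): apply buffered commits in order. */
    void onUpToDate() {
        applied.addAll(bufferedCommits);
        bufferedCommits.clear();
    }

    List<Long> applied() {
        return applied;
    }

    int uncommitted() {
        return pendingProposals.size();
    }
}
```

Nothing reaches the stand-in DB until onUpToDate() is called, so no snapshot
is needed to cover partially applied state in this model.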
If a SNAP is sent we cannot avoid writing out the full snapshot, because
there is no other way to make sure the DB on disk is in sync with what is in
memory. Otherwise, any edits written to the edit log before a background
snapshot finished could end up being applied on top of an incorrect snapshot.
This same optimization should work for TRUNC too, but I opted not to do it
for TRUNC because TRUNC is rare and TRUNC by its very nature already forces the
DB to be reread after the edit logs are modified. So it would still not be
fast.
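The per-sync-type behaviour described above can be summarised in a small
sketch. The enum and method here are hypothetical names used only for
illustration; they are not identifiers from the ZooKeeper code base.

```java
/** Hypothetical summary of which sync types skip the snapshot in this patch. */
class SnapshotPolicySketch {
    enum SyncType { DIFF, SNAP, TRUNC }

    /** Returns true only for the sync type this patch optimizes. */
    static boolean skipsSnapshotDuringSync(SyncType type) {
        switch (type) {
            case DIFF:
                // Proposals and commits are buffered instead of snapshotting.
                return true;
            case SNAP:
                // The full snapshot must be written so disk matches memory.
                return false;
            case TRUNC:
                // Could be optimized too, but is deliberately left unchanged.
                return false;
            default:
                throw new IllegalArgumentException("unknown sync type: " + type);
        }
    }
}
```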
In practice, instead of taking 5+ minutes for the cluster to recover from
losing a leader, it now takes about 3 seconds.
I am happy to port this to 3.5 if it looks good.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/zookeeper/pull/157.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #157
----
commit 5aa25620e0189b28d7040305272be2fda28126fb
Author: Robert (Bobby) Evans <[email protected]>
Date: 2017-01-19T19:50:32Z
ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs
----