GitHub user revans2 opened a pull request: https://github.com/apache/zookeeper/pull/157
ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DB

This patch reduces recovery time when a leader is lost on a large DB. It does so in two ways: by not clearing the DB before leader election begins, and by not taking a snapshot as part of the SYNC phase when the sync is a DIFF. For a DIFF sync, the proposals and commits are buffered, just as the code currently does for proposals/commits sent after the NEWLEADER and before the UPTODATE messages.

If a SNAP is sent we cannot avoid writing out the full snapshot, because there is no other way to make sure the on-disk DB is in sync with what is in memory. Without it, any edits written to the edit log before a background snapshot happened could be applied on top of an incorrect snapshot.

The same optimization should work for TRUNC too, but I opted not to do it for TRUNC because TRUNC is rare, and TRUNC by its very nature already forces the DB to be reread after the edit logs are modified, so it would still not be fast.

In practice, instead of taking 5+ minutes for the cluster to recover from losing a leader, it now takes about 3 seconds.

I am happy to port this to 3.5 if it looks good.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/revans2/zookeeper ZOOKEEPER-2678

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/157.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #157

----
commit 5aa25620e0189b28d7040305272be2fda28126fb
Author: Robert (Bobby) Evans <ev...@yahoo-inc.com>
Date:   2017-01-19T19:50:32Z

    ZOOKEEPER-2678: Discovery and Sync can take a very long time on large DBs

----
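For readers who want a concrete picture of the buffering idea described for a DIFF sync, here is a minimal sketch in Java. It is not the code from the patch: the class `DiffSyncBuffer`, its methods, and the two lists standing in for the edit log and the in-memory DB are all hypothetical names chosen for illustration. It only shows the general shape: proposals and commits received during the sync are held in memory, and on UPTODATE they are written to the log and applied, without a snapshot being taken.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these are not ZooKeeper's real classes.
final class DiffSyncBuffer {
    // Proposals received during the sync, kept in arrival order (zxid -> txn).
    private final Map<Long, String> buffered = new LinkedHashMap<>();
    // Commits received during the sync, in arrival order.
    private final Deque<Long> commits = new ArrayDeque<>();

    // Stand-in for the on-disk edit log (assumption: a list of zxids).
    final List<Long> txnLog = new ArrayList<>();
    // Stand-in for the in-memory DB (assumption: a list of applied txns).
    final List<String> applied = new ArrayList<>();

    void onProposal(long zxid, String txn) {
        buffered.put(zxid, txn);   // buffer only: nothing is logged or applied yet
    }

    void onCommit(long zxid) {
        commits.addLast(zxid);     // remember the commit until UPTODATE
    }

    // On UPTODATE: log every buffered proposal, apply the committed ones,
    // and return the zxids that were logged but are still awaiting a commit.
    List<Long> onUpToDate() {
        for (Long zxid : buffered.keySet()) {
            txnLog.add(zxid);
        }
        for (long zxid : commits) {
            String txn = buffered.remove(zxid);
            if (txn == null) {
                throw new IllegalStateException("commit without proposal: " + zxid);
            }
            applied.add(txn);
        }
        commits.clear();
        List<Long> stillPending = new ArrayList<>(buffered.keySet());
        buffered.clear();
        return stillPending;
    }

    public static void main(String[] args) {
        DiffSyncBuffer follower = new DiffSyncBuffer();
        follower.onProposal(1L, "create /a");
        follower.onProposal(2L, "set /a");
        follower.onCommit(1L);
        List<Long> pending = follower.onUpToDate();
        System.out.println("applied=" + follower.applied + " pending=" + pending);
    }
}
```

The sketch deliberately leaves out the SNAP and TRUNC cases, which the description above says still require writing a snapshot or rereading the DB.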