As I get deeper into Solr on kube, I've begun to wonder if Solr leader
election on kube is an obsolete concept. Leader election was conceived when
hardware was not fungible. Now that hardware is fungible, I wonder if it's
time to rethink the whole idea of leader election.

Consider the following scenario:

Take a collection where each shard has 1 tlog replica and N pull replicas.
A shard leader goes down and indexing fails on the shard for a period of
time; kube restarts the leader and indexing succeeds on the shard again.
Pull replicas continue to accept queries the entire time.
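
To make the layout concrete, here is a minimal sketch of a Collections API
CREATE request for such a collection. The nrtReplicas, tlogReplicas and
pullReplicas parameters are the standard CREATE parameters; the host,
collection name, shard count and the helper function itself are
placeholders made up for illustration.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, num_shards, pull_replicas):
    """Build a Collections API CREATE URL: 1 tlog + N pull replicas per shard.

    Hypothetical helper; base_url and name are illustrative placeholders.
    """
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "nrtReplicas": 0,               # no nrt replicas at all
        "tlogReplicas": 1,              # the single replica that indexes
        "pullReplicas": pull_replicas,  # query-only replicas
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


url = create_collection_url("http://localhost:8983/solr", "events", 2, 3)
print(url)
```

With one tlog replica per shard, that replica is the only leader candidate,
which is what makes "restart instead of elect" possible in this scenario.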

There are three main advantages of this kind of setup:

1) Potential for zero data loss. In this scenario indexing either succeeds
or it fails. We no longer have the data loss that comes from the lack of a
two-phase commit across a set of tlog or nrt replicas. There is now only
one shard leader, which has a transaction redo log, and that makes zero
data loss much, much easier to achieve.

2) Improved cluster stability. Restarting the leader is far simpler than
electing a new leader, peer syncing, index fingerprinting, etc., and would
eliminate a whole class of operational issues.

3) Phasing out nrt, and maybe even leader election, in the code base
greatly reduces code complexity and allows committers to harden the
eventually consistent model.
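
Point 1 puts the burden on the indexing client: if an update fails while
the leader is restarting, the client has to resend it. A rough sketch of
that, assuming a hypothetical send callable that raises ConnectionError
while the leader is down (not a real SolrJ or pysolr API), and assuming
documents carry a uniqueKey so a resend overwrites rather than duplicates:

```python
import time


def index_with_retry(send, doc, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Resend doc with exponential backoff until send() succeeds.

    send is a hypothetical callable that posts one document to the shard
    leader and raises ConnectionError while the leader is unavailable.
    """
    for attempt in range(attempts):
        try:
            return send(doc)
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # leader never came back; surface the failure
            sleep(base_delay * 2 ** attempt)


# Simulated leader that is down for the first two attempts.
calls = []


def flaky_send(doc):
    calls.append(doc)
    if len(calls) < 3:
        raise ConnectionError("leader unavailable")
    return "ok"


delays = []  # record backoff delays instead of actually sleeping
result = index_with_retry(flaky_send, {"id": "1"}, sleep=delays.append)
print(result, delays)
```

The attempt count and delays are arbitrary here; in practice they would
need to cover however long kube takes to bring the leader back.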


Joel Bernstein
http://joelsolr.blogspot.com/
