Hi dev,

Who's the best person to ask questions about the design of LeaderElector
and ElectionContext?

I ask because I've found Overseer election to be somewhat brittle in
practice.  During a rolling restart, it's not uncommon to get into a state
where there's no Overseer.  I've even experienced this locally with as few
as two nodes.
When this happens, I've tried (for example) deleting all the children under
/solr/overseer_elect/election.  In theory, this should trigger all watches
on all nodes, forcing everyone to re-register and contend for leadership,
but in practice I haven't found this to work.
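
To make that expectation concrete, here is a toy in-memory model of the
behavior I'd expect.  It is only a sketch: it uses no ZooKeeper client, it
is not Solr's actual logic, and ElectionDir and Participant are made-up
names.  It also assumes each participant watches its own election node,
which is a simplification of the usual recipe.

```python
class ElectionDir:
    """In-memory stand-in for the children of an election znode (hypothetical)."""

    def __init__(self):
        self.next_seq = 0
        self.nodes = {}     # sequence number -> participant name
        self.watchers = {}  # sequence number -> callback fired on delete

    def create_sequential(self, owner, on_deleted):
        # Mimics creating a sequential node: each new node gets a higher number.
        seq = self.next_seq
        self.next_seq += 1
        self.nodes[seq] = owner
        self.watchers[seq] = on_deleted
        return seq

    def delete(self, seq):
        # Remove the node, then fire its one-shot watch.
        del self.nodes[seq]
        self.watchers.pop(seq)()

    def leader(self):
        # The participant holding the lowest sequence number leads.
        return self.nodes[min(self.nodes)] if self.nodes else None


class Participant:
    """A node that re-registers whenever its election node disappears."""

    def __init__(self, name, election):
        self.name = name
        self.election = election
        self.join()

    def join(self):
        # Assumption: losing our node simply means "join again at the tail".
        self.seq = self.election.create_sequential(self.name, self.join)


election = ElectionDir()
Participant("node1", election)
Participant("node2", election)

# Delete every child, as I did by hand under the election path.
for seq in list(election.nodes):
    election.delete(seq)

print(election.leader(), sorted(election.nodes.values()))
```

In this model every participant ends up re-registered and a leader always
exists afterwards, which is not what I observe on a real cluster.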

I've been diving into the LeaderElector code, and it seems much more
complicated than I would have expected.  Can anyone give me the theory of
operation, especially around the joinAtHead and replacement flags?

Thanks!
Scott
