I can talk a little about joinAtHead. It was added to work with
"preferredLeader". Essentially, the idea is that when a node registers
in the leader election queue with the preferredLeader flag set, it
should watch the current leader instead of joining at the end of the
queue.
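
To make that concrete, here is a minimal in-memory sketch of the
choice a joining node makes. It is not the LeaderElector code: the
class name, the nodeToWatch helper, and the list-of-names queue are
all made up for illustration, and I'm assuming index 0 of the list is
the current leader.

```java
import java.util.List;

class WatchTarget {
    /**
     * Hypothetical helper: picks the node a newcomer should watch.
     * ASSUMPTION: queue is ordered, and queue.get(0) is the current leader.
     * Returns null when the queue is empty (nobody to watch).
     */
    static String nodeToWatch(List<String> queue, boolean joinAtHead) {
        if (queue.isEmpty()) {
            return null;
        }
        // joinAtHead: watch the leader itself; otherwise watch the current tail.
        return joinAtHead ? queue.get(0) : queue.get(queue.size() - 1);
    }

    public static void main(String[] args) {
        List<String> queue = List.of("node-a", "node-b", "node-c");
        System.out.println("normal join watches:  " + nodeToWatch(queue, false));
        System.out.println("joinAtHead watches:   " + nodeToWatch(queue, true));
    }
}
```

Note that with joinAtHead the newcomer and the node that was already
second in line end up watching the same node.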

There's also logic in the leader election process whereby each node
asks "Should I be the next leader?" One consequence of joinAtHead is
that, when it's set, two nodes can be watching the current leader and
both receive events if the leader goes away. So there's some
complexity around "If I might be leader and another node was watching
the leader node too, which of us should win?"
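
Here is a rough sketch of that "which of us should win?" question,
again as an in-memory model rather than the real code. The Entry
class, the shouldBeLeader method, and especially the tie-break rule
(lowest sequence number first, then node name) are assumptions for
illustration; the actual rule in LeaderElector may differ.

```java
import java.util.Comparator;
import java.util.List;

class ElectionTieBreak {
    /** Hypothetical queue entry: a node name plus its sequence number. */
    static final class Entry {
        final String nodeId;
        final int seq;

        Entry(String nodeId, int seq) {
            this.nodeId = nodeId;
            this.seq = seq;
        }
    }

    // ASSUMED ordering: lowest sequence number first, ties broken by node name.
    static final Comparator<Entry> ORDER =
        Comparator.comparingInt((Entry e) -> e.seq)
                  .thenComparing((Entry e) -> e.nodeId);

    /** True if 'me' sorts first among the entries left after the leader went away. */
    static boolean shouldBeLeader(Entry me, List<Entry> remaining) {
        Entry first = remaining.stream().min(ORDER).orElse(null);
        return first != null && first.nodeId.equals(me.nodeId);
    }

    public static void main(String[] args) {
        // node-x was second in line; node-m joined at the head with the same
        // sequence number, so both were watching the leader that just left.
        Entry x = new Entry("node-x", 1);
        Entry m = new Entry("node-m", 1);
        Entry c = new Entry("node-c", 2);
        List<Entry> remaining = List.of(x, m, c);

        System.out.println("node-x leads: " + shouldBeLeader(x, remaining));
        System.out.println("node-m leads: " + shouldBeLeader(m, remaining));
    }
}
```

The point is only that both watchers get the event and each evaluates
the same deterministic rule, so exactly one of them concludes it
should take over.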

preferredLeader is the use case this was put in for, but it could be
used for any general use case that requires controlling whether a
node should cut into the queue at the head.

I'll leave replacement to someone who knows about it.

Erick

On Thu, Feb 11, 2016 at 10:11 AM, Scott Blum <[email protected]> wrote:
> Hi dev,
>
> Who's the best person to ask questions about the design of LeaderElector and
> ElectionContext?
>
> I ask because I've found it to be somewhat brittle in practice.  During a
> rolling restart, it's not uncommon to get into a state where there's no
> Overseer.  I've even experienced this locally with as few as two nodes.
> When this happens, I've tried (for example) deleting all the children under
> /solr/overseer_elect/election.  In theory, this should trigger all watches
> on all nodes, forcing everyone to re-register and contend for leadership,
> but in practice I haven't found this to work.
>
> I've been diving into the LeaderElection code, and it seems much more
> complicated than I would have expected.  Can anyone give me the theory of
> operation, especially around the joinAtHead and replacement flags?
>
> Thanks!
> Scott
>
