I can talk a little about joinAtHead. It was added to support "preferredLeader". The idea is that when a node registers in the leader election queue with the preferredLeader flag set, it watches the current leader instead of joining at the end of the queue.
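
To make the watch behavior concrete, here is a minimal sketch in Java. It is a toy in-memory model, not the actual LeaderElector code: the class name ElectionQueueSketch, the register and watchersOf methods, and the choice to slot a joinAtHead node in right behind the leader are all assumptions made for illustration. It only shows who watches whom.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, ZooKeeper-free model of the election queue (not Solr's code).
class ElectionQueueSketch {
    // Queue order, head first; index 0 is the current leader.
    final List<String> queue = new ArrayList<>();
    // Maps each watcher to the node it watches, in registration order.
    final Map<String, String> watches = new LinkedHashMap<>();

    void register(String node, boolean joinAtHead) {
        if (queue.isEmpty()) {
            queue.add(node); // first node in: it is the leader and watches nobody
            return;
        }
        if (joinAtHead) {
            // Cut in at the head: watch the current leader directly.
            watches.put(node, queue.get(0));
            queue.add(1, node); // assumption: placed right behind the leader
        } else {
            // Normal join: watch whoever is currently last in line.
            watches.put(node, queue.get(queue.size() - 1));
            queue.add(node);
        }
    }

    // Every node whose watch would fire if `node` went away.
    List<String> watchersOf(String node) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : watches.entrySet()) {
            if (e.getValue().equals(node)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ElectionQueueSketch q = new ElectionQueueSketch();
        q.register("n1", false); // becomes leader
        q.register("n2", false); // watches n1
        q.register("n3", false); // watches n2
        q.register("n4", true);  // joinAtHead: also watches n1
        System.out.println(q.watchersOf("n1")); // prints [n2, n4]
    }
}
```

In this model both n2 and n4 end up watching n1, so both would be notified if n1 went away and something has to decide between them.
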

There's also logic in the leader election process whereby each node asks "Should I be the next leader?" One consequence of joinAtHead is that, when it is set, two nodes can be watching the current leader, and both receive events if the leader goes away. So there's some complexity around "If I might be leader and another node was watching the leader node too, which of us should win?"

preferredLeader is the use case this was put in for, but it could be used for any general use case that requires controlling whether a node should cut into the queue at the head.

I'll leave replacement to someone who knows about it.

Erick

On Thu, Feb 11, 2016 at 10:11 AM, Scott Blum <[email protected]> wrote:
> Hi dev,
>
> Who's the best person to ask questions about the design of LeaderElector and
> ElectionContext?
>
> I ask because I've found it to be somewhat brittle in practice. During a
> rolling restart, it's not uncommon to get into a state where there's no
> Overseer. I've even experienced this locally with as few as two nodes.
> When this happens, I've tried (for example) deleting all the children under
> /solr/overseer_elect/election. In theory, this should trigger all watches
> on all nodes, forcing everyone to re-register and contend for leadership,
> but in practice I haven't found this to work.
>
> I've been diving into the LeaderElection code, and it seems much more
> complicated than I would have expected. Can anyone give me the theory of
> operation, especially around the joinAtHead and replacement flags?
>
> Thanks!
> Scott
