Thanks, Erick. What can you tell me about this bit? I'm having trouble making sense of it.

    if (seq <= intSeqs.get(0)) {
      if (seq == intSeqs.get(0) && !context.leaderSeqPath.equals(holdElectionPath + "/" + seqs.get(0))) {//somebody else already became the leader with the same sequence id , not me
        log.info("was going to be leader {} , seq(0) {}", context.leaderSeqPath, holdElectionPath + "/" + seqs.get(0));//but someone else jumped the line

        // The problem is that deleting the ZK node that's watched by others
        // results in an unpredictable sequencing of the events and sometime the context that comes in for checking
        // this happens to be after the node has already taken over leadership. So just leave out of here.
        // This caused one of the tests to fail on having two nodes with the same name in the queue. I'm not sure
        // the assumption that this is a bad state is valid.
        if (getNodeName(context.leaderSeqPath).equals(getNodeName(seqs.get(0)))) {
          return;
        }
        retryElection(context, false);//join at the tail again
        return;
      }

We ran into this message in our logs, but it was related to Overseer
election, where there shouldn't have been a preferredLeader. I'm struggling
to put together the right mental model; the code is really hard for me to
follow.

On Thu, Feb 11, 2016 at 1:19 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> I can talk a little about joinAtHead. That was put in there to work
> with "preferredLeader". Essentially the idea here is that when a node
> registers into the leader election queue, if it has the
> preferredLeader flag set it should watch the current leader instead of
> joining at the end of the queue.
>
> There's also logic in the leader election process whereby each node
> asks "Should I be the next leader"? One consequence of the joinAtHead
> is that if that's true, two nodes can be watching the current leader
> and both receive events if the leader goes away. So there's some
> complexity around "If I might be leader and another node was watching
> the leader node too, which of us should win?"
>
> preferredLeader is the use-case this was put in for, but it could be
> used for any generalized use-case that required controlling whether a
> node should cut into the queue at the head.
>
> I'll leave replacement to someone who knows about it.
>
> Erick
>
> On Thu, Feb 11, 2016 at 10:11 AM, Scott Blum <dragonsi...@gmail.com> wrote:
> > Hi dev,
> >
> > Who's the best person to ask questions about the design of LeaderElector and
> > ElectionContext?
> >
> > I ask because I've found it to be somewhat brittle in practice. During a
> > rolling restart, it's not uncommon to get into a state where there's no
> > Overseer. I've even experienced this locally with as few as two nodes.
> > When this happens, I've tried (for example) deleting all the children under
> > /solr/overseer_elect/election. In theory, this should trigger all watches
> > on all nodes, forcing everyone to re-register and contend for leadership,
> > but in practice I haven't found this to work.
> >
> > I've been diving into the LeaderElection code, and it seems much more
> > complicated than I would have expected. Can anyone give me the theory of
> > operation, especially around the joinAtHead and replacement flags?
> >
> > Thanks!
> > Scott
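
For what it's worth, here is a toy model of the watch behaviour Erick describes for joinAtHead, which might help as a starting mental model. It is only a sketch and not Solr's LeaderElector: the class ElectionQueueSketch, its methods, and the node names are all made up for illustration, and it models only "who watches whom", not ZooKeeper sequence numbers or the tie-breaking logic in the snippet above.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an election queue. Hypothetical; not Solr's LeaderElector. */
public class ElectionQueueSketch {
    /** Registered nodes in registration order; index 0 is the current leader. */
    private final List<String> queue = new ArrayList<>();
    /** watched.get(i) is the node that queue.get(i) watches (null for the leader). */
    private final List<String> watched = new ArrayList<>();

    /**
     * Registers a node and returns the node it watches, or null if it is the leader.
     * A normal join watches the most recently registered node (the tail);
     * a joinAtHead join watches the current leader instead.
     */
    public String join(String node, boolean joinAtHead) {
        String target;
        if (queue.isEmpty()) {
            target = null;                          // first node in: it is the leader
        } else if (joinAtHead) {
            target = queue.get(0);                  // cut in: watch the leader
        } else {
            target = queue.get(queue.size() - 1);   // watch the current tail
        }
        queue.add(node);
        watched.add(target);
        return target;
    }

    /** Returns every node that would receive a watch event if the leader went away. */
    public List<String> notifiedWhenLeaderLeaves() {
        List<String> notified = new ArrayList<>();
        if (queue.isEmpty()) {
            return notified;
        }
        String leader = queue.get(0);
        for (int i = 1; i < queue.size(); i++) {
            if (leader.equals(watched.get(i))) {
                notified.add(queue.get(i));
            }
        }
        return notified;
    }

    public static void main(String[] args) {
        ElectionQueueSketch q = new ElectionQueueSketch();
        q.join("nodeA", false);  // becomes leader
        q.join("nodeB", false);  // watches nodeA
        q.join("nodeC", false);  // watches nodeB
        q.join("nodeD", true);   // joinAtHead: also watches nodeA
        System.out.println(q.notifiedWhenLeaderLeaves());  // prints [nodeB, nodeD]
    }
}
```

In this model, both nodeB and nodeD get the event when nodeA goes away, which is the "which of us should win?" situation from Erick's message. How the real code resolves that is exactly what the snippet above is doing, and the sketch does not try to reproduce it.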