Thanks, Erick.  What can you tell me about this bit?  I'm having trouble
making sense of it.

if (seq <= intSeqs.get(0)) {
  if (seq == intSeqs.get(0) &&
      !context.leaderSeqPath.equals(holdElectionPath + "/" + seqs.get(0))) {
    // somebody else already became the leader with the same sequence id, not me
    log.info("was going to be leader {} , seq(0) {}",
        context.leaderSeqPath, holdElectionPath + "/" + seqs.get(0)); // but someone else jumped the line

    // The problem is that deleting the ZK node that's watched by others
    // results in an unpredictable sequencing of the events and sometime the
    // context that comes in for checking this happens to be after the node
    // has already taken over leadership. So just leave out of here.
    // This caused one of the tests to fail on having two nodes with the same
    // name in the queue. I'm not sure the assumption that this is a bad
    // state is valid.
    if (getNodeName(context.leaderSeqPath).equals(getNodeName(seqs.get(0)))) {
      return;
    }
    retryElection(context, false); // join at the tail again
    return;
  }


We ran into this message in our logs, but it was related to Overseer
election, where there shouldn't have been a preferredLeader.

I'm struggling to put together the right mental model; the code is really
hard for me to follow.
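
One way to read the excerpt above is as a small decision function. The
following is a minimal standalone sketch of that reading, not the Solr
implementation: the entry format, the Decision enum, and the decide, seqOf
and nodeOf helpers are all hypothetical. The two outcomes that fall outside
the quoted excerpt (becoming leader, watching a predecessor) are assumptions
borrowed from the usual ZooKeeper election recipe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone model of the excerpt's decision logic; this is not Solr code.
final class ElectionSketch {

  // Hypothetical outcome names, invented for this sketch.
  enum Decision { BECOME_LEADER, LEAVE_ALONE, REJOIN_AT_TAIL, WATCH_PREDECESSOR }

  // Hypothetical entry format: "<sessionId>-<nodeName>-n_<sequence>".
  static int seqOf(String entry) {
    return Integer.parseInt(entry.substring(entry.lastIndexOf("-n_") + 3));
  }

  static String nodeOf(String entry) {
    return entry.substring(entry.indexOf('-') + 1, entry.lastIndexOf("-n_"));
  }

  // Decides what the owner of entry "me" should do, given a non-empty queue.
  static Decision decide(String me, List<String> queue) {
    List<String> sorted = new ArrayList<>(queue);
    // Assumption: order by sequence number, then by entry name to break ties.
    sorted.sort((a, b) -> {
      int bySeq = Integer.compare(seqOf(a), seqOf(b));
      return bySeq != 0 ? bySeq : a.compareTo(b);
    });
    String head = sorted.get(0);

    if (seqOf(me) > seqOf(head)) {
      // Assumption: not at the front, so keep waiting on whoever is ahead.
      return Decision.WATCH_PREDECESSOR;
    }
    if (seqOf(me) == seqOf(head) && !me.equals(head)) {
      // Same sequence id as the head, but the head entry is not mine.
      if (nodeOf(me).equals(nodeOf(head))) {
        // The head belongs to the same node name: the excerpt just returns.
        return Decision.LEAVE_ALONE;
      }
      // A different node jumped the line: the excerpt calls retryElection.
      return Decision.REJOIN_AT_TAIL;
    }
    // Assumption: the code after the excerpt proceeds toward leadership.
    return Decision.BECOME_LEADER;
  }

  public static void main(String[] args) {
    List<String> queue = Arrays.asList(
        "s1-nodeA-n_0000000005", "s2-nodeB-n_0000000005", "s3-nodeC-n_0000000006");
    for (String entry : queue) {
      System.out.println(entry + " -> " + decide(entry, queue));
    }
  }
}
```

In this model two entries can share a sequence number, which is the only
way the inner branch can be reached. Whether that can legitimately happen
for the Overseer election, where no preferredLeader should be involved, is
exactly the open question.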

On Thu, Feb 11, 2016 at 1:19 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> I can talk a little about joinAtHead. That was put in there to work
> with "preferredLeader". Essentially the idea here is that when a node
> registers into the leader election queue, if it has the
> preferredLeader flag set it should watch the current leader instead of
> joining at the end of the queue.
>
> There's also logic in the leader election process whereby each node
> asks "Should I be the next leader"? One consequence of the joinAtHead
> is that if that's true, two nodes can be watching the current leader
> and both receive events if the leader goes away. So there's some
> complexity around "If I might be leader and another node was watching
> the leader node too, which of us should win?"
>
> preferredLeader is the use-case this was put in for, but it could be
> used for any generalized use-case that required controlling whether a
> node should cut into the queue at the head.
>
> I'll leave replacement to someone who knows about it.
>
> Erick
>
> On Thu, Feb 11, 2016 at 10:11 AM, Scott Blum <dragonsi...@gmail.com>
> wrote:
> > Hi dev,
> >
> > Who's the best person to ask questions about the design of LeaderElector
> > and ElectionContext?
> >
> > I ask because I've found it to be somewhat brittle in practice.  During a
> > rolling restart, it's not uncommon to get into a state where there's no
> > Overseer.  I've even experienced this locally with as few as two nodes.
> > When this happens, I've tried (for example) deleting all the children
> > under /solr/overseer_elect/election.  In theory, this should trigger all
> > watches on all nodes, forcing everyone to re-register and contend for
> > leadership, but in practice I haven't found this to work.
> >
> > I've been diving into the LeaderElection code, and it seems much more
> > complicated than I would have expected.  Can anyone give me the theory of
> > operation, especially around the joinAtHead and replacement flags?
> >
> > Thanks!
> > Scott
> >
>
