Ah, I was really responding to shard leader election. Part of your
confusion may be that the same code is re-used for both overseer
election and shard leader election.

I really can't comment too much on the overseer election process.
Generically, though, what I meant earlier is that "join at head"
can leave two nodes watching the current leader. One of them should
win the election, and the other one logs this message.
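
To make the "two watchers" point concrete, here is a toy sketch in Java.
It is not the actual LeaderElector code: the class ElectionSketch, the
methods seqOf and nodeToWatch, and the node names are all made up for
illustration, and it assumes every election node name ends in
"_<sequence number>". It only models who watches whom, not the
same-sequence-id check in the snippet you quoted.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Toy model only: this is NOT Solr's LeaderElector. All names are hypothetical.
final class ElectionSketch {

  // Parses the trailing sequence number from a name such as "b-n_0000000001".
  static int seqOf(String electionNode) {
    return Integer.parseInt(electionNode.substring(electionNode.lastIndexOf('_') + 1));
  }

  // Returns the election node that `me` should watch, or null when `me`
  // holds the lowest sequence number and should therefore become leader.
  static String nodeToWatch(List<String> electionNodes, String me, boolean joinAtHead) {
    List<String> sorted = new ArrayList<>(electionNodes);
    sorted.sort(Comparator.comparingInt(ElectionSketch::seqOf));
    int idx = sorted.indexOf(me);
    if (idx < 0) {
      throw new IllegalArgumentException("not in the election queue: " + me);
    }
    if (idx == 0) {
      return null; // head of the queue: nothing to watch
    }
    // Normal case: watch the immediate predecessor.
    // Join-at-head case: watch the current leader instead.
    return joinAtHead ? sorted.get(0) : sorted.get(idx - 1);
  }
}
```

With a queue of a, b, c, node b watches a and node c normally watches b.
If c joins at the head, both b and c watch a, so both get an event when
a goes away and the election code has to pick one of them.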

Best,
Erick

On Thu, Feb 11, 2016 at 12:53 PM, Scott Blum <dragonsi...@gmail.com> wrote:
> Thanks, Erick.  What can you tell me about this bit?  I'm having trouble
> making sense of it.
>
> if (seq <= intSeqs.get(0)) {
>   if (seq == intSeqs.get(0)
>       && !context.leaderSeqPath.equals(holdElectionPath + "/" + seqs.get(0))) {
>     // somebody else already became the leader with the same sequence id, not me
>     log.info("was going to be leader {} , seq(0) {}", context.leaderSeqPath,
>         holdElectionPath + "/" + seqs.get(0)); // but someone else jumped the line
>
>     // The problem is that deleting the ZK node that's watched by others
>     // results in an unpredictable sequencing of the events and sometime the
>     // context that comes in for checking this happens to be after the node
>     // has already taken over leadership. So just leave out of here.
>     // This caused one of the tests to fail on having two nodes with the
>     // same name in the queue. I'm not sure the assumption that this is a
>     // bad state is valid.
>     if (getNodeName(context.leaderSeqPath).equals(getNodeName(seqs.get(0)))) {
>       return;
>     }
>     retryElection(context, false); // join at the tail again
>     return;
>   }
>
>
> We ran into this message in our logs, but it was related to Overseer
> election, where there shouldn't have been a preferredLeader.
>
> I'm struggling to put together the right mental model; the code is really
> hard for me to follow.
>
> On Thu, Feb 11, 2016 at 1:19 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>>
>> I can talk a little about joinAtHead. That was put in there to work
>> with "preferredLeader". Essentially the idea here is that when a node
>> registers into the leader election queue, if it has the
>> preferredLeader flag set it should watch the current leader instead of
>> joining at the end of the queue.
>>
>> There's also logic in the leader election process whereby each node
>> asks "Should I be the next leader"? One consequence of the joinAtHead
>> is that if that's true, two nodes can be watching the current leader
>> and both receive events if the leader goes away. So there's some
>> complexity around "If I might be leader and another node was watching
>> the leader node too, which of us should win?"
>>
>> preferredLeader is the use-case this was put in for, but it could be
>> used for any generalized use-case that required controlling whether a
>> node should cut into the queue at the head.
>>
>> I'll leave replacement to someone who knows about it.
>>
>> Erick
>>
>> On Thu, Feb 11, 2016 at 10:11 AM, Scott Blum <dragonsi...@gmail.com>
>> wrote:
>> > Hi dev,
>> >
>> > Who's the best person to ask questions about the design of
>> > LeaderElector and ElectionContext?
>> >
>> > I ask because I've found it to be somewhat brittle in practice.  During
>> > a rolling restart, it's not uncommon to get into a state where there's
>> > no Overseer.  I've even experienced this locally with as few as two
>> > nodes.  When this happens, I've tried (for example) deleting all the
>> > children under /solr/overseer_elect/election.  In theory, this should
>> > trigger all watches on all nodes, forcing everyone to re-register and
>> > contend for leadership, but in practice I haven't found this to work.
>> >
>> > I've been diving into the LeaderElection code, and it seems much more
>> > complicated than I would have expected.  Can anyone give me the theory
>> > of operation, especially around the joinAtHead and replacement flags?
>> >
>> > Thanks!
>> > Scott
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>
