Thanks! I somewhat remember seeing that conversation but I confess I didn't follow it that closely.
I can't cope with looking at it any more tonight, but I'll check in the morning. The problem I see is that I don't think there's any way, once a node is re-inserted in the queue, for another node that sorts first in line to figure out that it's not supposed to be the leader, but I may have missed that.

Erick

On Tue, Dec 2, 2014 at 5:34 PM, Jessica Mallet <mewmewb...@gmail.com> wrote:
> This is reminiscent of my conversation with Noble on SOLR-6095, starting
> at this comment:
> https://issues.apache.org/jira/browse/SOLR-6095?focusedCommentId=14032386&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14032386
>
> Unfortunately I stopped following it and my memory is a bit vague right
> now. Reading the comments, I think Noble had in mind that the
> tie-breaker can pick the wrong node (n2) to be the leader, but that the
> wrong node will then re-initiate the process to renounce leadership and
> re-join (according to
> https://issues.apache.org/jira/browse/SOLR-6095?focusedCommentId=14032619&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14032619).
>
> I then asked when that renounce process will happen for n2
> (https://issues.apache.org/jira/browse/SOLR-6095?focusedCommentId=14032659&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14032659),
> and I'm not sure if that was ever specifically answered. Figuring out if and
> how that happens might be key to moving forward?
>
> Jessica
>
> On Tue, Dec 2, 2014 at 4:30 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>>
>> I'm particularly interested in Noble and Mark's comments...
>>
>> Let's say you have 5 nodes: n1, n2, n3, n4, n5.
>>
>> n1 is the leader, n2 watches n1, etc.
>>
>> Now I retryElection for n3 with joinAtHead=true. Both n2 and n3 are
>> watching n1. So far, so good.
>>
>> My expectation is that deleting n1 would cause n3 to become leader,
>> but it isn't at all guaranteed.
>> I have a test case illustrating this.
>>
>> Incidentally, I think I should get the same result by calling
>> retryElection on n1 with joinAtHead=false; n3 should become the
>> leader.
>>
>> I was working on SOLR-6691 and slowly going crazy since everything I
>> was trying would fail. Basically, to rebalance leaders (thanks Noble
>> for pointing out how far off I was in my original approach) it seemed
>> like it would be sufficient to
>>
>> 1> have the preferred leader retry the election at the head
>> 2> tell the old leader to retry at the tail
>>
>> I expected the old node that was watching the leader to figure out
>> that it wasn't really next in line and re-add itself to the end.
>>
>> But things went all to hell in a handbasket when I wrote a harness
>> that exercised it, and it drove me a bit nuts, especially since it
>> would fail one way one time and another way the next. It would even
>> succeed on occasion....
>>
>> I figured out why my expectations weren't being met. Due to the way
>> leader queues are sorted, if the two sequence numbers are identical
>> then the tie-breaker does NOT pick the last node to join at head. It
>> picks the one with the lowest (highest? didn't track that down
>> entirely) session ID. Either way, sometimes it picks the node newly
>> added at the head and sometimes it picks the old one.
>>
>> If I _am_ on the right path, then I propose the following:
>> 1> I'll raise a new JIRA for leader sequence sorting and take it on.
>> I'm not quite sure how to fix it; the ideas I have are fairly hacky.
>>
>> 2> I'll back out the REBALANCELEADER stuff. Currently it'll break
>> things badly and we're too close to 5.0 to try to do anything about
>> <1> IMO. This just means that I'll comment out the collections API
>> call in the code and update the ref guide.
>>
>> 3> When <1> is resolved, I'll put REBALANCELEADERs back in, but that
>> won't be before 5.1.
>>
>> Erick
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
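The tie-breaking problem described in this thread can be illustrated with a minimal Java sketch. It is not Solr's actual election code: the election node name format `<sessionId>-<nodeName>-n_<seq>`, the session IDs, and the comparator (sequence number first, then the whole node name) are assumptions made for illustration, and the helper names are hypothetical. It rebuilds the 5-node scenario in which n3 re-joins at the head with the same sequence number as n2, and shows that which of the two leads after n1 is deleted depends only on the session IDs, not on which node joined last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ElectionSortSketch {

    // Assumed election node name format: <sessionId>-<nodeName>-n_<seq>.
    // Returns the numeric sequence number that follows the last "n_".
    static int seq(String electionNode) {
        return Integer.parseInt(electionNode.substring(electionNode.lastIndexOf("n_") + 2));
    }

    // Hypothetical ordering: by sequence number first; on a tie, by the full
    // node name, which in effect compares the session IDs lexicographically.
    static final Comparator<String> BY_SEQ_THEN_NAME = (a, b) -> {
        int bySeq = Integer.compare(seq(a), seq(b));
        return bySeq != 0 ? bySeq : a.compareTo(b);
    };

    // Builds the 5-node queue from the thread: n1 leads, n2 watches n1, and
    // n3 has re-joined "at head" by taking the same sequence number as n2.
    static List<String> queue(String n2Session, String n3Session) {
        List<String> nodes = new ArrayList<>();
        nodes.add("900-n1-n_0000000001");
        nodes.add(n2Session + "-n2-n_0000000002");
        nodes.add(n3Session + "-n3-n_0000000002"); // the joinAtHead entry, added last
        nodes.add("600-n4-n_0000000003");
        nodes.add("800-n5-n_0000000004");
        return nodes;
    }

    // Deletes the current leader (the first node after sorting) and returns
    // the node that sorts first among those that remain.
    static String nextLeaderAfterHeadDeleted(List<String> nodes) {
        List<String> sorted = new ArrayList<>(nodes);
        sorted.sort(BY_SEQ_THEN_NAME);
        sorted.remove(0);
        return sorted.get(0);
    }

    public static void main(String[] args) {
        // Same join order both times; only the session IDs differ.
        System.out.println(nextLeaderAfterHeadDeleted(queue("500", "700"))); // n2 wins
        System.out.println(nextLeaderAfterHeadDeleted(queue("700", "500"))); // n3 wins
    }
}
```

With the same join order, the first call prints `500-n2-n_0000000002` and the second prints `500-n3-n_0000000002`. Swapping the session IDs alone changes the winner, which matches the "sometimes it picks the node newly added at the head and sometimes it picks the old one" behavior Erick describes.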