[ 
https://issues.apache.org/jira/browse/SOLR-8697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15153294#comment-15153294
 ] 

Scott Blum commented on SOLR-8697:
----------------------------------

[~markrmil...@gmail.com] [~erickerickson]

I think there is a potential problem with how OverseerTest is constructed, one 
that may have led us in the past to write code into LeaderElector that doesn't 
make any sense for live code.

I'm looking at the implementation of MockZkController.publishState() (it's kind 
of a beast) and I notice that when it creates an ElectionContext, it never 
actually adds it to the map, checks whether one already exists, etc.  As a 
result, MockZkController does something the real ZkController never does -- it 
tries to register two different election contexts for the same core on the same 
ZK session.

My question is, what's the right fix?  I could either make MockZkController not 
set up a new electionContext on subsequent invocations, or make it simply 
cancel the previous election context before creating a new one.
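
As a rough illustration of the two options, here is a minimal sketch. It does 
not use the real MockZkController or ElectionContext code; the names 
SketchElectionContext and SketchElectionRegistry are hypothetical stand-ins, 
and the only assumption is that contexts are tracked in a map keyed by core 
name.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an election context; NOT Solr's ElectionContext.
final class SketchElectionContext {
    final String coreName;
    private boolean cancelled = false;

    SketchElectionContext(String coreName) {
        this.coreName = coreName;
    }

    // Marks this context as withdrawn from the election.
    void cancel() {
        cancelled = true;
    }

    boolean isCancelled() {
        return cancelled;
    }
}

// Hypothetical registry that keeps at most one live context per core.
final class SketchElectionRegistry {
    private final Map<String, SketchElectionContext> contexts = new HashMap<>();

    // Option 1: on subsequent invocations, reuse the existing context
    // instead of setting up a new one.
    SketchElectionContext getOrCreate(String coreName) {
        return contexts.computeIfAbsent(coreName, SketchElectionContext::new);
    }

    // Option 2: cancel the previous context (if any) before creating
    // and registering a new one.
    SketchElectionContext replace(String coreName) {
        SketchElectionContext previous = contexts.remove(coreName);
        if (previous != null) {
            previous.cancel();
        }
        SketchElectionContext fresh = new SketchElectionContext(coreName);
        contexts.put(coreName, fresh);
        return fresh;
    }

    int size() {
        return contexts.size();
    }
}
```

Either way, the map never holds two contexts for the same core, so two 
different contexts are never live for one core at the same time.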

> Fix LeaderElector issues
> ------------------------
>
>                 Key: SOLR-8697
>                 URL: https://issues.apache.org/jira/browse/SOLR-8697
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 5.4.1
>            Reporter: Scott Blum
>              Labels: patch, reliability, solrcloud
>
> This patch is still somewhat WIP for a couple of reasons:
> 1) Still debugging test failures.
> 2) This will need more scrutiny from knowledgeable folks!
> There are some subtle bugs with the current implementation of LeaderElector, 
> best demonstrated by the following test:
> 1) Start up a small single-node solrcloud.  It should become Overseer.
> 2) kill -9 the solrcloud process and immediately start a new one.
> 3) The new process won't become overseer.  The old process's ZK leader elect 
> node has not yet disappeared, and the new process fails to set appropriate 
> watches.
> NOTE: this is only reproducible if the new node is able to start up and join 
> the election quickly.



