Mark Miller created SOLR-3782:
---------------------------------

             Summary: A leader going down while updates are coming in can cause shard inconsistency.
                 Key: SOLR-3782
                 URL: https://issues.apache.org/jira/browse/SOLR-3782
             Project: Solr
          Issue Type: Bug
          Components: SolrCloud
            Reporter: Mark Miller
            Assignee: Mark Miller
             Fix For: 4.0, 5.0


Harpoon into the head of the great whale I have been chasing for a couple of
weeks now.

ChaosMonkey test was exposing this.

It turns out the problem was the solr cmd distrib executor: when closing the
leader's CoreContainer, we would close the zkController while updates could
still flow through the distrib executor. As a result, the old leader would
briefly keep sending updates even though there was already a new leader.

I had suspected something similar at one point in the hunt and started adding
some defensive state checks that we wanted to add anyway. I don't think they
caught every case of this issue, because one of the state checks can only be
made so tight (it compares the leader in the replica's cloudstate against the
leader indicated by the request).
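
As a rough illustration of that kind of check (not the actual Solr code; the
class and method names here are hypothetical), a replica-side comparison might
look something like this:

```java
public class LeaderCheckSketch {
    /**
     * Hypothetical replica-side check: accept a forwarded update only if the
     * leader named in the request matches the leader this replica currently
     * sees in its own copy of the cluster state.
     */
    static boolean acceptFromLeader(String leaderInClusterState, String leaderInRequest) {
        return leaderInClusterState != null
                && leaderInClusterState.equals(leaderInRequest);
    }
}
```

The limit is that the replica's view of the cluster state can lag: until it
sees the new leader, the old leader's name still matches and the update is
accepted.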

So the answer is to finally work out how to stop the solr cmd distrib executor,
because we need to stop it before closing zkController and giving up our role
as leader.
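
A minimal sketch of that ordering, using plain java.util.concurrent rather
than the real Solr classes (FakeZkController, forwardUpdate and the 10 second
timeout are all assumptions made up for the example):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LeaderShutdownSketch {
    // Hypothetical stand-in for the ZooKeeper-facing controller; not Solr's class.
    static class FakeZkController {
        private volatile boolean closed = false;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
    }

    final ExecutorService distribExecutor = Executors.newFixedThreadPool(2);
    final FakeZkController zk = new FakeZkController();
    final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // Forward an update to replicas via the executor. A "stale" event means the
    // update went out after we had already given up leadership.
    void forwardUpdate(String doc) {
        distribExecutor.execute(() ->
                events.add((zk.isClosed() ? "stale:" : "sent:") + doc));
    }

    // Stop the executor first, wait for in-flight updates, then close ZooKeeper.
    boolean close() {
        distribExecutor.shutdown();              // reject new updates from here on
        boolean drained = false;
        try {
            drained = distribExecutor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();  // preserve the interrupt status
        }
        if (!drained) {
            distribExecutor.shutdownNow();       // give up on anything still queued
        }
        zk.close();                              // only now give up the leader role
        events.add("zk-closed");
        return drained;
    }
}
```

With this ordering, an update submitted after close() has started is rejected
with a RejectedExecutionException instead of being sent by a node that is no
longer the leader.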

I've worked that all out and the issue no longer seems to be a problem.
