[jira] [Commented] (SOLR-3782) A leader going down while updates are coming in can cause shard inconsistency.

2012-09-04 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447644#comment-13447644
 ] 

Markus Jelsma commented on SOLR-3782:
-

Hi - this is listed in CHANGES, but is it actually resolved?

 A leader going down while updates are coming in can cause shard inconsistency.
 --

 Key: SOLR-3782
 URL: https://issues.apache.org/jira/browse/SOLR-3782
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 4.0, 5.0


 Harpoon into the head of the great whale I have been chasing for a couple 
 weeks now.
 ChaosMonkey test was exposing this.
 Turns out the problem was the solr cmd distrib executor - when closing the 
 leader CoreContainer, we would close the zkController while updates can still 
 flow through the distrib executor. The result was that we would send updates 
 from the leader briefly even though there was a new leader.
 I had suspected something similar to this at one point in the hunt and 
 started adding some defensive state checks that we wanted to add anyway. I 
 don't think they caught all of this issue, because one of the state checks 
 (comparing the cloudstate leader seen by a replica against the leader 
 indicated by the request) can only be made so tight.
 So the answer is to finally work out how to stop the solr cmd distrib 
 executor - because we need to stop it before closing zkController and giving 
 up our role as leader.
 I've worked that all out and the issue no longer seems to be a problem.
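
The ordering described above (stop the distrib executor first, then close zkController) can be sketched roughly as follows. This is a minimal illustration using plain java.util.concurrent, not the code committed for this issue: `LeaderShutdownSketch`, `FakeZkController`, `shutdownLeader` and the timeout parameter are all hypothetical names made up for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class LeaderShutdownSketch {

    /** Hypothetical stand-in for the ZooKeeper-backed controller; NOT Solr's ZkController. */
    static class FakeZkController implements AutoCloseable {
        private final List<String> events;

        FakeZkController(List<String> events) {
            this.events = events;
        }

        @Override
        public void close() {
            // In the real system this is the point where leadership is given up.
            events.add("zk-closed");
        }
    }

    /**
     * Stops the update executor and waits for it to terminate before closing
     * the controller, so no update can be sent after leadership is released.
     */
    static void shutdownLeader(ExecutorService updateExecutor,
                               FakeZkController zk,
                               long timeoutMs) throws InterruptedException {
        updateExecutor.shutdown(); // reject new tasks, let already queued ones finish
        if (!updateExecutor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
            // Assumption: past the timeout we prefer dropping updates over sending them late.
            updateExecutor.shutdownNow();
            updateExecutor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        }
        zk.close(); // only now give up the leader role
    }

    public static void main(String[] args) throws Exception {
        List<String> events = Collections.synchronizedList(new ArrayList<String>());
        ExecutorService exec = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 3; i++) {
            final int id = i;
            exec.execute(() -> events.add("update-" + id));
        }
        shutdownLeader(exec, new FakeZkController(events), 5000);
        System.out.println(events); // "zk-closed" is always the last entry
    }
}
```

The point of the sketch is only the ordering: once `awaitTermination` has returned, nothing can still be flowing through the executor when the controller closes.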

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3782) A leader going down while updates are coming in can cause shard inconsistency.

2012-09-04 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447721#comment-13447721
 ] 

Mark Miller commented on SOLR-3782:
---

I committed a first iteration - I have more refinement/improvement coming. 




[jira] [Commented] (SOLR-3782) A leader going down while updates are coming in can cause shard inconsistency.

2012-09-04 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448123#comment-13448123
 ] 

Mark Miller commented on SOLR-3782:
---

I only solved the issue when stopping the leader - there was also a similar 
issue on session expiration (the leader's update queue could be emptying as 
we elect a new leader and beyond). I fixed this as well by shutting down the 
executor on session expiration and creating a new one for further use.
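
A rough sketch of the "shut down on session expiration, then create a new executor" idea, again using only java.util.concurrent. The class `ReplaceableUpdateExecutor`, the `onSessionExpired` hook and its timeout are hypothetical names for illustration, and the choice to discard queued updates with `shutdownNow()` is an assumption; the committed fix may drain or handle them differently.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReplaceableUpdateExecutor {

    // Single worker thread so that submitted updates actually queue up behind each other.
    private ExecutorService current = Executors.newFixedThreadPool(1);

    /** Hands an update to whichever executor is current. */
    public synchronized void submit(Runnable update) {
        current.execute(update);
    }

    /**
     * Hypothetical session-expiration hook: stop the old executor so its queue
     * cannot keep emptying while a new leader is elected, then install a fresh one.
     *
     * @return the number of queued updates that were discarded without running
     */
    public synchronized int onSessionExpired(long timeoutMs) throws InterruptedException {
        ExecutorService old = current;
        // Assumption: stale queued updates are dropped rather than sent.
        List<Runnable> neverStarted = old.shutdownNow();
        old.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        current = Executors.newFixedThreadPool(1); // fresh executor for further use
        return neverStarted.size();
    }

    /** Stops the current executor and waits up to timeoutMs for running work to end. */
    public synchronized void close(long timeoutMs) throws InterruptedException {
        current.shutdown();
        current.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

Swapping the executor under the same lock that guards `submit` means a caller never sees the shut-down instance, at the cost of blocking submitters while the old executor terminates.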




[jira] [Commented] (SOLR-3782) A leader going down while updates are coming in can cause shard inconsistency.

2012-09-04 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13448124#comment-13448124
 ] 

Mark Miller commented on SOLR-3782:
---

Have not committed yet though. Coming soon...
