[jira] [Commented] (SOLR-3859) SolrCloud admin graph is showing leader as state recovery failed, but it's working

2012-11-20 Thread Po Rui (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13501663#comment-13501663
 ] 

Po Rui commented on SOLR-3859:
--

This is a big problem. We also encounter it frequently.


> SolrCloud admin graph is showing leader as state recovery failed, but it's 
> working
> --
>
> Key: SOLR-3859
> URL: https://issues.apache.org/jira/browse/SOLR-3859
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0-BETA
> Environment: linux/centos
>Reporter: Jim Musil
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
> Attachments: zkAdminScreen.PNG, zkDump.txt
>
>
> I'm not sure this is truly a bug, but the behavior really confuses me.
> I have four servers running one of my cores. As a test, I took down the 
> leader to watch how leader election works. In this case, a leader was 
> selected, but it went into a state of "recovery failed". The odd thing is 
> that everything still works. I can query that box directly and I can query 
> the cluster and I observe correct behavior.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3859) SolrCloud admin graph is showing leader as state recovery failed, but it's working

2012-09-19 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459293#comment-13459293
 ] 

Mark Miller commented on SOLR-3859:
---

Did you refresh the page? It does not auto refresh. I assume you did though.

There is an option to dump all the zk state to the clipboard - can you attach 
that dump here?




[jira] [Commented] (SOLR-3859) SolrCloud admin graph is showing leader as state recovery failed, but it's working

2012-09-20 Thread Jim Musil (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459653#comment-13459653
 ] 

Jim Musil commented on SOLR-3859:
-

Ok, attached.




[jira] [Commented] (SOLR-3859) SolrCloud admin graph is showing leader as state recovery failed, but it's working

2012-09-25 Thread Jim Musil (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3859?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462867#comment-13462867
 ] 

Jim Musil commented on SOLR-3859:
-

I think this occurs when two nodes are killed simultaneously. This happened 
on EC2, where it's not that uncommon for multiple servers to go down at once. 
My theory is that when a leader goes down, a new leader is chosen, but if the 
new leader has also gone down, the remaining nodes cannot recover properly. 
Then, if a node that failed to recover is elected leader, it stays in that 
same "recovery failed" state.




