[ 
https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002251#comment-15002251
 ] 

Ishan Chattopadhyaya commented on SOLR-7569:
--------------------------------------------

bq. I've taken a crack at making SOLR-7989 work.
Thanks!

bq. Perhaps the last thing the API should do is run through each shard and see 
if the registered leader is DOWN, and if it is make it ACTIVE (preferably by 
asking it to publish itself as ACTIVE - we don't want to publish for someone 
else). If the call waits around to make sure all the leaders come up, this 
should be simple.
This makes sense. I think this is something that Shalin alluded to (please 
excuse me if I'm mistaken) when he said, {{1. Leader is live but 'down' -> mark 
it 'active'}}. The suggestion that the replicas mark themselves ACTIVE, 
rather than having someone else publish that state on their behalf, seems like 
a good thing to do.

> Create an API to force a leader election between nodes
> ------------------------------------------------------
>
>                 Key: SOLR-7569
>                 URL: https://issues.apache.org/jira/browse/SOLR-7569
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Noble Paul
>              Labels: difficulty-medium, impact-high
>             Fix For: 5.4, Trunk
>
>         Attachments: SOLR-7569-testfix.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, 
> SOLR-7569_lir_down_state_test.patch
>
>
> There are many reasons why Solr will not elect a leader for a shard, e.g. all 
> replicas' last published state was recovery, or bugs which cause a leader to 
> be marked as 'down'. While the best solution is to never get into this state, 
> we need a manual way to fix it when it does happen. Right now we can perform 
> an elaborate dance involving bouncing the node (since the recovery paths for 
> bouncing and REQUESTRECOVERY are different), but that is difficult when 
> running a large cluster. Although such a manual API may lead to some data 
> loss, in some cases it is the only option to restore availability.
> This issue proposes to build a new collection API which can be used to force 
> replicas into recovering a leader while avoiding data loss on a best-effort 
> basis.
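The collection API described above was released as the FORCELEADER action. A 
rough sketch of invoking it over HTTP follows; the host, collection, and shard 
names are placeholders, not values from this issue:

```shell
# Sketch of calling the FORCELEADER collection API (added by SOLR-7569).
# SOLR_HOST, COLLECTION, and SHARD are placeholder values.
SOLR_HOST="http://localhost:8983"
COLLECTION="mycollection"
SHARD="shard1"

# Build the collection-admin request URL for the force-leader action.
URL="${SOLR_HOST}/solr/admin/collections?action=FORCELEADER&collection=${COLLECTION}&shard=${SHARD}"
echo "Request URL: ${URL}"
# curl "${URL}"   # uncomment to run against a live SolrCloud cluster
```

As the discussion notes, this is a last-resort operation: it may lose data, so 
it should only be used when a shard has no elected leader and normal recovery 
has failed.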



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
