[ 
https://issues.apache.org/jira/browse/SOLR-10904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16191347#comment-16191347
 ] 

Mihaly Toth commented on SOLR-10904:
------------------------------------

[~markrmil...@gmail.com], I will try to put up a patch for this tonight.

> Unnecessary waiting during failover in case of failed core creation
> -------------------------------------------------------------------
>
>                 Key: SOLR-10904
>                 URL: https://issues.apache.org/jira/browse/SOLR-10904
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 7.0
>            Reporter: Mihaly Toth
>            Assignee: Mark Miller
>
> The background failover thread checks for bad replicas. When it finds one, it 
> tries to create a replacement on another node and then waits for the new 
> replica to show up in the cluster state. It waits even if the core creation 
> (which it initiated itself) fails. 
> This situation does not occur on the happy path of failover because 
> the new node was marked as alive. But if the cluster is in an unstable 
> state, the user is restarting the new node, or the overseer is overloaded, 
> this extra wait holds up the failover thread.
> A possible solution:
> # wait for the result of the core creation
> # proceed to wait for the cluster state change only if the previous step succeeded
> In code:
> {code}
> try {
>   Future<Boolean> future = updateExecutor.submit(() ->
>       createSolrCore(collection, createUrl, dataDir, ulogDir, coreNodeName, coreName, shardId));
>   future.get(30000L, TimeUnit.MILLISECONDS);
> } catch (InterruptedException | ExecutionException | TimeoutException e) {
>   log.error("Error creating core", e);
>   return false;
> } finally {
>   MDC.remove("OverseerAutoReplicaFailoverThread.createUrl");
> }
> {code}
> In that case we could consider moving core creation from the updateExecutor 
> into the failover thread itself.
> I can post a patch with these changes if the solution seems appropriate.
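
For illustration, the two proposed steps can be sketched as a standalone program. This is a minimal sketch, not the Solr code or the eventual patch: the class FailoverSketch, the methods createCore and failover, and the createTask and waitForClusterState parameters are hypothetical stand-ins for createSolrCore and the cluster state wait. It also assumes that a false result from the creation task should count as a failure, in the same way as an exception or a timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class FailoverSketch {

  /** Step 1: run the (hypothetical) core-creation task and wait for its outcome. */
  public static boolean createCore(ExecutorService executor, Callable<Boolean> createTask,
                                   long timeoutMs) {
    Future<Boolean> future = executor.submit(createTask);
    try {
      // Assumption: a false (or null) result is a failure, just like an exception.
      return Boolean.TRUE.equals(future.get(timeoutMs, TimeUnit.MILLISECONDS));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      future.cancel(true);
      return false;
    } catch (ExecutionException | TimeoutException e) {
      future.cancel(true); // stops a task that is still running; no-op otherwise
      return false;
    }
  }

  /** Steps 1 and 2: wait for the cluster state only if creation succeeded. */
  public static boolean failover(ExecutorService executor, Callable<Boolean> createTask,
                                 BooleanSupplier waitForClusterState, long timeoutMs) {
    if (!createCore(executor, createTask, timeoutMs)) {
      return false; // creation failed, so skip the cluster-state wait entirely
    }
    return waitForClusterState.getAsBoolean();
  }

  public static void main(String[] args) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      // Creation throws: the cluster-state wait is never invoked.
      boolean failed = failover(executor,
          () -> { throw new IllegalStateException("node down"); },
          () -> { throw new AssertionError("should not wait"); },
          1000L);
      // Creation succeeds: the result of the wait decides the outcome.
      boolean ok = failover(executor, () -> true, () -> true, 1000L);
      System.out.println("failed creation -> " + failed); // prints "failed creation -> false"
      System.out.println("successful creation -> " + ok); // prints "successful creation -> true"
    } finally {
      executor.shutdownNow();
    }
  }
}
```

Cancelling the future on timeout keeps a stuck creation request from occupying the executor thread after the failover thread has already given up on it.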



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
