[ https://issues.apache.org/jira/browse/SOLR-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14219535#comment-14219535 ]
Mark Miller commented on SOLR-6763:
-----------------------------------

bq. and another spawned by the ReconnectStrategy.

Hmm... this sounds fishy. We should not be spawning any new election thread on ConnectionLoss - only on Expiration.

> Shard leader election thread can persist across connection loss
> ---------------------------------------------------------------
>
>          Key: SOLR-6763
>          URL: https://issues.apache.org/jira/browse/SOLR-6763
>      Project: Solr
>   Issue Type: Bug
>     Reporter: Alan Woodward
>  Attachments: SOLR-6763.patch
>
> A ZK connection loss during a call to ElectionContext.waitForReplicasToComeUp() will result in two leader election processes for the shard running within a single node: the initial election that was waiting, and another spawned by the ReconnectStrategy. After the function returns, the first election will create an ephemeral leader node. The second election will then also attempt to create this node, fail, and try to put itself into recovery. It will also set the 'isLeader' value in its CloudDescriptor to false.
>
> The first election, meanwhile, is happily maintaining the ephemeral leader node. But any updates sent to the shard will cause an exception, due to the mismatch between the cloud state (where this node is the leader) and the local CloudDescriptor leader state.
>
> I think the fix is straightforward: the call to zkClient.getChildren() in waitForReplicasToComeUp should be made with 'retryOnReconnect=false', rather than 'true' as it is currently, because once the connection has dropped we're going to launch a new election process anyway.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
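The race described above comes down to two election threads competing to create one ephemeral leader znode: only the first creation succeeds, but the loser then flips its local isLeader flag, diverging from the cluster state. The following is a minimal, self-contained sketch of that race, not Solr's actual code: LeaderRaceSketch, tryBecomeLeader, and the AtomicReference standing in for the ZK znode are all illustrative names.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hedged sketch of the SOLR-6763 race: two election attempts compete to
// create a single "ephemeral leader node". An AtomicReference stands in
// for the ZooKeeper znode; compareAndSet models the create-if-absent
// semantics of an ephemeral node creation.
public class LeaderRaceSketch {
    // null means no leader node exists yet.
    private static final AtomicReference<String> leaderNode = new AtomicReference<>();

    /** Returns true only for the first election that creates the node. */
    public static boolean tryBecomeLeader(String electionId) {
        return leaderNode.compareAndSet(null, electionId);
    }

    public static void main(String[] args) {
        // The election that was blocked in waitForReplicasToComeUp():
        boolean first = tryBecomeLeader("election-1");
        // The second election, spawned after the connection loss,
        // attempts the same node, fails, and would then mark its local
        // CloudDescriptor isLeader=false - the mismatch in the report.
        boolean second = tryBecomeLeader("election-2");
        System.out.println("first=" + first + " second=" + second);
    }
}
```

The proposed fix avoids the second creation attempt at the source: with retryOnReconnect=false, the getChildren() call in waitForReplicasToComeUp() fails fast on connection loss instead of transparently retrying, so the stale first election dies and only the freshly spawned one proceeds.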