[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13867165#comment-13867165 ]
Shawn Heisey commented on SOLR-5615:
------------------------------------
Noted while backporting SOLR-5543 to the 4.6 branch: in the trunk CHANGES.txt
file, this issue number shows up in the 4.6.1 section, but the fix does not
appear to have actually been backported to the 4.6 branch yet.
> Deadlock while trying to recover after a ZK session expiry
> ----------------------------------------------------------
>
> Key: SOLR-5615
> URL: https://issues.apache.org/jira/browse/SOLR-5615
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 4.4, 4.5, 4.6
> Reporter: Ramkumar Aiyengar
> Assignee: Mark Miller
> Fix For: 5.0, 4.7, 4.6.1
>
> Attachments: SOLR-5615.patch, SOLR-5615.patch, SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
> - Leader of a shard, say OL, has a ZK expiry
> - The new leader, NL, starts the election process
> - NL, through Overseer, clears the current leader (OL) for the shard from
> the cluster state
> - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
> - OL marks itself down
> - OL sets up watches for cluster state, and then retrieves it (with no
> leader for this shard)
> - NL, through Overseer, updates cluster state to mark itself leader for the
> shard
> - OL tries to register itself as a replica and, still on the event thread,
> waits until the cluster state is updated with the new leader
> - ZK sends a watch update to OL, but it cannot be delivered because the
> event thread is blocked waiting for that very update.
> Oops. The deadlock only breaks once the attempt to register as a replica
> times out after 20 mins.
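
The sequence above boils down to a single-threaded event dispatcher whose
current callback is waiting for an event that can only be delivered by that
same thread. The following is a minimal sketch of that shape in plain Java,
not Solr or ZooKeeper code: the class name, method names, and the use of a
single-thread executor to stand in for main-EventThread are all assumptions
made for illustration. The second method shows one common way to avoid this
kind of deadlock (handing the blocking work to another thread); it is not
necessarily what the attached patch does.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in Solr.
public class EventThreadDeadlockSketch {

    // Returns true if the "new leader" update was seen before the timeout.
    // Here the wait happens ON the event thread, as in the report.
    static boolean reconnectOnEventThread(long timeoutMs) throws Exception {
        // Stand-in for the single ZK event thread.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        CountDownLatch leaderSeen = new CountDownLatch(1);
        try {
            // "onReconnect": blocks the event thread until the update arrives.
            Callable<Boolean> onReconnect =
                () -> leaderSeen.await(timeoutMs, TimeUnit.MILLISECONDS);
            Future<Boolean> reconnect = eventThread.submit(onReconnect);
            // The watch update is queued behind onReconnect on the same
            // thread, so it cannot run until the wait above gives up.
            eventThread.execute(leaderSeen::countDown);
            return reconnect.get();
        } finally {
            eventThread.shutdownNow();
        }
    }

    // Same scenario, but the event thread only hands the blocking wait to a
    // worker thread and returns, so the watch update can be delivered.
    static boolean reconnectOnWorkerThread(long timeoutMs) throws Exception {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch leaderSeen = new CountDownLatch(1);
        try {
            Callable<Boolean> waitForLeader =
                () -> leaderSeen.await(timeoutMs, TimeUnit.MILLISECONDS);
            Callable<Future<Boolean>> onReconnect =
                () -> worker.submit(waitForLeader);
            Future<Future<Boolean>> handoff = eventThread.submit(onReconnect);
            eventThread.execute(leaderSeen::countDown);
            return handoff.get().get();
        } finally {
            eventThread.shutdownNow();
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("wait on event thread saw leader:  "
            + reconnectOnEventThread(200));
        System.out.println("wait on worker thread saw leader: "
            + reconnectOnWorkerThread(2000));
    }
}
```

In the sketch the short timeout plays the role of the 20 minute registration
timeout from the report: the first variant only returns once that timeout
expires, while the second returns as soon as the queued update is processed.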