[ https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864475#comment-13864475 ]
Mark Miller commented on SOLR-5615:
-----------------------------------

Even with the other changes, I like the idea of using a background
thread because I don't think it's right that we do that whole reconnect
process before we set that we are connected to zk and get out of the
connection manager. I really don't think that process should hold up
the connection manager at all - it's meant to just trigger it.

> Deadlock while trying to recover after a ZK session expiry
> ----------------------------------------------------------
>
>                 Key: SOLR-5615
>                 URL: https://issues.apache.org/jira/browse/SOLR-5615
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>   Affects Versions: 4.4, 4.5, 4.6
>            Reporter: Ramkumar Aiyengar
>            Assignee: Mark Miller
>             Fix For: 5.0, 4.7, 4.6.1
>
>         Attachments: SOLR-5615.patch, SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard
>    from the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread
>    (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no
>    leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader
>    for the shard
>  - OL tries to register itself as a replica, and waits till the cluster
>    state is updated with the new leader from event thread
>  - ZK sends a watch update to OL, but it is blocked on the event thread
>    waiting for it.
> Oops. This finally breaks out after trying to register itself as
> replica times out after 20 mins.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
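
For illustration only, here is a minimal sketch of the blocking pattern in the issue description and of the background-thread idea from the comment. It is not Solr or ZooKeeper code: the class ReconnectSketch, the method simulate, and the executors standing in for main-EventThread and the background thread are all hypothetical names, and the 20 minute wait is shortened to a configurable timeout.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the reconnect path; none of these names exist in Solr.
class ReconnectSketch {

    /**
     * Simulates one reconnect. Returns true if the "register as replica" step
     * saw the new leader before waitMillis elapsed, false if it timed out.
     */
    static boolean simulate(boolean useBackgroundThread, long waitMillis) throws Exception {
        // A single-threaded executor plays the role of the one event thread
        // that delivers both the reconnect callback and the watch updates.
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ExecutorService background = Executors.newSingleThreadExecutor();
        CountDownLatch newLeaderSeen = new CountDownLatch(1);
        try {
            // "Register as replica": wait until the cluster state shows a leader.
            Callable<Boolean> register =
                () -> newLeaderSeen.await(waitMillis, TimeUnit.MILLISECONDS);

            Future<Boolean> registered;
            if (useBackgroundThread) {
                // The event thread only triggers the work and returns at once.
                Callable<Future<Boolean>> trigger = () -> background.submit(register);
                registered = eventThread.submit(trigger).get();
            } else {
                // The whole reconnect runs on the event thread and blocks it.
                registered = eventThread.submit(register);
            }

            // The watch update carrying the new leader is queued on the same
            // event thread, behind whatever that thread is currently doing.
            Runnable watchUpdate = newLeaderSeen::countDown;
            eventThread.execute(watchUpdate);

            return registered.get();
        } finally {
            eventThread.shutdownNow();
            background.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reconnect on event thread saw leader: " + simulate(false, 500));
        System.out.println("reconnect on background thread saw leader: " + simulate(true, 5000));
    }
}
```

In this sketch the first call should time out, because the watch update cannot run until the blocked task gives up the event thread. The second call should see the leader promptly, since the event thread is free to deliver the update while the background thread waits.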