[ 
https://issues.apache.org/jira/browse/SOLR-5615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864460#comment-13864460
 ] 

Mark Miller commented on SOLR-5615:
-----------------------------------

bq. However, onReconnect in any case runs in the event thread of the expired ZK 
which wouldn't have events after that, so it's effectively backgrounded?

But it holds the ConnectionManager's "this" lock while it runs, right? I think 
we just don't want to hold that lock while it runs. 

I think the other changes are likely okay too - I'm playing around with a 
combination of the two.
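
To make the lock concern concrete, here is a minimal sketch, not the actual 
ConnectionManager code and not the attached patch. LockScopeSketch, 
handleExpiry and isConnected are hypothetical names made up for this example. 
It only shows that a callback run inside synchronized (this) keeps every other 
synchronized caller out until it returns, while a callback run after the lock 
is released does not.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a connection manager; not Solr's real class.
class LockScopeSketch {
    private boolean connected;

    // holdLock = true runs the callback inside the monitor (the concern raised above);
    // holdLock = false updates state under the lock, then runs the callback outside it.
    void handleExpiry(Runnable onReconnect, boolean holdLock) {
        if (holdLock) {
            synchronized (this) {
                connected = true;
                onReconnect.run();
            }
        } else {
            synchronized (this) {
                connected = true;
            }
            onReconnect.run();
        }
    }

    synchronized boolean isConnected() {
        return connected;
    }

    // Returns true if another thread could enter a synchronized method
    // while the reconnect callback was still running.
    static boolean probeSucceedsDuringCallback(boolean holdLock) {
        LockScopeSketch manager = new LockScopeSketch();
        CountDownLatch probed = new CountDownLatch(1);
        boolean[] sawProbe = new boolean[1];

        Runnable onReconnect = () -> {
            Thread probe = new Thread(() -> {
                manager.isConnected();   // needs the same monitor
                probed.countDown();
            });
            probe.setDaemon(true);
            probe.start();
            try {
                // Short timeout standing in for a long-running reconnect step.
                sawProbe[0] = probed.await(300, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        manager.handleExpiry(onReconnect, holdLock);
        return sawProbe[0];
    }

    public static void main(String[] args) {
        System.out.println("callback inside lock, probe got in:  " + probeSucceedsDuringCallback(true));
        System.out.println("callback outside lock, probe got in: " + probeSucceedsDuringCallback(false));
    }
}
```

Whether narrowing the lock alone is enough for this issue is a separate 
question; the sketch says nothing about the event-thread wait itself.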

> Deadlock while trying to recover after a ZK session expiry
> ----------------------------------------------------------
>
>                 Key: SOLR-5615
>                 URL: https://issues.apache.org/jira/browse/SOLR-5615
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.4, 4.5, 4.6
>            Reporter: Ramkumar Aiyengar
>         Attachments: SOLR-5615.patch
>
>
> The sequence of events which might trigger this is as follows:
>  - Leader of a shard, say OL, has a ZK expiry
>  - The new leader, NL, starts the election process
>  - NL, through Overseer, clears the current leader (OL) for the shard from 
> the cluster state
>  - OL reconnects to ZK, calls onReconnect from event thread (main-EventThread)
>  - OL marks itself down
>  - OL sets up watches for cluster state, and then retrieves it (with no 
> leader for this shard)
>  - NL, through Overseer, updates cluster state to mark itself leader for the 
> shard
>  - OL tries to register itself as a replica and, still on the event thread, 
> waits until the cluster state is updated with the new leader
>  - ZK sends a watch update to OL, but it cannot be delivered because the 
> event thread is blocked waiting for that very update.
> Oops. The wait only ends when the attempt to register as a replica times 
> out after 20 mins.
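
For illustration only (this is not Solr or ZooKeeper code): the sequence 
above can be reproduced in miniature with a single-threaded executor standing 
in for main-EventThread. EventThreadDeadlockSketch and its run method are 
hypothetical names, and the 500 ms timeout stands in for the 20 minute one. 
The "reconnect" task waits for a latch that only a later task on the same 
thread can release, so it can only time out. Moving the reconnect work to 
another thread lets the "watch" task run.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical miniature of the scenario; names are invented for this sketch.
class EventThreadDeadlockSketch {

    // Returns true if the reconnect handler saw the "new leader" update
    // before its wait timed out.
    static boolean run(boolean offloadReconnect) throws Exception {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        ExecutorService background = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch leaderSeen = new CountDownLatch(1);

            // Stand-in for onReconnect: waits until the cluster state shows a leader.
            Callable<Boolean> onReconnect =
                () -> leaderSeen.await(500, TimeUnit.MILLISECONDS);

            // Either run the handler on the event thread itself (the reported case)
            // or hand it to a separate thread.
            ExecutorService target = offloadReconnect ? background : eventThread;
            Future<Boolean> result = target.submit(onReconnect);

            // Stand-in for the watch update: always delivered on the event thread,
            // queued behind whatever that thread is already running.
            eventThread.execute(leaderSeen::countDown);

            return result.get();
        } finally {
            eventThread.shutdownNow();
            background.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("handler on event thread, saw leader: " + run(false));
        System.out.println("handler offloaded, saw leader:       " + run(true));
    }
}
```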



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
