[ 
https://issues.apache.org/jira/browse/SOLR-7361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14483985#comment-14483985
 ] 

Jessica Cheng Mallet commented on SOLR-7361:
--------------------------------------------

I think I have also seen cases where, after we bounced two nodes holding the 
two replicas of a particular collection/shard, neither could complete its 
recovery because they couldn't talk to each other. This fixes itself 
eventually when they time out waiting for each other, but until that happens 
they're basically "deadlocked". (Unfortunately I no longer have logs to back 
that up, so it's more of an anecdotal account.)

> Main Jetty thread blocked by core loading delays HTTP listener from binding 
> if core loading is slow
> ---------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-7361
>                 URL: https://issues.apache.org/jira/browse/SOLR-7361
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Timothy Potter
>
> During server startup, the CoreContainer uses an ExecutorService to load 
> cores in multiple background threads but then blocks until all cores are 
> loaded; see CoreContainer#load around line 290 on trunk (invokeAll). From 
> the JavaDoc on that method, we have:
> {quote}
> Executes the given tasks, returning a list of Futures holding their status 
> and results when all complete. Future.isDone() is true for each element of 
> the returned list.
> {quote}
> In other words, this is a blocking call.
> This keeps the Jetty HTTP listener from binding and accepting requests until 
> all cores are loaded. Do we need to block the main thread?
> Also, prior to this happening, the node is registered as a live node in ZK, 
> which makes it a candidate for receiving requests from the Overseer, such as 
> to service a create collection request. The problem of course is that the 
> node listed in /live_nodes isn't accepting requests yet. So we either need to 
> unblock the main thread during server loading or maybe wait longer before we 
> register as a live node ... not sure which is the better way forward?
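
To illustrate the blocking behaviour described in the quoted issue, here is a 
minimal sketch. It is not Solr's CoreContainer code: the class name 
CoreLoadSketch, the helper methods, and the sleep-based "core load" are all 
hypothetical stand-ins. It only contrasts ExecutorService#invokeAll, which 
holds the calling thread until every task finishes, with submit, which hands 
back pending Futures right away.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class CoreLoadSketch {

    // Hypothetical stand-in for loading one core: it only sleeps, then returns the name.
    static Callable<String> slowLoad(String name, long millis) {
        return () -> {
            Thread.sleep(millis);
            return name;
        };
    }

    // Blocking variant: invokeAll returns only after every task has completed,
    // so the caller (the "main thread" in the issue) is held for the whole load.
    static long blockingLoadMillis(ExecutorService pool, List<Callable<String>> tasks)
            throws InterruptedException {
        long start = System.nanoTime();
        List<Future<String>> finished = pool.invokeAll(tasks);
        for (Future<String> f : finished) {
            if (!f.isDone()) {
                throw new IllegalStateException("invokeAll returned before a task completed");
            }
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // Non-blocking variant: submit queues each task and returns a pending Future,
    // leaving the caller free to continue (for example, to let a listener bind).
    static List<Future<String>> submitAll(ExecutorService pool, List<Callable<String>> tasks) {
        List<Future<String>> pending = new ArrayList<>();
        for (Callable<String> task : tasks) {
            pending.add(pool.submit(task));
        }
        return pending;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Callable<String>> tasks =
                    Arrays.asList(slowLoad("core1", 300), slowLoad("core2", 300));

            long blocked = blockingLoadMillis(pool, tasks);
            System.out.println("invokeAll held the caller for about " + blocked + " ms");

            List<Future<String>> pending = submitAll(pool, tasks);
            System.out.println("submit returned; the caller could continue startup here");
            for (Future<String> f : pending) {
                System.out.println("loaded " + f.get()); // get() waits for that one task
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Whether unblocking the main thread is the right fix for Solr, as opposed to 
registering as a live node later, is the open question in the issue; the 
sketch only shows the difference between the two executor calls.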


