[ https://issues.apache.org/jira/browse/SOLR-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timothy Potter reassigned SOLR-8226:
------------------------------------

    Assignee:     (was: Timothy Potter)

> Is a SocketTimeoutException really a reliable indicator of a zombie in 
> LBHttpSolrClient?
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-8226
>                 URL: https://issues.apache.org/jira/browse/SOLR-8226
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>            Reporter: Timothy Potter
>            Priority: Major
>
> In LBHttpSolrClient, we do:
> {code}
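> } catch (SocketTimeoutException e) {
>   if (!isUpdate) {
>     // Non-update request: add the server to the zombie list unless it is already there
>     ex = (!isZombie) ? addZombie(client, e) : e;
>   } else {
>     // Update request: propagate the timeout to the caller
>     throw e;
>   }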
> {code}
> If I have a reasonably low socket timeout configured for my 
> HttpShardHandlerFactory and we hit a slow query, then a perfectly healthy 
> replica gets put into the zombie list, potentially creating a herd effect 
> on my other replicas, as there is now one fewer replica in the rotation. 
> Moreover, HttpShardHandlerFactory does not let me configure the check 
> interval for adding zombies back into rotation, so a potentially healthy 
> replica is out of rotation for a full minute. At the very least, the interval 
> should be configurable for the HttpShardHandlerFactory, but we should also 
> strive to differentiate between a slow response and a true zombie.
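
One way to picture the two requests in the description (a configurable check interval, and not treating a single slow response as proof of a zombie) is the sketch below. It is only an illustration under stated assumptions: TimeoutZombiePolicy, its methods, and both constructor parameters are hypothetical names that do not exist in SolrJ, and counting consecutive timeouts is just one possible heuristic, not something the issue prescribes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical sketch, not part of SolrJ: a server is only treated as a
 * zombie after several consecutive socket timeouts, and the interval before
 * it is probed again is a constructor argument instead of a fixed value.
 */
class TimeoutZombiePolicy {
  private final int timeoutThreshold;      // assumed setting: timeouts in a row before zombie
  private final long checkIntervalMillis;  // assumed setting: wait before re-probing a zombie
  private final Map<String, Integer> consecutiveTimeouts = new ConcurrentHashMap<>();
  private final Map<String, Long> zombieSince = new ConcurrentHashMap<>();

  TimeoutZombiePolicy(int timeoutThreshold, long checkIntervalMillis) {
    if (timeoutThreshold < 1 || checkIntervalMillis < 0) {
      throw new IllegalArgumentException("threshold must be >= 1 and interval >= 0");
    }
    this.timeoutThreshold = timeoutThreshold;
    this.checkIntervalMillis = checkIntervalMillis;
  }

  /** Records one socket timeout; returns true once the server counts as a zombie. */
  boolean recordTimeout(String baseUrl, long nowMillis) {
    int count = consecutiveTimeouts.merge(baseUrl, 1, Integer::sum);
    if (count >= timeoutThreshold) {
      // Keep the first timestamp so the recheck clock is not reset by later timeouts.
      zombieSince.putIfAbsent(baseUrl, nowMillis);
      return true;
    }
    return false;
  }

  /** A successful response clears the timeout streak and any zombie mark. */
  void recordSuccess(String baseUrl) {
    consecutiveTimeouts.remove(baseUrl);
    zombieSince.remove(baseUrl);
  }

  boolean isZombie(String baseUrl) {
    return zombieSince.containsKey(baseUrl);
  }

  /** True when a zombie has waited at least the configured interval. */
  boolean dueForRecheck(String baseUrl, long nowMillis) {
    Long since = zombieSince.get(baseUrl);
    return since != null && nowMillis - since >= checkIntervalMillis;
  }
}
```

With a threshold of 3, a replica that times out once on a slow query and then answers the next request stays in rotation, while a replica that times out three times in a row is set aside until the configured interval has elapsed.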



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org
For additional commands, e-mail: issues-h...@solr.apache.org
