[ https://issues.apache.org/jira/browse/SOLR-8226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Timothy Potter reassigned SOLR-8226:
------------------------------------

    Assignee:     (was: Timothy Potter)

> Is a SocketTimeoutException really a reliable indicator of a zombie in LBHttpSolrClient?
> ----------------------------------------------------------------------------------------
>
>                 Key: SOLR-8226
>                 URL: https://issues.apache.org/jira/browse/SOLR-8226
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrJ
>            Reporter: Timothy Potter
>            Priority: Major
>
> In LBHttpSolrClient, we do:
> {code}
> } catch (SocketTimeoutException e) {
>   // A read timeout on a non-update request marks the server as a zombie.
>   if (!isUpdate) {
>     ex = (!isZombie) ? addZombie(client, e) : e;
>   } else {
>     throw e;
>   }
> }
> {code}
> If I have a reasonably low socket timeout configured for my
> HttpShardHandlerFactory and we hit a slow query, then a perfectly healthy
> replica gets put into the zombie list, potentially creating a herd effect
> on my other replicas, since there is now one fewer replica in the rotation.
> Moreover, HttpShardHandlerFactory does not let me configure the check
> interval for adding zombies back into rotation, so a potentially healthy
> replica stays out of rotation for a full minute. At the very least, the
> interval should be configurable for the HttpShardHandlerFactory, but we
> should also strive to differentiate between a slow response and a true zombie.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
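The issue asks SolrJ to tell a slow response apart from a true zombie and to make the re-check interval configurable. One possible heuristic is to zombie a server only after several consecutive read timeouts, resetting the count on any successful response. The plain-Java sketch below assumes that heuristic. `TimeoutAwareZombiePolicy`, `onTimeout`, `onSuccess`, `timeoutsBeforeZombie` and `aliveCheckIntervalMs` are hypothetical names for illustration only. They are not part of LBHttpSolrClient or HttpShardHandlerFactory, and counting timeouts is only one of several ways to approach the problem.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch, not SolrJ API: a single read timeout does not mark a
 * server as a zombie. Only timeoutsBeforeZombie consecutive timeouts do, and
 * any successful response resets the count. The alive-check interval is a
 * constructor parameter instead of a fixed one minute.
 */
final class TimeoutAwareZombiePolicy {
  private final int timeoutsBeforeZombie;
  private final long aliveCheckIntervalMs;
  // Consecutive read timeouts observed per server URL.
  private final ConcurrentHashMap<String, AtomicInteger> consecutiveTimeouts =
      new ConcurrentHashMap<>();

  TimeoutAwareZombiePolicy(int timeoutsBeforeZombie, long aliveCheckIntervalMs) {
    if (timeoutsBeforeZombie < 1) {
      throw new IllegalArgumentException("timeoutsBeforeZombie must be >= 1");
    }
    if (aliveCheckIntervalMs <= 0) {
      throw new IllegalArgumentException("aliveCheckIntervalMs must be > 0");
    }
    this.timeoutsBeforeZombie = timeoutsBeforeZombie;
    this.aliveCheckIntervalMs = aliveCheckIntervalMs;
  }

  /**
   * Records a SocketTimeoutException from a non-update request to serverUrl.
   * Returns true only once the server has timed out timeoutsBeforeZombie
   * times in a row, i.e. when it should be moved to the zombie list.
   */
  boolean onTimeout(String serverUrl) {
    int count = consecutiveTimeouts
        .computeIfAbsent(serverUrl, k -> new AtomicInteger())
        .incrementAndGet();
    return count >= timeoutsBeforeZombie;
  }

  /** A successful response means the server was slow, not dead: reset its count. */
  void onSuccess(String serverUrl) {
    consecutiveTimeouts.remove(serverUrl);
  }

  /** How long a zombied server would wait before being probed again. */
  long aliveCheckIntervalMs() {
    return aliveCheckIntervalMs;
  }
}
```

In the catch block quoted in the issue, the ternary would consult `policy.onTimeout(...)` before calling `addZombie`, so one slow query leaves an otherwise healthy replica in rotation. The trade-off is that a genuinely dead server keeps receiving requests for a few more timeouts before it is removed.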