Timothy Potter created SOLR-8226:
------------------------------------
Summary: Is a SocketTimeoutException really a reliable indicator
of a zombie in LBHttpSolrClient?
Key: SOLR-8226
URL: https://issues.apache.org/jira/browse/SOLR-8226
Project: Solr
Issue Type: Improvement
Components: SolrJ
Reporter: Timothy Potter
Assignee: Timothy Potter
In LBHttpSolrClient, we do:
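{code}
} catch (SocketTimeoutException e) {
  if (!isUpdate) {
    // Queries: if this server was in the alive list, addZombie() moves it
    // to the zombie list; a server already marked as a zombie is left as-is.
    ex = (!isZombie) ? addZombie(client, e) : e;
  } else {
    // Updates: the timeout is rethrown to the caller instead.
    throw e;
  }
}
{code}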
If I have a reasonably low socket timeout configured for my
HttpShardHandlerFactory and we hit a slow query, then a perfectly healthy
replica gets put into the zombie list, potentially creating a herd effect
on my other replicas because there is now one fewer replica in the rotation.
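To illustrate why the exception by itself is ambiguous, here is a minimal plain-JDK sketch (no SolrJ involved; the class and method names are hypothetical). It starts a loopback server that is alive and accepts the connection but answers slowly. With a short read timeout, set via `Socket.setSoTimeout` as a stand-in for a low `socketTimeout`, the client gets a `java.net.SocketTimeoutException`, the same exception the catch block in LBHttpSolrClient treats as a sign of a zombie.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SlowReplicaDemo {

    /**
     * Connects to a loopback server that accepts the connection but waits
     * replyDelayMs before replying, reading with the given client timeout.
     * Returns true if the read timed out.
     */
    static boolean readTimesOut(int socketTimeoutMs, int replyDelayMs)
            throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            // A "healthy but slow" server: it is up and accepting, just slow to answer.
            Thread serverThread = new Thread(() -> {
                try (Socket conn = server.accept()) {
                    Thread.sleep(replyDelayMs);
                    OutputStream out = conn.getOutputStream();
                    out.write(42);
                    out.flush();
                } catch (IOException | InterruptedException ignored) {
                    // The client may already have closed its end after timing out.
                }
            });
            serverThread.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setSoTimeout(socketTimeoutMs); // plays the role of a low socket timeout
                InputStream in = client.getInputStream();
                in.read();
                return false; // the reply arrived within the timeout
            } catch (SocketTimeoutException e) {
                // From the client's side this looks exactly like a hung server.
                return true;
            } finally {
                serverThread.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("slow reply, 100 ms timeout, timed out: " + readTimesOut(100, 500));
        System.out.println("fast reply, 1000 ms timeout, timed out: " + readTimesOut(1000, 50));
    }
}
```

The server in the first call is perfectly alive; only the ratio of reply time to timeout decides whether the client sees a timeout.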
Moreover, HttpShardHandlerFactory does not let me configure the check interval
for adding zombies back into rotation, so a replica that may be perfectly
healthy stays out of rotation for a full minute. At the very least, the interval
should be configurable through HttpShardHandlerFactory, but we should also strive
to differentiate between a slow response and a true zombie.
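One possible way to address both points is sketched below with plain JDK types and hypothetical names; it is not the LBHttpSolrClient implementation. The assumptions: a refused connection (`java.net.ConnectException`) is treated as a strong signal that the server is down, a read timeout only zombies a server after a configurable number of consecutive timeouts, and the interval at which zombies are re-probed is a constructor parameter rather than a fixed value. The `isAlive` probe stands in for whatever cheap health check (for example a ping request) a real client would use.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

/** Hypothetical zombie policy; a sketch, not SolrJ's actual behavior. */
public class ZombiePolicy implements AutoCloseable {
    private final int maxConsecutiveTimeouts;
    private final Map<String, AtomicInteger> timeouts = new ConcurrentHashMap<>();
    private final Set<String> zombies = ConcurrentHashMap.newKeySet();
    private final ScheduledExecutorService checker = Executors.newSingleThreadScheduledExecutor();

    /**
     * @param maxConsecutiveTimeouts read timeouts in a row before a server becomes a zombie
     * @param aliveCheckIntervalMs   how often zombies are re-probed (configurable, not hard-coded)
     * @param isAlive                probe that decides whether a zombie can rejoin the rotation
     */
    public ZombiePolicy(int maxConsecutiveTimeouts, long aliveCheckIntervalMs, Predicate<String> isAlive) {
        this.maxConsecutiveTimeouts = maxConsecutiveTimeouts;
        checker.scheduleAtFixedRate(() -> {
            for (String server : zombies) {
                try {
                    if (isAlive.test(server)) {
                        zombies.remove(server);
                        timeouts.remove(server); // start counting timeouts afresh
                    }
                } catch (RuntimeException probeFailed) {
                    // Leave the server in the zombie list; the task stays scheduled.
                }
            }
        }, aliveCheckIntervalMs, aliveCheckIntervalMs, TimeUnit.MILLISECONDS);
    }

    /** Records a failed request; returns true if the server is now a zombie. */
    public boolean onFailure(String server, Exception e) {
        if (e instanceof ConnectException) {
            // Nothing is listening on the port: very likely a real outage.
            zombies.add(server);
        } else if (e instanceof SocketTimeoutException) {
            // One slow response is not enough evidence; require a streak.
            int n = timeouts.computeIfAbsent(server, k -> new AtomicInteger()).incrementAndGet();
            if (n >= maxConsecutiveTimeouts) {
                zombies.add(server);
            }
        }
        return zombies.contains(server);
    }

    /** Records a successful response, which breaks any timeout streak. */
    public void onSuccess(String server) {
        timeouts.remove(server);
    }

    public boolean isZombie(String server) {
        return zombies.contains(server);
    }

    @Override
    public void close() {
        checker.shutdownNow();
    }
}
```

With a threshold of 3, a single slow query leaves the replica in rotation, while three timeouts in a row (or one refused connection) take it out until the probe succeeds. Note that `Socket.connect` with a timeout also throws `SocketTimeoutException`, so a real implementation would need to tell connect timeouts apart from read timeouts.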
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)