Erick Erickson created SOLR-7571:
------------------------------------

             Summary: Return metrics with update requests to allow clients to 
self-throttle
                 Key: SOLR-7571
                 URL: https://issues.apache.org/jira/browse/SOLR-7571
             Project: Solr
          Issue Type: Improvement
    Affects Versions: 5.1, Trunk
            Reporter: Erick Erickson
            Assignee: Erick Erickson


I've assigned this to myself to keep track of it; anyone who wants to take it, 
please feel free.

I've recently seen a setup with 10 shards and 4 replicas. The SolrJ client (and 
post.jar for JSON files, for that matter) firehoses updates (150 separate 
threads in total) at Solr. Eventually, replicas (not leaders) go into recovery, 
the state cascades, and the entire cluster becomes unusable. SOLR-5850 delays 
the behavior, but it still occurs. There are no errors in the followers' logs; 
this is leader-initiated recovery because of a timeout.

I think the root problem is that the client is just sending too many requests 
to the cluster, and ConcurrentUpdateSolrClient/Server (used by the leader to 
distribute update requests to all the followers) apparently can't keep up. 
This was observed in Solr 4.10.3+. I see thread counts of 500+ when this 
happens.

So assuming that this is the root cause, the obvious "cure" is "don't index 
that fast". This is unsatisfactory: since "that fast" is variable, the only 
recourse is to set the threshold so low that the Solr cluster isn't being 
driven as fast as it can be.

We should provide some mechanism for having the client throttle itself. 
Returning the number of outstanding update threads with the update response is 
one possibility. The client could then slow down sending updates to Solr.
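
As a rough sketch of what the client side could look like, assuming update 
responses carried a count of outstanding update threads (no such field exists 
today; the class name, its parameters, and the thresholds are all hypothetical 
and not part of SolrJ):

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical client-side throttle. It assumes the server reports the number
 * of outstanding update threads with each update response; that metric is the
 * proposal in this issue, not an existing Solr feature.
 */
final class UpdateThrottle {
    private final int softLimit;              // thread count below which we never pause
    private final long millisPerExcessThread; // pause added per thread above softLimit
    private final long maxDelayMillis;        // upper bound on any single pause

    UpdateThrottle(int softLimit, long millisPerExcessThread, long maxDelayMillis) {
        if (softLimit < 0 || millisPerExcessThread < 0 || maxDelayMillis < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
        this.softLimit = softLimit;
        this.millisPerExcessThread = millisPerExcessThread;
        this.maxDelayMillis = maxDelayMillis;
    }

    /** Pause, in milliseconds, to apply before sending the next update batch. */
    long delayMillisFor(int outstandingUpdateThreads) {
        if (outstandingUpdateThreads <= softLimit) {
            return 0L;
        }
        long excess = (long) outstandingUpdateThreads - softLimit;
        // Cap before multiplying so the product cannot overflow.
        if (millisPerExcessThread != 0 && excess > maxDelayMillis / millisPerExcessThread) {
            return maxDelayMillis;
        }
        return Math.min(maxDelayMillis, excess * millisPerExcessThread);
    }

    /** Sleeps for the computed pause; call between update batches. */
    void pauseFor(int outstandingUpdateThreads) throws InterruptedException {
        long delay = delayMillisFor(outstandingUpdateThreads);
        if (delay > 0) {
            TimeUnit.MILLISECONDS.sleep(delay);
        }
    }
}
```

Each indexing thread would call pauseFor() with the most recently reported 
count before sending its next batch. A linear back-off is only one choice; the 
point is that the client reacts to what the cluster reports rather than to a 
fixed, conservatively low rate.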

I'm not sure there's a good way to deal with this on the server. Once the 
timeout is encountered, you don't know whether the doc has actually been 
indexed on the follower (actually, in this case it _is_ indexed, it just takes 
a while). Ideally we'd just manage it all magically, but an alternative that 
lets clients dynamically throttle themselves seems do-able.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
