[ 
https://issues.apache.org/jira/browse/SOLR-7571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-7571.
----------------------------------
    Resolution: Duplicate

SOLR-7344 is a much better approach; Solr should survive ill-mannered clients.

> Return metrics with update requests to allow clients to self-throttle
> ---------------------------------------------------------------------
>
>                 Key: SOLR-7571
>                 URL: https://issues.apache.org/jira/browse/SOLR-7571
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 4.10.3
>            Reporter: Erick Erickson
>            Assignee: Erick Erickson
>
> I've assigned this to myself to keep track of it; anyone who wants to,
> please feel free to take this.
> I've recently seen a setup with 10 shards and 4 replicas. The SolrJ client
> (and post.jar for JSON files, for that matter) firehoses updates (150
> separate threads in total) at Solr. Eventually, replicas (not leaders) go
> into recovery, the state cascades, and the entire cluster becomes unusable.
> SOLR-5850 delays the behavior, but it still occurs. There are no errors in
> the follower's logs; this is leader-initiated recovery because of a timeout.
> I think the root problem is that the client is just sending too many requests
> to the cluster and, through it, to ConcurrentUpdateSolrClient/Server (used by
> the leader to distribute update requests to all the followers). This was
> observed in Solr 4.10.3+. I see thread counts of 500+ when this happens.
> So assuming that this is the root cause, the obvious "cure" is "don't index
> that fast". This is unsatisfactory: since "that fast" is variable, the only
> recourse is to set that threshold low enough that the Solr cluster isn't
> being driven as fast as it can be.
> We should provide some mechanism for having the client throttle itself. The 
> number of outstanding update threads is one possibility. The client could 
> then slow down sending updates to Solr. 
> I'm not sure there's a good way to deal with this on the server. Once the
> timeout is encountered, you don't know whether the doc has actually been
> indexed on the follower (actually, in this case it _is_ indexed, it just takes
> a while). Ideally we'd just manage it all magically, but an alternative that
> lets clients dynamically throttle themselves seems do-able.
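
For illustration only, the client-side throttling described above might look
roughly like the sketch below. It assumes each update response carries a
hypothetical "outstanding update threads" metric (Solr does not return one;
that is what this issue proposes). The class name, the high_water threshold,
and the halve-on-overload / add-one-when-healthy policy are all assumptions
made for the example, not anything from Solr or SolrJ.

```python
import threading


class AdaptiveThrottle:
    """Caps in-flight update requests; the cap follows a server-reported metric.

    `server_threads` stands in for the hypothetical metric proposed in the
    issue (outstanding update threads). Solr does not return such a value.
    """

    def __init__(self, initial_limit=16, min_limit=1, max_limit=150,
                 high_water=400):
        self._limit = initial_limit
        self._min = min_limit
        self._max = max_limit
        self._high_water = high_water  # assumed threshold, tune per cluster
        self._in_flight = 0
        self._cond = threading.Condition()

    @property
    def limit(self):
        with self._cond:
            return self._limit

    def acquire(self):
        """Block until a request slot is free, then claim it."""
        with self._cond:
            while self._in_flight >= self._limit:
                self._cond.wait()
            self._in_flight += 1

    def release(self, server_threads):
        """Free a slot and adjust the cap from the reported metric."""
        with self._cond:
            self._in_flight -= 1
            if server_threads > self._high_water:
                # Server looks overloaded: halve the cap.
                self._limit = max(self._min, self._limit // 2)
            else:
                # Server looks healthy: probe upward by one.
                self._limit = min(self._max, self._limit + 1)
            self._cond.notify_all()
```

Each indexing thread would call acquire() before sending a batch and
release(metric) once the response arrives, so the number of concurrent
updates shrinks quickly when the cluster reports pressure and grows back
slowly afterwards.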



