[ 
https://issues.apache.org/jira/browse/SOLR-16932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18081032#comment-18081032
 ] 

Mark Robert Miller commented on SOLR-16932:
-------------------------------------------

Oh, I wouldn't change it, that's a can of worms, but ideally you wouldn't 
throttle globally like this. You'd throttle per destination and base the 
limit on the client's configured max queued requests per destination, with 
the purpose being to not overflow the queue and see rejections. So each shard 
has its own queue, you are not throttling based on an arbitrary number, and 
the limit automatically scales with the number of shards. That would work 
pretty much ideally for whatever cluster size, and you wouldn't make the 
throttle itself configurable, you'd just make the max queued requests per 
destination of the client configurable. But the default of 1024 would likely 
be good for almost all use cases.
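
To make the per-destination idea concrete, here is a minimal sketch, not what 
Http2SolrClient does today. The class name PerDestinationLimiter and its 
methods are hypothetical, and it assumes the caller passes in the same 
per-destination queue size the HTTP client is configured with, so the limiter 
blocks just before the queue would overflow.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

// Hypothetical helper: one semaphore per destination, each sized to the
// client's max queued requests per destination (an assumed input).
final class PerDestinationLimiter {
    private final int maxQueuedPerDestination;
    private final ConcurrentHashMap<String, Semaphore> permits = new ConcurrentHashMap<>();

    PerDestinationLimiter(int maxQueuedPerDestination) {
        if (maxQueuedPerDestination <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.maxQueuedPerDestination = maxQueuedPerDestination;
    }

    // Lazily create the semaphore the first time a destination is seen.
    private Semaphore semaphoreFor(String destination) {
        return permits.computeIfAbsent(destination, d -> new Semaphore(maxQueuedPerDestination));
    }

    // Blocks until this destination has room for one more request.
    void acquire(String destination) throws InterruptedException {
        semaphoreFor(destination).acquire();
    }

    // Non-blocking variant: returns false if this destination is at its limit.
    boolean tryAcquire(String destination) {
        return semaphoreFor(destination).tryAcquire();
    }

    // Must be called exactly once per successful acquire, when the request completes.
    void release(String destination) {
        semaphoreFor(destination).release();
    }

    // Remaining capacity for one destination; other destinations do not affect it.
    int available(String destination) {
        return semaphoreFor(destination).availablePermits();
    }
}
```

With this shape the total number of in-flight requests grows with the number 
of destinations, so no separate global limit has to be tuned per cluster size.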

The calculus is a bit different for the user client use case though, because of 
updates. But that's how it should work in the internal scatter gather use case. 

> Http2Client should have configurable `maxOutstandingRequests`, to support 
> parallel requests in high-shard-count contexts
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-16932
>                 URL: https://issues.apache.org/jira/browse/SOLR-16932
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 9.3, 10.0
>            Reporter: Michael Gibney
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Http2SolrClient is asynchronous, but it only allows for a 
> [hardcoded|https://github.com/apache/solr/blob/88990d640a89091a8f7b0b2493377ac24118afe8/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L964]
>  max number (1000) of outstanding requests. Thus, under sufficient load, 
> intra-cluster communication is not fully concurrent/asynchronous, and the 
> top-level coordinator node can become a bottleneck. This is especially 
> problematic for high-shard-count collections (>1k shards) where a single 
> top-level request easily generates sufficient load to hit this throttling, 
> effectively guaranteeing a near doubling of top-level (client-side) request 
> latency.
> It should be possible to configure this {{maxOutstandingConnections}} 
> threshold via the HttpShardHandlerFactory config. If I understand correctly 
> the implications of this limit, it should be reasonable to scale it roughly 
> according to the number of nodes in the cluster (consider, e.g.: 1k 
> outstanding requests to 2 nodes is a very different situation than 1k 
> outstanding requests to 128 nodes).



