[ 
https://issues.apache.org/jira/browse/SOLR-16932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18080442#comment-18080442
 ] 

Mark Robert Miller commented on SOLR-16932:
-------------------------------------------

1000 outstanding requests was never really a good limit here. I originally used 
that limit for just a user client and Dat took it from me, and then it became 
an internal limit when his client moved to do the scatter-gather search.

Before that, when you did not have async search, you could have as many 
concurrent requests going as threads you could create. Switching to this client 
and async made tons of concurrent requests much more efficient while at the 
same time capping them heavily. If you have a 300 shard cluster, you get a 
little over 3 concurrent requests possible (simplified) after the change. 
That's a pretty big performance regression, though mitigated if you are load 
balancing queries across the cluster.
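
As a rough sketch of that arithmetic, assuming each top-level query fans out 
exactly one request per shard and all of them are outstanding at the same time 
(the class and method names are hypothetical and not part of SolrJ, and the 
1024-shard figure is only an illustrative stand-in for ">1k shards"):

```java
import java.util.Locale;

public class OutstandingRequestBudget {

    /**
     * Simplified estimate of how many top-level queries can be fully in
     * flight at once when every query sends one request per shard and the
     * client caps the total number of outstanding requests.
     */
    static double concurrentTopLevelQueries(int maxOutstandingRequests, int shardCount) {
        if (shardCount <= 0) {
            throw new IllegalArgumentException("shardCount must be positive");
        }
        return (double) maxOutstandingRequests / shardCount;
    }

    public static void main(String[] args) {
        // 1000-request cap, 300 shards: a little over 3 concurrent queries.
        System.out.printf(Locale.ROOT, "%.2f%n", concurrentTopLevelQueries(1000, 300));  // prints 3.33
        // 1000-request cap, 1024 shards: a single query already exceeds the cap.
        System.out.printf(Locale.ROOT, "%.2f%n", concurrentTopLevelQueries(1000, 1024)); // prints 0.98
    }
}
```

A result below 1 means a single top-level request cannot have all of its shard 
requests outstanding at once, which is the high-shard-count case the issue 
describes.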

The limit should have been made much higher IMO. But FYI, it looks like this is 
a configurable system property in Solr 10.

> Http2Client should have configurable `maxOutstandingRequests`, to support 
> parallel requests in high-shard-count contexts
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-16932
>                 URL: https://issues.apache.org/jira/browse/SOLR-16932
>             Project: Solr
>          Issue Type: Improvement
>    Affects Versions: 9.3, 10.0
>            Reporter: Michael Gibney
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Http2SolrClient is asynchronous, but it only allows for a 
> [hardcoded|https://github.com/apache/solr/blob/88990d640a89091a8f7b0b2493377ac24118afe8/solr/solrj/src/java/org/apache/solr/client/solrj/impl/Http2SolrClient.java#L964]
>  max number (1000) of outstanding requests. Thus, under sufficient load, 
> intra-cluster communication is not fully concurrent/asynchronous, and the 
> top-level coordinator node can become a bottleneck. This is especially 
> problematic for high-shard-count collections (>1k shards) where a single 
> top-level request easily generates sufficient load to hit this throttling, 
> effectively guaranteeing a near doubling of top-level (client-side) request 
> latency.
> It should be possible to configure this {{maxOutstandingConnections}} 
> threshold via the HttpShardHandlerFactory config. If I understand correctly 
> the implications of this limit, it should be reasonable to scale it roughly 
> according to the number of nodes in the cluster (consider, e.g.: 1k 
> outstanding requests to 2 nodes is a very different situation than 1k 
> outstanding requests to 128 nodes).



