Jason Gerlowski created SOLR-17419:
--------------------------------------

             Summary: Improve HttpShardHandler performance in many-shard 
collections
                 Key: SOLR-17419
                 URL: https://issues.apache.org/jira/browse/SOLR-17419
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: SolrCloud
    Affects Versions: 9.6.1, 9.0
            Reporter: Jason Gerlowski


In Solr 8, HttpShardHandler sent shard requests by submitting Callables to an 
ExecutorService. As a result, both the "request-sending" and 
"response-awaiting" happened asynchronously to the original request thread.
{code:java}
  @Override
  public void submit(final ShardRequest sreq, final String shard, final ModifiableSolrParams params) {
    ShardRequestor shardRequestor = new ShardRequestor(sreq, shard, params, this); // Callable
    try {
      shardRequestor.init();
      pending.add(completionService.submit(shardRequestor));
    } finally {
      shardRequestor.end();
    }
  }
{code}
However, in Solr 9.x, HttpShardHandler ditched the 
ExecutorService/per-request-thread approach in favor of [sending all requests 
serially using 
"SolrClient.requestAsync"|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java#L163].
SOLR-14354, which made this change, did so in an effort to avoid unnecessary 
thread and CPU context-switching. As Dat described in SOLR-14354:
{quote}after sending a request that thread basically do nothing just waiting 
for response from other side. That thread will be swapped out and CPU will try 
to handle another thread (this is called context switch, CPU will save the 
context of the current thread and switch to another one). When some data (not 
all) come back, that thread will be called to parsing these data, then it will 
wait until more data come back. So there will be lots of context switching in 
CPU. That is quite inefficient
{quote}
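As a rough illustration of the serial-async pattern described above, here is a minimal sketch using only the JDK's CompletableFuture. It is not Solr's code: {{sendAsync}}, {{queryAll}} and the {{sendLog}} parameter are hypothetical names invented for this example, and {{sendAsync}} merely stands in for an async client call. The point it tries to show is that every "send" runs on the calling thread, one after another, while the responses complete elsewhere.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public final class SerialAsyncSketch {
    private SerialAsyncSketch() {}

    // Hypothetical stand-in for an async client call; this is NOT Solr's API.
    static CompletableFuture<String> sendAsync(String shard, List<String> sendLog) {
        // The "send" work happens on the caller's thread, so sends are serialized.
        sendLog.add(shard + "@" + Thread.currentThread().getName());
        // The response is produced later on a different (pool) thread.
        return CompletableFuture.supplyAsync(() -> "response from " + shard);
    }

    // Sends to every shard in turn, then waits for all of the responses.
    public static List<String> queryAll(List<String> shards, List<String> sendLog) {
        List<CompletableFuture<String>> pending = new ArrayList<>();
        for (String shard : shards) {
            // Each iteration pays the send cost before the next send can start.
            pending.add(sendAsync(shard, sendLog));
        }
        List<String> responses = new ArrayList<>();
        for (CompletableFuture<String> future : pending) {
            responses.add(future.join()); // blocks until that shard has answered
        }
        return responses;
    }

    public static void main(String[] args) {
        List<String> sendLog = new ArrayList<>();
        List<String> responses = queryAll(List.of("shard1", "shard2", "shard3"), sendLog);
        System.out.println(sendLog);   // every entry names the same sending thread
        System.out.println(responses);
    }
}
```

No thread is parked per request while waiting, which is the benefit SOLR-14354 was after; the loop in {{queryAll}} is where the per-shard send cost accumulates.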
This approach comes with a downside, though: all the shard requests are sent 
serially. If sending each request takes ~1ms, then a user is unlikely to notice 
this in a collection with 5 or 10 shards. But the cost here scales 
linearly, so *in a collection with 50 shards, this approach would bake a ~50ms 
delay into the critical path of every single query!*
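
To make the arithmetic above concrete, here is a small back-of-the-envelope model. It is only a sketch under stated assumptions: the class and method names are hypothetical, the ~1ms per-request send cost is the figure assumed in this description rather than a measurement, and the "pooled" estimate is idealized in that it ignores the thread scheduling and context-switch overhead that SOLR-14354 set out to avoid.

```java
public final class ShardSendCost {
    private ShardSendCost() {}

    /** Serial dispatch: the request thread pays the send cost once per shard. */
    public static long serialSendMillis(int shards, long perRequestMillis) {
        return (long) shards * perRequestMillis;
    }

    /**
     * Idealized pooled dispatch: sends are spread evenly over a fixed number of
     * sender threads, so the delay is one send cost per "round" of senders.
     * Assumption: no scheduling or context-switch overhead is modeled.
     */
    public static long pooledSendMillis(int shards, long perRequestMillis, int senders) {
        long rounds = (shards + senders - 1) / senders; // ceiling division
        return rounds * perRequestMillis;
    }

    public static void main(String[] args) {
        long perRequestMillis = 1; // assumed send cost, per the description above
        int senders = 10;          // hypothetical pool size, chosen for illustration
        for (int shards : new int[] {5, 10, 50}) {
            System.out.printf("%d shards: serial ~%dms, pooled ~%dms%n",
                shards,
                serialSendMillis(shards, perRequestMillis),
                pooledSendMillis(shards, perRequestMillis, senders));
        }
    }
}
```

Under these assumptions the serial delay grows linearly with the shard count, while the pooled estimate grows only with the number of rounds; the real tradeoff depends on the overhead the model leaves out.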

This issue is intended to reevaluate whether there's a better way to balance 
these concerns. Ideally we can come up with an approach that improves all 
scenarios. Lacking that, maybe Solr could choose between one of several 
approaches semi-intelligently based on the number of shards or other factors?


