[
https://issues.apache.org/jira/browse/SOLR-17419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17914297#comment-17914297
]
David Smiley commented on SOLR-17419:
-------------------------------------
I found out about this cool optimization by reading our release notes, LOL.
Nice work!
Question: Did you consider enhancing the default handler so that either
approach can be taken without the user explicitly configuring this thing?
Ideally Solr would use this if the shard count is above some threshold.
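For illustration only, here is a minimal sketch of what a threshold-based choice could look like. Every name in it (DispatchStrategy, ShardDispatchChooser, the threshold value) is hypothetical and is not part of Solr's API; the cutoff of 32 used in the usage note is an arbitrary placeholder, not a measured value.

```java
// Hypothetical sketch: none of these types exist in Solr.
enum DispatchStrategy {
    SERIAL_ASYNC,      // send every shard request from the request thread, one after another
    PARALLEL_EXECUTOR  // hand the sends to an executor so they can proceed concurrently
}

final class ShardDispatchChooser {
    private final int parallelThreshold;

    ShardDispatchChooser(int parallelThreshold) {
        if (parallelThreshold < 1) {
            throw new IllegalArgumentException("parallelThreshold must be >= 1");
        }
        this.parallelThreshold = parallelThreshold;
    }

    // Picks the parallel approach only when the shard count is strictly above the threshold.
    DispatchStrategy choose(int shardCount) {
        return shardCount > parallelThreshold
                ? DispatchStrategy.PARALLEL_EXECUTOR
                : DispatchStrategy.SERIAL_ASYNC;
    }
}
```

With new ShardDispatchChooser(32), a 10-shard collection would stay on the serial path and a 50-shard collection would switch to the parallel one. A real default would presumably need benchmarking to pick the cutoff.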
> Improve HttpShardHandler performance in many-shard collections
> --------------------------------------------------------------
>
> Key: SOLR-17419
> URL: https://issues.apache.org/jira/browse/SOLR-17419
> Project: Solr
> Issue Type: Bug
> Components: SolrCloud
> Affects Versions: 9.0, 9.6.1
> Reporter: Jason Gerlowski
> Assignee: Jason Gerlowski
> Priority: Major
> Labels: pull-request-available
> Fix For: main (10.0), 9.8
>
> Attachments: shardhandler-perf-graph.png
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> In Solr 8, HttpShardHandler sent shard requests by submitting Callables to
> an ExecutorService. As a result, both the "request-sending" and
> "response-awaiting" happened asynchronously to the original request-thread.
> {code:java}
> @Override
> public void submit(final ShardRequest sreq, final String shard, final ModifiableSolrParams params) {
>   ShardRequestor shardRequestor = new ShardRequestor(sreq, shard, params, this); // Callable
>   try {
>     shardRequestor.init();
>     pending.add(completionService.submit(shardRequestor));
>   } finally {
>     shardRequestor.end();
>   }
> }
> {code}
> However, in Solr 9.x HttpShardHandler ditched the
> ExecutorService/per-request-thread approach in favor of [sending all requests
> serially using
> "SolrClient.requestAsync"|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/component/HttpShardHandler.java#L163].
> SOLR-14354, which made this change, did this in an effort to avoid
> unnecessary thread and CPU context-switching. As Dat described in SOLR-14354:
> {quote}after sending a request that thread basically do nothing just waiting
> for response from other side. That thread will be swapped out and CPU will
> try to handle another thread (this is called context switch, CPU will save
> the context of the current thread and switch to another one). When some data
> (not all) come back, that thread will be called to parsing these data, then
> it will wait until more data come back. So there will be lots of context
> switching in CPU. That is quite inefficient
> {quote}
> This approach comes with a downside though - all the shard requests are sent
> serially. If sending each request takes ~1ms, then a user is unlikely to
> notice this in their collection with 5 or 10 shards. But the cost here
> scales linearly, so in *a collection with 50 shards - this approach would
> bake a ~50ms delay into the critical path of every single query!*
> This issue is intended to reevaluate whether there's a better way to balance
> these concerns. Ideally we can come up with an approach that improves all
> scenarios. Lacking that, maybe Solr could choose between one of several
> approaches semi-intelligently based on the number of shards or other factors?
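To put rough numbers on the scaling concern in the description above, the following is a back-of-the-envelope model, not a measurement. The class and method names are made up for this sketch, the ~1ms per-send cost is the figure assumed in the description, and the parallel estimate assumes ideal, evenly divided work across sender threads with no scheduling overhead.

```java
// Hypothetical cost model; names and assumptions are illustrative only.
final class ShardSendCostModel {
    private ShardSendCostModel() {}

    // Delay added to the critical path when all shard requests are sent one after another.
    static long serialDelayMillis(int shardCount, long perSendMillis) {
        requireNonNegative(shardCount, perSendMillis);
        return (long) shardCount * perSendMillis;
    }

    // Idealized delay when sends are spread evenly over senderThreads threads.
    // Assumes perfect parallelism, which real executors will not achieve.
    static long parallelDelayMillis(int shardCount, long perSendMillis, int senderThreads) {
        requireNonNegative(shardCount, perSendMillis);
        if (senderThreads < 1) {
            throw new IllegalArgumentException("senderThreads must be >= 1");
        }
        long sendsPerThread = (shardCount + senderThreads - 1) / senderThreads; // ceiling division
        return sendsPerThread * perSendMillis;
    }

    private static void requireNonNegative(int shardCount, long perSendMillis) {
        if (shardCount < 0 || perSendMillis < 0) {
            throw new IllegalArgumentException("arguments must be non-negative");
        }
    }
}
```

Under these assumptions, serialDelayMillis(50, 1) gives the 50ms figure from the description, while parallelDelayMillis(50, 1, 10) gives 5ms. The model ignores the context-switching cost that motivated SOLR-14354, so it only illustrates the send-side trade-off.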