[ 
https://issues.apache.org/jira/browse/SOLR-6832?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14244844#comment-14244844
 ] 

Ayon Sinha commented on SOLR-6832:
----------------------------------

Hi [~elyograg], I work with [~sachingoyal]. The background of this patch is 
that we have a cluster of 14 machines serving upwards of 5000 qps, and when 
one machine goes into a multi-second GC pause, it easily brings down the 
entire cluster. I know this is not the sole cause of the distributed 
deadlock, and we have fixed other things (GC pauses, thread counts, etc.) to 
reduce the likelihood of this problem.

In the scenario you mention, the load balancer outside SolrCloud is at 
fault, and when that is the case, we'd like it to take down only one replica 
rather than propagate the problem to the other replicas.

So to be clear, when this option is ON, the only thing you'll "lose" is the 
extra load balancing among the shard queries. And frankly, when I have all the 
shards on the same node, I prefer NOT to go over the network, as the network 
is among the most unreliable and heavily taxed resources in cloud 
environments. When we go over the network to another compute node, I have no 
idea what is carrying the request there or how that other node is doing 
overall.

We will post our results on the benefit of turning this option ON.
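
To make the trade-off concrete, here is a minimal sketch of the replica 
choice when the option is ON. This is not the code from the patch: the class 
name LocalReplicaChooser, the preferLocal flag, and the URL-prefix check are 
all made up for illustration, and the random fallback only stands in for the 
normal load balancing among replicas.

```java
import java.util.List;
import java.util.Random;

// Hypothetical helper for illustration only; not part of Solr's API or the patch.
final class LocalReplicaChooser {
    private LocalReplicaChooser() {}

    /**
     * Picks the replica URL that should serve one shard's sub-query.
     *
     * @param replicaUrls  URLs of all live replicas of the shard (assumed non-empty)
     * @param localBaseUrl base URL of the node handling the original request,
     *                     without a trailing slash (an assumption of this sketch)
     * @param preferLocal  the "option" discussed above
     * @param random       randomness source for the load-balanced fallback
     */
    static String choose(List<String> replicaUrls, String localBaseUrl,
                         boolean preferLocal, Random random) {
        if (replicaUrls.isEmpty()) {
            throw new IllegalArgumentException("shard has no replicas");
        }
        if (preferLocal) {
            // Option ON: stay on this node whenever it hosts a replica of the shard.
            for (String url : replicaUrls) {
                if (url.startsWith(localBaseUrl + "/")) {
                    return url;
                }
            }
        }
        // Option OFF, or no local replica: spread the load across all replicas.
        return replicaUrls.get(random.nextInt(replicaUrls.size()));
    }
}
```

With the flag ON, a node that hosts a replica of every shard never leaves the 
box for its sub-queries; with the flag OFF, or when no local replica exists, 
the choice falls back to picking among all replicas as before.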

> Queries be served locally rather than being forwarded to another replica
> ------------------------------------------------------------------------
>
>                 Key: SOLR-6832
>                 URL: https://issues.apache.org/jira/browse/SOLR-6832
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>    Affects Versions: 4.10.2
>            Reporter: Sachin Goyal
>
> Currently, I see that the code flow for a query in SolrCloud is as follows:
> For a distributed query:
> SolrCore -> SearchHandler.handleRequestBody() -> HttpShardHandler.submit()
> For a non-distributed query:
> SolrCore -> SearchHandler.handleRequestBody() -> QueryComponent.process()
>
> For a distributed query, the request is always sent to all the shards, even 
> if the originating SolrCore (the one handling the original distributed 
> query) is a replica of one of the shards.
> If the originating SolrCore can check itself before sending HTTP requests 
> for any shard, we can probably save some network hops and gain some 
> performance.
>
> We can change SearchHandler.handleRequestBody() or HttpShardHandler.submit() 
> to fix this behavior (most likely the former and not the latter).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)