Re: Querying locally before sending a distributed request

2015-01-20 Thread S G
I have submitted a patch for the ticket at
https://issues.apache.org/jira/browse/SOLR-6832

The patch adds a *preferLocalShards* option that can be set in solrconfig.xml
and as a query request param (the request param takes precedence).

If this option is set,
HttpShardHandler.preferCurrentHostForDistributedReq() tries to find a local
URL and puts that URL as the first one in the list of URLs sent to
LBHttpSolrServer.
This ensures that the current host's cores will be given preference for
distributed queries.

The current host's URL is found by ResponseBuilder.findCurrentHostAddress(),
which searches for the current core's name in the list of shards.
The option defaults to 'false' so that existing behavior is unchanged.
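
To make the reordering concrete, here is a minimal standalone sketch of the idea. It is not the code from the patch: the class and method names are hypothetical, and it matches the local replica by URL prefix rather than by core name, purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from the patch.
class PreferLocalSketch {

    // The request param, when present, overrides the solrconfig.xml value.
    static boolean resolveOption(Boolean requestParam, boolean configValue) {
        return requestParam != null ? requestParam : configValue;
    }

    // Returns a copy of replicaUrls in which the first URL starting with
    // localBaseUrl is moved to the front. All other URLs keep their order.
    // If the option is off or nothing matches, the order is unchanged.
    static List<String> preferLocal(List<String> replicaUrls,
                                    String localBaseUrl,
                                    boolean preferLocalShards) {
        List<String> ordered = new ArrayList<>(replicaUrls);
        if (!preferLocalShards || localBaseUrl == null) {
            return ordered;
        }
        for (int i = 0; i < ordered.size(); i++) {
            if (ordered.get(i).startsWith(localBaseUrl)) {
                ordered.add(0, ordered.remove(i));
                break;
            }
        }
        return ordered;
    }
}
```

Because only the order changes, the remaining replicas stay in the list, so a load-balancing client that tries URLs in order could still fall back to them if the local core does not respond.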

Before putting more effort into writing test cases, I would like to get some
comments on this patch so that I know I am heading in the right direction
here.

Thanks
Sachin


On Wed, Dec 10, 2014 at 4:30 PM, Shawn Heisey apa...@elyograg.org wrote:

 On 12/9/2014 10:55 PM, S G wrote:
  For a distributed query, the request is always sent to all the shards
  even if the originating SolrCore (handling the original distributed
  query) is a replica of one of the shards.
  If the original Solr-Core can check itself before sending http
  requests for any shard, we can probably save some network hopping and
  gain some performance.

 I have to agree with the other replies you've gotten.

 Consider a SolrCloud that is handling 5000 requests per second with a
 replicationFactor of 20 or 30.  This could be one shard or multiple
 shards.  Currently, those requests will be load balanced to the entire
 cluster.  If this option is implemented, suddenly EVERY request will
 have at least one part handled locally ... and unless the index is very
 tiny or 99 percent of the queries hit a Solr cache, one index core
 simply won't be able to handle 5000 queries per second.  Getting a
 single machine capable of handling that load MIGHT be possible, but it
 would likely be *VERY* expensive.

 This would be great as an *OPTION* that can be enabled when the index
 composition and query patterns dictate it will be beneficial ... but it
 definitely should not be default behavior.

 Thanks,
 Shawn


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




Re: Querying locally before sending a distributed request

2014-12-17 Thread S G
I have submitted a patch for this at
https://issues.apache.org/jira/browse/SOLR-6832
Would appreciate if someone can review it.

Thanks
SG





Re: Querying locally before sending a distributed request

2014-12-10 Thread Erick Erickson
Just skimming, but if I'm reading this right, your suggestion is
that queries be served locally rather than being forwarded to
another replica when possible.

So let's take the one-shard case with N replicas to make sure
I understand. In a one-shard case, no query really needs to
be forwarded, since any replica can fully get the results so
in this case no query would be forwarded.

If this is a fair summary, then consider the situation where the
outside world connects to a single server rather than to a
fronting load balancer. Then only one replica would be doing
any work.

Or am I off in the weeds?

That aside, if I've gotten it wrong and you want to put
up a patch (or even just outline a better approach),
feel free to open a JIRA and attach a patch...

Best,
Erick

On Tue, Dec 9, 2014 at 11:55 PM, S G sg.online.em...@gmail.com wrote:
 Hello Solr Devs,

 I am a developer using Solr and wanted to have some opinion on a performance
 change request.

 Currently, I see that code flow for a query in SolrCloud is as follows:

 For distributed query:
 SolrCore -> SearchHandler.handleRequestBody() -> HttpShardHandler.submit()

 For non-distributed query:
 SolrCore -> SearchHandler.handleRequestBody() -> QueryComponent.process()


 For a distributed query, the request is always sent to all the shards even
 if the originating SolrCore (handling the original distributed query) is a
 replica of one of the shards.
 If the original Solr-Core can check itself before sending http requests for
 any shard, we can probably save some network hopping and gain some
 performance.

 If this idea seems feasible, I can submit a JIRA ticket and work on it.
 I am planning to change SearchHandler.handleRequestBody() or
 HttpShardHandler.submit()

 Thanks
 SG





Re: Querying locally before sending a distributed request

2014-12-10 Thread Mike Drob
This is a cool idea, and there are two extremes to consider.

One-shard, N replicas, single connection point for consumers. This case
needs forwarding.

Many shards, 2 replicas each, random connection points for consumers. I
think this is the case that SG had in mind.

In order to meet both use cases, would it make sense to have a "prefer
local reads" configuration option, so that a core checks itself only if
instructed to?

Mike





Re: Querying locally before sending a distributed request

2014-12-10 Thread Steve Davids
bq. In a one-shard case, no query really needs to be forwarded, since any
replica can fully get the results so in this case no query would be
forwarded.

In that particular case you can pass the request param distrib=false so
that the request is not distributed; it will then gather results only from
that particular host.
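
As a small illustration, a non-distributed request is just an ordinary select URL with distrib=false appended. The host, port, and core name below are placeholders, and the helper class is hypothetical:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; host and core name in main() are placeholders.
class DistribFalseUrlSketch {

    // Builds a select URL that asks the addressed core to answer from its
    // own index only, without fanning the request out to other shards.
    static String localOnlyQueryUrl(String coreBaseUrl, String query) {
        try {
            return coreBaseUrl + "/select?q="
                    + URLEncoder.encode(query, "UTF-8")
                    + "&distrib=false";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a compliant JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            localOnlyQueryUrl("http://localhost:8983/solr/collection1", "*:*"));
    }
}
```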

As for the SolrCloud example with n-shards > 1, your overall search request
time is limited by the slowest shard's response time. So you would
potentially be saving one hop, but you are still making n-1 other hops to
gather all of the other shards' results, which makes it a moot point since
you will be waiting on the other shards to respond before you can return
the aggregated result list. You will then be on the hook to set up the load
balancing across replicas of that one particular host you have chosen to
query, as Erick said, which could have some gotchas for people not expecting
that behavior.
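
A rough way to see this is to model the request time as the maximum over all shards of processing time plus hop cost. The sketch below is a simplified model with made-up numbers; it ignores merge time and assumes every remote hop costs the same.

```java
// Simplified latency model; names and numbers are illustrative only.
class ShardLatencySketch {

    // Total time is bounded by the slowest shard. A shard served locally
    // skips the network hop. Pass -1 as localShardIndex for "none local".
    static long totalMillis(long[] shardMillis, long hopMillis, int localShardIndex) {
        long worst = 0;
        for (int i = 0; i < shardMillis.length; i++) {
            long cost = shardMillis[i] + (i == localShardIndex ? 0 : hopMillis);
            worst = Math.max(worst, cost);
        }
        return worst;
    }

    public static void main(String[] args) {
        long[] shards = {40, 55, 70};   // per-shard processing time in ms
        System.out.println(totalMillis(shards, 5, -1)); // all remote
        System.out.println(totalMillis(shards, 5, 0));  // fastest shard local
        System.out.println(totalMillis(shards, 5, 2));  // slowest shard local
    }
}
```

In this model the local hop only shortens the overall request when the local shard happens to be the slowest one; otherwise the total is still set by a remote shard.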

-Steve





Re: Querying locally before sending a distributed request

2014-12-10 Thread S G
I have opened https://issues.apache.org/jira/browse/SOLR-6832 to track this.

The performance gain increases if coresPerMachine is > 1 and a single JVM
has cores from 'k' shards.
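
As a back-of-the-envelope sketch of that claim: if the JVM handling the request hosts cores for k of the n shards, only n - k sub-requests need a network hop. The class and shard names below are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; shard names are placeholders.
class LocalCoreHopsSketch {

    // Counts the shards that have no core in the current JVM and therefore
    // still need a remote sub-request.
    static int remoteHops(List<String> shards, Set<String> shardsWithLocalCore) {
        int remote = 0;
        for (String shard : shards) {
            if (!shardsWithLocalCore.contains(shard)) {
                remote++;
            }
        }
        return remote;
    }

    public static void main(String[] args) {
        List<String> shards = Arrays.asList("shard1", "shard2", "shard3", "shard4");
        Set<String> local = new HashSet<>(Arrays.asList("shard1", "shard3"));
        System.out.println(remoteHops(shards, local)); // n = 4, k = 2
    }
}
```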

We can also look into giving more preference to machines with the same IP
address as the current machine (when multiple Tomcat instances are running
on the same machine).







Re: Querying locally before sending a distributed request

2014-12-10 Thread Shawn Heisey
On 12/9/2014 10:55 PM, S G wrote:
 For a distributed query, the request is always sent to all the shards
 even if the originating SolrCore (handling the original distributed
 query) is a replica of one of the shards.
 If the original Solr-Core can check itself before sending http
 requests for any shard, we can probably save some network hopping and
 gain some performance.

I have to agree with the other replies you've gotten.

Consider a SolrCloud that is handling 5000 requests per second with a
replicationFactor of 20 or 30.  This could be one shard or multiple
shards.  Currently, those requests will be load balanced to the entire
cluster.  If this option is implemented, suddenly EVERY request will
have at least one part handled locally ... and unless the index is very
tiny or 99 percent of the queries hit a Solr cache, one index core
simply won't be able to handle 5000 queries per second.  Getting a
single machine capable of handling that load MIGHT be possible, but it
would likely be *VERY* expensive.
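
The arithmetic behind this concern can be sketched as follows. It assumes the worst case where all client traffic arrives at a single node and sub-requests are otherwise spread evenly across replicas; the class name is hypothetical.

```java
// Illustrative arithmetic only, under the single-entry-node assumption.
class LocalPreferenceLoadSketch {

    // Queries per second each replica of a shard sees when sub-requests
    // are load balanced evenly across all replicas.
    static double balancedQpsPerReplica(double totalQps, int replicationFactor) {
        return totalQps / replicationFactor;
    }

    // With local preference and one entry node, the core on that node
    // serves its shard's part of every incoming request.
    static double entryNodeQps(double totalQps) {
        return totalQps;
    }

    public static void main(String[] args) {
        System.out.println(balancedQpsPerReplica(5000, 20)); // 250.0
        System.out.println(entryNodeQps(5000));              // 5000.0
    }
}
```

With a replicationFactor of 20, the same core goes from roughly 250 queries per second to all 5000, a twentyfold increase.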

This would be great as an *OPTION* that can be enabled when the index
composition and query patterns dictate it will be beneficial ... but it
definitely should not be default behavior.

Thanks,
Shawn





Querying locally before sending a distributed request

2014-12-09 Thread S G
Hello Solr Devs,

I am a developer using Solr and wanted to have some opinion on a
performance change request.

Currently, I see that code flow for a query in SolrCloud is as follows:

For distributed query:
SolrCore -> SearchHandler.handleRequestBody() -> HttpShardHandler.submit()

For non-distributed query:
SolrCore -> SearchHandler.handleRequestBody() -> QueryComponent.process()


For a distributed query, the request is always sent to all the shards even
if the originating SolrCore (handling the original distributed query) is a
replica of one of the shards.
If the original Solr-Core can check itself before sending http requests for
any shard, we can probably save some network hopping and gain some
performance.

If this idea seems feasible, I can submit a JIRA ticket and work on it.
I am planning to change SearchHandler.handleRequestBody() or
HttpShardHandler.submit()

Thanks
SG