Re: Solr Cloud Default Document Routing

2014-09-25 Thread Erick Erickson
Well, you've picked the absolute worst case for comparison. The
increase to double digits is a constant overhead. In other words, say
your query went from 5 ms to 20 ms. That 15 ms is pretty much the
additional overhead no matter what the query is. This particular
query just happens to be very fast in the first place.
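
To put numbers on that, here is a small Python illustration. The 15 ms
overhead is the hypothetical figure from the paragraph above, and the
50 ms and 500 ms base latencies are made up for the example.

```python
# Hypothetical fixed cost of the distributed round trips, in milliseconds.
DISTRIBUTED_OVERHEAD_MS = 15


def distributed_latency(single_node_ms):
    """Latency once the fixed distributed overhead is added."""
    return single_node_ms + DISTRIBUTED_OVERHEAD_MS


def relative_slowdown(single_node_ms):
    """How many times slower the distributed query is."""
    return distributed_latency(single_node_ms) / single_node_ms


for base_ms in (5, 50, 500):
    print(base_ms, distributed_latency(base_ms),
          round(relative_slowdown(base_ms), 2))
```

Under these assumed numbers a 5 ms lookup becomes four times slower,
while a 500 ms query slows down by about 3%.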

As for queries going out to all the shards: well, they have to.
The query processing cannot know ahead of time (except in this
_very_ special case) which shards will generate hits. So the request
is sent out to one replica in each shard, which responds with its
top N. The originating node then merges those responses to get
the IDs of the final top N, then sends a request out to each shard
hosting one of those top N for the data associated with the
documents.
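
The two phases described above can be sketched roughly as follows. This
is a toy simulation, not Solr code: the in-memory SHARDS dict, the
scores, and the function names are all assumptions made for
illustration, and matching and scoring are faked by storing a score
next to each document.

```python
import heapq

# Hypothetical in-memory "shards": doc id -> (score, stored fields).
SHARDS = {
    "shard1": {"100161200": (1.0, {"id": "100161200"})},
    "shard2": {"100161384": (0.5, {"id": "100161384"})},
}


def get_top_ids(shard_name, n):
    """Phase 1: a shard returns only (score, id, shard) for its top n hits."""
    docs = SHARDS[shard_name]
    hits = [(score, doc_id, shard_name) for doc_id, (score, _) in docs.items()]
    return heapq.nlargest(n, hits)


def get_fields(shard_name, doc_ids):
    """Phase 2: a shard returns the stored fields for the requested ids."""
    return [SHARDS[shard_name][doc_id][1] for doc_id in doc_ids]


def distributed_query(n):
    # Scatter: ask every shard, since we cannot know which ones have hits.
    candidates = []
    for shard_name in SHARDS:
        candidates.extend(get_top_ids(shard_name, n))
    # Merge the per-shard lists into the global top n.
    top = heapq.nlargest(n, candidates)
    # Fetch fields only from the shards that own a winning document.
    by_shard = {}
    for _, doc_id, shard_name in top:
        by_shard.setdefault(shard_name, []).append(doc_id)
    fields = {}
    for shard_name, doc_ids in by_shard.items():
        for doc in get_fields(shard_name, doc_ids):
            fields[doc["id"]] = doc
    return [fields[doc_id] for _, doc_id, _ in top]


print(distributed_query(1))
```

Note that the first phase always touches every shard, while the second
phase only touches shards that contributed a document to the final
top N.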

If you really need super-efficiency here, you could probably
look at CloudSolrServer (the SolrJ client) to get an idea of how to
translate from ID to shard and just do direct requests with
distrib=false.
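
As a rough sketch of the direct-request half of that idea, the Python
below builds a select URL aimed at a single core with distrib=false.
The CORE_FOR_ID lookup is hypothetical and hard-coded from the debug
output in this thread. Working out that mapping from the cluster state
and the collection's router is the hard part, and it is not shown here.

```python
from urllib.parse import urlencode

# Hypothetical id -> core URL lookup, hard-coded for illustration only.
CORE_FOR_ID = {
    "100161200": "http://server1/solr/multishard_shard1_replica1",
}


def direct_query_url(doc_id):
    """Build a select URL that queries only the core holding doc_id."""
    params = urlencode({
        "q": "id:" + doc_id,
        "distrib": "false",  # do not fan the request out to other shards
        "wt": "json",
    })
    return CORE_FOR_ID[doc_id] + "/select?" + params


print(direct_query_url("100161200"))
```

If the lookup is wrong, a distrib=false request simply returns no hit
from that core, so any real implementation would need to keep the
mapping in sync with the cluster.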

Best,
Erick





Solr Cloud Default Document Routing

2014-09-24 Thread Susmit Shukla
Hi,

I'm building out a multi-shard Solr collection as the index size is likely
to grow fast.
I was testing out the setup with 2 shards on 2 nodes with test data.
I indexed a few documents with id as the unique key.
collection create command -
/solr/admin/collections?action=CREATE&name=multishard&numShards=2

I used this command to upload:
curl 'http://server/solr/multishard/update/json?commitWithin=2000' --data-binary @data.json -H 'Content-type:application/json'

data.json -
[
  {
    "id": 100161200
  },
  {
    "id": 100161384
  }
]

When I query one of the nodes with an id constraint, I see the query
executed on both shards, which looks inefficient: QTime increased to double
digits. I assumed Solr would know, based on the id, which shard the data
went to.

I have a few questions around this, as I could not find pertinent
information in the user list archives or the documentation.
- The query is hitting all shards and replicas. If I have 3 shards and 5
replicas, how would performance be impacted, given that QTime increased to
double digits even for this very simple case?
- Could id lookup queries just go to one shard automatically?

/solr/multishard/select?q=id%3A100161200&wt=json&indent=true&debugQuery=true

"QTime":13,

"debug":{
  "track":{
    "rid":"-multishard_shard1_replica1-1411605234897-171",
    "EXECUTE_QUERY":[
      "http://server1/solr/multishard_shard1_replica1/",[
        "QTime","1",
        "ElapsedTime","4",
        "RequestPurpose","GET_TOP_IDS",
        "NumFound","1",
        "Response","some resp"],
      "http://server2/solr/multishard_shard2_replica1/",[
        "QTime","1",
        "ElapsedTime","6",
        "RequestPurpose","GET_TOP_IDS",
        "NumFound","0",
        "Response","some"]],
    "GET_FIELDS":[
      "http://server1/solr/multishard_shard1_replica1/",[
        "QTime","0",
        "ElapsedTime","4",
        "RequestPurpose","GET_FIELDS,GET_DEBUG",
        "NumFound","1",


Thanks,
Susmit