Efficient Sharding with date sorted queries

2009-06-12 Thread Garafola Timothy
I have a Solr index which is going to grow 3x in the near future.  I'm
considering using distributed search and was contemplating what would
be the best approach to splitting the index.  Since most of the
searches performed on the index are sorted by date descending, I'm
considering splitting the index based on the created date of the
documents.

From Yonik Seeley's blog post,
http://yonik.wordpress.com/2008/02/27/distributed-search-for-solr/,
I've read that there are two phases to sharding.  The first phase
collects matching ids and documents across the shards.  Then the
second phase collects the stored fields for the documents.  I'm
assuming that this second phase's execution is limited by the number
of rows requested and the number of results.

So let's say I have 2 shards.  The first shard has docs with creation
dates of this year.  The second shard contains documents from the
previous year.  I run a Solr query requesting 10 rows sorted by date
and get 11 from the first shard and 3 from the second.  Will the
initial query only execute the first phase on the second shard?  If
so, that should result in better performance, right?


Thanks,
-Tim


Re: Efficient Sharding with date sorted queries

2009-06-12 Thread Shalin Shekhar Mangar
On Fri, Jun 12, 2009 at 10:28 PM, Garafola Timothy wrote:

>
> So let's say I have 2 shards.  The first shard has docs with creation
> dates of this year.  The Second shard contains documents from the
> previous year.  I run a solr query requesting 10 rows sorted by date
> and get 11 from the first shard and 3 from the second.


No, you cannot request a specific number of results from a shard. That is
something Solr manages itself. It requests start+rows documents from
each shard and merges them to find the rows documents to be returned.
If you really want a specific number of results from a shard, query
that shard alone.
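
To make the start+rows behaviour concrete, here is a minimal sketch in
Python. It is a simplified model of the two phases described in this
thread, not Solr's actual code: the shard names, the document layout and
the functions shard_top and distributed_query are all made up for
illustration, and the assumption that the second phase only contacts
shards owning a document on the final page is part of the model.

```python
from heapq import nlargest

def shard_top(docs, start, rows):
    """Phase 1 on one shard: top start+rows (date, id) pairs, newest first."""
    return nlargest(start + rows, ((d["date"], d["id"]) for d in docs))

def distributed_query(shards, start, rows):
    """Toy coordinator: merge per-shard candidates, then plan the field fetch."""
    # Phase 1: every shard is asked for start+rows candidates,
    # because any one shard might hold the whole requested page.
    candidates = []
    for name, docs in shards.items():
        for date, doc_id in shard_top(docs, start, rows):
            candidates.append((date, doc_id, name))

    # Merge: order all candidates by date descending, keep the requested page.
    candidates.sort(reverse=True)
    page = candidates[start:start + rows]

    # Phase 2 (modelled): group the winning ids by the shard that owns them.
    # Only these shards would be asked for stored fields.
    by_shard = {}
    for _date, doc_id, name in page:
        by_shard.setdefault(name, []).append(doc_id)
    return page, by_shard

# Hypothetical data matching the example in the question:
# 11 matching docs from this year, 3 from the previous year.
shards = {
    "this_year": [{"id": f"a{i}", "date": f"2009-01-{i + 1:02d}"} for i in range(11)],
    "last_year": [{"id": f"b{i}", "date": f"2008-06-{i + 1:02d}"} for i in range(3)],
}

page, by_shard = distributed_query(shards, start=0, rows=10)
print(sorted(by_shard))  # shards that own a document on the final page
```

In this model both shards always do the first-phase work, even when the
older shard contributes nothing to the page. With start=0 and rows=10 the
whole page comes from this_year, so only that shard appears in by_shard.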

-- 
Regards,
Shalin Shekhar Mangar.