Solr 4.0 - disappointing results sharding on 1 machine

2012-09-20 Thread Tom Mortimer
Hi all,

After reading 
http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ , 
I thought I'd do my own experiments. I used 2M docs from Wikipedia, indexed in 
Solr 4.0 Beta on a standard EC2 large instance. I compared an unsharded and a 
2-shard configuration (the latter set up with SolrCloud following the 
http://wiki.apache.org/solr/SolrCloud example). I wrote a simple Python script 
to randomly throw queries from a hand-compiled list at Solr. The only extra I 
had turned on was facets (on document category).
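
The script itself was not posted to the list. As a rough illustration only, a load tester of the kind described might look like the sketch below. The endpoint URL, the query list, the `category` facet field name and the worker count are all assumptions made for the example, not details from the thread; the percentile uses the nearest-rank method, which may differ from whatever the original script did.

```python
import math
import random
import time
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint and query list; neither is given in the thread.
SOLR_URL = "http://localhost:8983/solr/select"
QUERIES = ["history of rome", "quantum mechanics", "world cup"]


def solr_fetch(query):
    """Run one faceted query and return the raw response bytes."""
    # "category" is an assumed field name for the document-category facet.
    params = urlencode({"q": query, "facet": "true",
                        "facet.field": "category", "wt": "json"})
    with urlopen(SOLR_URL + "?" + params) as resp:
        return resp.read()


def timed(fetch, query):
    """Return the wall-clock seconds one call to fetch(query) takes."""
    start = time.perf_counter()
    fetch(query)
    return time.perf_counter() - start


def percentile(sorted_times, pct):
    """Nearest-rank percentile of a list sorted in ascending order."""
    rank = max(1, math.ceil(pct * len(sorted_times) / 100))
    return sorted_times[rank - 1]


def run_load(fetch, queries, total, workers):
    """Send `total` random queries with up to `workers` in flight at once."""
    picks = [random.choice(queries) for _ in range(total)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        qtimes = sorted(pool.map(lambda q: timed(fetch, q), picks))
    elapsed = time.perf_counter() - start
    return {
        "searches": len(qtimes),
        "qps": len(qtimes) / elapsed,
        "mean": sum(qtimes) / len(qtimes),
        "max": qtimes[-1],
        "min": qtimes[0],
        "p90": percentile(qtimes, 90),
        "p99": percentile(qtimes, 99),
        "p99.9": percentile(qtimes, 99.9),
    }
```

Calling `run_load(solr_fetch, QUERIES, total=1000, workers=8)` against a running Solr instance would return a dictionary with the same kinds of figures as the summaries below. Passing `fetch` in as a parameter keeps the timing logic testable without a server.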

To my surprise, the performance of the 2-shard configuration is almost exactly 
half that of the unsharded index - 

unsharded
4983912891 results in 24920 searches; 0 errors
70.02 mean qps
0.35s mean query time, 2.25s max, 0.00s min
90%   of qtimes = 0.83s
99%   of qtimes = 1.42s
99.9% of qtimes = 1.68s

2-shard
4990351660 results in 24501 searches; 0 errors
34.07 mean qps
0.66s mean query time, 694.20s max, 0.01s min
90%   of qtimes = 1.19s
99%   of qtimes = 2.12s
99.9% of qtimes = 2.95s

All caches were set to 4096 items, and cache performance looks OK in both 
cases (hit ratios close to 1.0, 0 evictions). I gave the single JVM -Xmx1G and 
each shard JVM -Xmx500M.

I must be doing something stupid - surely this result is unexpected? Does 
anybody have any thoughts on where it might be going wrong?

cheers,
Tom



Re: Solr 4.0 - disappointing results sharding on 1 machine

2012-09-20 Thread Tom Mortimer
Before anyone asks, these results were obtained warm.




Re: Solr 4.0 - disappointing results sharding on 1 machine

2012-09-20 Thread Yonik Seeley
It depends on where the bottlenecks are, I guess.

On a single system, increasing the number of shards decreases throughput
(this isn't specific to Solr). The increased parallelism *can* decrease
latency, to the degree that the gains from the parts that were parallelized
outweigh the overhead.

Going from one shard to two shards is also the most extreme case, since
the unsharded case has no distributed overhead whatsoever.

What's the average CPU load during your tests?
How are you testing (i.e. how many requests are in progress at the same time?)
In your unsharded case, what's taking up the bulk of the time?

-Yonik
http://lucidworks.com
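
The question about how many requests are in progress at once can be estimated from the posted figures using Little's law (average requests in flight = throughput x average time per request). The sketch below is only back-of-the-envelope arithmetic on the numbers in the original message; it assumes the reported mean query time is the end-to-end time seen by the client and that the load generator ran at a steady rate, neither of which the thread confirms.

```python
def in_flight(mean_qps, mean_query_time_s):
    """Little's law: average concurrency = throughput * average latency."""
    return mean_qps * mean_query_time_s


# Figures taken from the benchmark summaries in the original message.
unsharded = in_flight(70.02, 0.35)
two_shard = in_flight(34.07, 0.66)

print(f"unsharded: about {unsharded:.1f} requests in flight")
print(f"2-shard:   about {two_shard:.1f} requests in flight")
```

Under those assumptions both runs come out at roughly the same concurrency (somewhere in the low twenties). If the client does hold a fixed number of requests in flight, then a doubling of per-query latency would by itself show up as a halving of qps, which would be consistent with the reported numbers.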

