Solr 4.0 - disappointing results sharding on 1 machine
Hi all,

After reading http://carsabi.com/car-news/2012/03/23/optimizing-solr-7x-your-search-speed/ , I thought I'd do my own experiments. I used 2M docs from wikipedia, indexed in Solr 4.0 Beta on a standard EC2 large instance. I compared an unsharded and 2-shard configuration (the latter set up with SolrCloud following the http://wiki.apache.org/solr/SolrCloud example). I wrote a simple python script to randomly throw queries from a hand-compiled list at Solr. The only extra I had turned on was facets (on document category).

To my surprise, the performance of the 2-shard configuration is almost exactly half that of the unsharded index -

unsharded
4983912891 results in 24920 searches; 0 errors
70.02 mean qps
0.35s mean query time, 2.25s max, 0.00s min
90% of qtimes = 0.83s
99% of qtimes = 1.42s
99.9% of qtimes = 1.68s

2-shard
4990351660 results in 24501 searches; 0 errors
34.07 mean qps
0.66s mean query time, 694.20s max, 0.01s min
90% of qtimes = 1.19s
99% of qtimes = 2.12s
99.9% of qtimes = 2.95s

All caches were set to 4096 items, and performance looks ok in both cases (hit ratios close to 1.0, 0 evictions). I gave the single VM -Xmx1G and each shard VM -Xmx500M.

I must be doing something stupid - surely this result is unexpected? Does anybody have any thoughts where it might be going wrong?

cheers,
Tom
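For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of a load script that reports the same figures as above (mean qps, mean/max/min query time, 90/99/99.9 percentiles). It is not Tom's actual script. The names `run_benchmark`, `summarize`, `percentile` and `solr_search` are made up for illustration, and the Solr URL and the `category` facet field are assumptions that will need adjusting to the real setup.

```python
import json
import math
import random
import statistics
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def percentile(sorted_values, fraction):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(fraction * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(latencies, wall_seconds):
    """Turn per-query latencies (seconds) into the summary figures."""
    ordered = sorted(latencies)
    return {
        "searches": len(ordered),
        "mean_qps": len(ordered) / wall_seconds,
        "mean": statistics.mean(ordered),
        "max": ordered[-1],
        "min": ordered[0],
        "p90": percentile(ordered, 0.90),
        "p99": percentile(ordered, 0.99),
        "p99.9": percentile(ordered, 0.999),
    }


def solr_search(query, base_url="http://localhost:8983/solr/select"):
    """One faceted query. base_url and the facet field are assumptions."""
    params = urllib.parse.urlencode({
        "q": query,
        "wt": "json",
        "facet": "true",
        "facet.field": "category",  # hypothetical field name
    })
    with urllib.request.urlopen(base_url + "?" + params) as response:
        return json.load(response)


def run_benchmark(search_fn, queries, total, workers, seed=0):
    """Send `total` randomly chosen queries using `workers` client threads."""
    rng = random.Random(seed)
    picks = [rng.choice(queries) for _ in range(total)]

    def timed(query):
        start = time.perf_counter()
        search_fn(query)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        latencies = list(pool.map(timed, picks))
    return summarize(latencies, time.perf_counter() - wall_start)
```

A run would look like `run_benchmark(solr_search, ["solr", "lucene"], total=1000, workers=8)`. The `workers` value matters as much as the index layout: it fixes how many requests are in flight at once, so it should be reported alongside the results.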
Re: Solr 4.0 - disappointing results sharding on 1 machine
Before anyone asks, these results were obtained warm.

On 20 Sep 2012, at 14:39, Tom Mortimer tom.m.f...@gmail.com wrote:
> To my surprise, the performance of the 2-shard configuration is almost exactly half that of the unsharded index
Re: Solr 4.0 - disappointing results sharding on 1 machine
Depends on where the bottlenecks are I guess.

On a single system, increasing shards decreases throughput (this isn't specific to Solr). The increased parallelism *can* decrease latency to the degree that the parts that were parallelized outweigh the overhead.

Going from one shard to two shards is also the most extreme case since the unsharded case has no distributed overhead whatsoever.

What's the average CPU load during your tests? How are you testing (i.e. how many requests are in progress at the same time?) In your unsharded case, what's taking up the bulk of the time?

-Yonik
http://lucidworks.com

On Thu, Sep 20, 2012 at 9:39 AM, Tom Mortimer tom.m.f...@gmail.com wrote:
> To my surprise, the performance of the 2-shard configuration is almost exactly half that of the unsharded index
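The question of how many requests are in progress at once can be roughly estimated from the figures already posted, using Little's law (average requests in flight = throughput x average time per request). This is only a sketch: it assumes both runs were in steady state and that the mean query time was measured by the client over the same window as the qps figure, neither of which the thread confirms. The function name `implied_concurrency` is made up for illustration.

```python
def implied_concurrency(mean_qps, mean_latency_s):
    """Little's law: average requests in flight = throughput * mean latency."""
    return mean_qps * mean_latency_s


# Figures taken from the results in the original post.
unsharded = implied_concurrency(70.02, 0.35)
two_shard = implied_concurrency(34.07, 0.66)

print(f"unsharded: about {unsharded:.1f} requests in flight")
print(f"2-shard:   about {two_shard:.1f} requests in flight")
```

Both runs come out at a similar level of concurrency (roughly 22 to 25 requests in flight). If the script used a fixed number of client threads, the halved qps and the doubled mean query time would be the same observation seen from two sides rather than two separate problems.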