Re: Performance testing on SOLR cloud
Hi Aswath,

It is not common to test only QPS unless the index is mostly static. Usually you have to test and tune the worst-case scenario: the maximum expected indexing rate plus queries. You can get more QPS by reducing query latency or by increasing the number of replicas. You manage latency by tuning Solr/JVM/queries and/or by sharding the index. First tune the index without replication, and once you are sure it is the best a single index can provide, introduce replicas to achieve the required throughput.

The hard part is tuning Solr. You can do it without specialized tools, but tools help a lot. One such tool is Sematext's SPM - https://sematext.com/spm/index.html - where you can see all the Solr/JVM/OS metrics needed to tune Solr. It also provides a QPS graph.

With an index your size, unless the documents are really big, you can start without sharding. After tuning, if you are not satisfied with query latency, you can try splitting into two shards.

Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

On 17.11.2015 23:45, Aswath Srinivasan (TMS) wrote:
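To make "tune latency first, then add replicas" concrete, one way to summarize a load-test run is to record per-query latencies and report throughput plus percentiles. The helper below is a hypothetical sketch (not from the thread, and not SPM); the numbers fed to it are made up for illustration:

```python
def latency_report(latencies_ms, duration_s):
    """Summarize one load-test run: observed QPS plus latency percentiles."""
    ordered = sorted(latencies_ms)

    def pct(p):
        # nearest-rank percentile over the sorted sample
        idx = max(0, int(round(p / 100.0 * len(ordered))) - 1)
        return ordered[idx]

    return {
        "qps": len(ordered) / duration_s,
        "p50_ms": pct(50),
        "p95_ms": pct(95),
        "p99_ms": pct(99),
    }

# e.g. 100 queries completed in 10 s: 90 fast ones and 10 slow ones
report = latency_report([10] * 90 + [100] * 10, duration_s=10.0)
```

Tuning Solr/JVM/queries should push the percentile numbers down; adding replicas should push the sustainable QPS number up without changing per-query latency much.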
Re: Performance testing on SOLR cloud
To add to Erick's point: it is also highly dependent on the types of queries you expect (sorting, faceting, fq, q, size of documents) and how many concurrent updates you expect. If most queries are going to be similar and you are not updating very often, you can expect most of your index to be loaded into the page cache and many of your queries to be served from the document or query cache (especially if you can keep your fq clauses similar, rather than putting everything in q, which introduces scoring overhead).

Adding more replicas will help distribute the load. Adding shards will let you parallelize things, but it adds some memory and latency overhead because results still need to be merged. If your shards are across multiple machines, you also introduce network latency. I've seen good success using many shards in the same JVM, but that was with collections of billions of documents.

On Tue, Nov 17, 2015 at 9:07 PM, Erick Erickson wrote:
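The q-versus-fq distinction above can be shown with a request builder. This is an illustrative sketch (the localhost URL, collection name, and field names are assumptions, not from the thread): only the user's text goes in q, where it is scored, while reusable constraints go in fq, where an identical filter string across requests can be answered from Solr's filterCache without re-scoring.

```python
from urllib.parse import urlencode

def build_select_url(base, user_text, filters):
    """Score only the user's text via q; put reusable constraints in fq.
    Repeating an identical fq string across requests lets Solr serve the
    filter from its filterCache instead of re-evaluating and re-scoring it."""
    params = [("q", user_text), ("wt", "json")]
    params += [("fq", f) for f in filters]  # each filter is a separate fq param
    return base + "/select?" + urlencode(params)

url = build_select_url("http://localhost:8983/solr/mycollection",
                       "camera", ["inStock:true", "category:electronics"])
```

If the same `inStock:true` constraint were folded into q instead, it would be scored on every request and could not be cached independently of the user's text.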
Re: Performance testing on SOLR cloud
I wouldn't bother to shard either. YMMV, of course, but 2.2M documents is actually a pretty small number unless the docs themselves are huge. Sharding introduces inevitable overhead, so it's usually the last thing you resort to.

As far as the number of replicas is concerned, that's strictly a function of what QPS you need. Say you do not shard and measure a query rate of 20 queries per second. If you need to support 100 QPS, just add 4 more replicas; this can be done at any time.

Best,
Erick

On Tue, Nov 17, 2015 at 3:38 PM, Markus Jelsma wrote:
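Erick's replica arithmetic can be written out as a one-liner; the function name is mine, and the linear-scaling assumption is the simplification his example uses (20 QPS measured on one replica, 100 QPS target):

```python
import math

def replicas_for_target(target_qps, qps_per_replica):
    """Total replicas needed, assuming throughput scales roughly
    linearly with the number of replicas."""
    return math.ceil(target_qps / qps_per_replica)

total = replicas_for_target(target_qps=100, qps_per_replica=20)
additional = total - 1  # replicas to add beyond the one you measured
```

In practice scaling is not perfectly linear (caches, routing, and load balancing all interfere), so the result is a starting point to verify under load rather than a guarantee.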
RE: Performance testing on SOLR cloud
Hi - we use the Siege load-testing program. It can take a seed list of URLs, taken from actual user input, and can apply load in parallel. It won't reuse common queries unless you prepare your seed list appropriately. If your setup achieves the goal your client anticipates, then you are fine. Siege is not a good tool for testing extreme QPS, due to obvious single-machine and network limitations.

Assuming your JVM heap settings and Solr cache settings are optimal, and your only question is how many shards, then increase the number of shards. Oversharding can be beneficial because more threads process less data. Every single-core search is single-threaded, so oversharding on the same hardware makes sense, and it seems to pay off.

Make sure you run multiple long stress tests and restart the JVMs in between, because a) query times and load tend to regress to the mean, and b) HotSpot needs to 'warm up', so short tests make less sense.

M.

-Original message-
> From: Aswath Srinivasan (TMS)
> Sent: Tuesday 17th November 2015 23:46
> To: solr-user@lucene.apache.org
> Subject: Performance testing on SOLR cloud
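Siege reads its seed list as a plain-text file with one full URL per line (passed via `-f`/`--file`). A small generator like the sketch below can turn logged user queries into such a file; the Solr URL, collection name, and queries here are placeholders, not details from the thread:

```python
from urllib.parse import quote_plus

def write_siege_seed(path, base_url, user_queries):
    """Write one fully qualified URL per line -- the plain-text format
    Siege reads with its -f/--file option."""
    with open(path, "w") as fh:
        for q in user_queries:
            # URL-encode the raw user text before embedding it in the query string
            fh.write(f"{base_url}/select?q={quote_plus(q)}\n")

write_siege_seed("urls.txt", "http://localhost:8983/solr/mycollection",
                 ["toyota camry", "prius 2015"])
```

Using real logged queries (with their natural repetitions) keeps cache-hit rates in the test close to production; a deduplicated list would understate them.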
Performance testing on SOLR cloud
Hi fellow developers,

Please share your experience of how you did performance testing on SOLR. What I'm trying to do is run SOLR cloud on 3 Linux servers with 16 GB RAM each and index a total of 2.2 million documents. I have yet to decide how many shards and replicas to have (any hint on this is welcome too - this is basically 'only' performance testing, so suggest the number of shards and replicas if you can). Ultimately, I'm trying to find the QPS that this SOLR cloud setup can handle.

To summarize:

1. Find the QPS that my SOLR cloud setup can support
2. Using version 5.3.1 with an external ZooKeeper
3. 3 Linux servers with 16 GB RAM and an index of 2.2 million documents in total
4. Yet to decide the number of shards and replicas
5. Not using any custom search application (performance testing for SOLR, not for a search portal)

Thank you