Re: Performance testing on SOLR cloud

2015-11-18 Thread Emir Arnautovic

Hi Aswath,
It is not common to test only QPS unless the index is static most of the
time. Usually you have to test and tune for the worst-case scenario: the
maximum expected indexing rate plus queries. You can get more QPS by
reducing query latency or by increasing the number of replicas. You manage
latency by tuning Solr, the JVM, and the queries, and/or by sharding the
index. First tune the index without replication, and once you are sure it
is the best a single index can provide, introduce replicas to achieve the
required throughput.


The hard part is tuning Solr. You can do it without specialized tools, but
tools help a lot. One such tool is Sematext's SPM -
https://sematext.com/spm/index.html - where you can see all the
Solr/JVM/OS metrics needed to tune Solr. It also provides a QPS graph.


With an index of your size, unless the documents are really big, you can
start without sharding. After tuning, if you are not satisfied with query
latency, you can try splitting into two shards.


Thanks,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/







Re: Performance testing on SOLR cloud

2015-11-17 Thread Keith L
To add to Erick's point:

It's also highly dependent on the types of queries you expect (sorting,
faceting, fq, q, size of documents) and how many concurrent updates you
expect. If most queries are going to be similar and you are not going to be
updating very often, you can expect most of your index to be loaded into
the page cache and many of your queries to be served from the document or
query cache (especially if you can put restrictions in fq so that they
repeat across queries, rather than in q, which introduces scoring
overhead). Adding more replicas will help distribute the load. Adding
shards will allow you to parallelize things but adds some memory and
latency overhead because results still need to be merged. If your shards
are spread across multiple machines, you also introduce network latency.
I've seen good success with using many shards in the same JVM, but that
was with collections with billions of documents.
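
For illustration, here is a minimal Python sketch of the fq-versus-q point.
The host, the collection name `products`, and the fields `category` and
`in_stock` are assumptions made up for the example; only `q`, `fq`, and
`rows` are standard Solr request parameters. The idea is that the scored
part of the request stays in q while each restriction goes into its own fq,
so that repeated restrictions can be answered from the filter cache.

```python
from urllib.parse import urlencode

# Hypothetical base URL and collection name; adjust to your own cluster.
SOLR_SELECT = "http://localhost:8983/solr/products/select"

def build_query_url(user_query, filters, rows=10):
    """Build a select URL with the scored part in q and each restriction in its own fq."""
    params = [("q", user_query), ("rows", str(rows))]
    # One fq per clause, in a stable order, so identical restrictions
    # produce identical parameters from one request to the next.
    params.extend(("fq", clause) for clause in sorted(filters))
    return SOLR_SELECT + "?" + urlencode(params)

url = build_query_url("laptop", ["in_stock:true", "category:electronics"])
print(url)
```

Whether this helps in practice depends on how often the same restrictions
recur in your real query log, so it is worth checking against actual traffic.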



Re: Performance testing on SOLR cloud

2015-11-17 Thread Erick Erickson
I wouldn't bother to shard either. YMMV of course, but 2.2M documents
is actually a pretty small number unless the docs themselves are huge.
Sharding introduces inevitable overhead, so it's usually the last
thing you resort to.

As far as the number of replicas is concerned, that's strictly a
function of what QPS you need. Let's say you do not shard and a single
node sustains a query rate of 20 queries per second. If you need to
support 100 QPS, just add 4 more replicas; this can be done at any time.
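
The arithmetic behind that can be written down as a small Python sketch. It
assumes throughput scales linearly with the number of copies of the index,
which is a simplification you would want to confirm with a load test; the
function name `replicas_needed` is made up for the example.

```python
import math

def replicas_needed(target_qps, qps_per_replica):
    """Total copies of an unsharded index needed to serve target_qps.

    Assumes linear scaling: every copy sustains qps_per_replica on its own.
    """
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    return math.ceil(target_qps / qps_per_replica)

total = replicas_needed(100, 20)
print(total, "total,", total - 1, "to add")
```

With the numbers above this gives 5 copies in total, i.e. 4 more than the
one you start with.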

Best,
Erick



RE: Performance testing on SOLR cloud

2015-11-17 Thread Markus Jelsma
Hi - we use the Siege load testing program. It can take a seed list of URLs,
taken from actual user input, and can apply load in parallel. It won't reuse
common queries unless you prepare your seed list appropriately. If your setup
achieves the goal your client anticipates, then you are fine. Siege is not a
good tool for testing extreme QPS because of the obvious limits of a single
machine and its network connection.
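
To make the seed-list idea concrete, here is a rough Python sketch. It is
not Siege and does not use Siege's options; it only shows the same
principle of replaying a list of URLs with several parallel workers and
deriving QPS from the elapsed time. The Solr URL and the collection name
`mycollection` are placeholders, and the demo at the bottom uses a stub
instead of real HTTP calls so that it runs without a server.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

def http_get(url):
    """Fetch one URL and discard the body."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()

def replay(urls, concurrency, fetch=http_get):
    """Replay every URL once using `concurrency` worker threads.

    Returns (queries_per_second, latencies_in_seconds), with latencies
    listed in the same order as `urls`.
    """
    def timed(url):
        start = time.perf_counter()
        fetch(url)
        return time.perf_counter() - start

    began = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(timed, urls))
    elapsed = time.perf_counter() - began
    return len(urls) / elapsed, latencies

if __name__ == "__main__":
    # Placeholder seed list; a real one would come from actual user queries.
    seed = ["http://localhost:8983/solr/mycollection/select?q=test%d" % i
            for i in range(50)]
    # Stub fetch so the sketch runs anywhere; pass fetch=http_get for a real run.
    qps, lat = replay(seed, concurrency=5, fetch=lambda u: time.sleep(0.01))
    print("QPS: %.1f, slowest request: %.3fs" % (qps, max(lat)))
```

Like Siege, a script of this kind runs on one machine, so the same
single-machine and network limits apply to it.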

Assuming your JVM heap settings and Solr cache settings are optimal, and your
only question is how many shards, then increase the number of shards.
Oversharding can be beneficial because more threads each process less data.
A search against a single core is single-threaded, so oversharding on the
same hardware makes sense, and it seems to pay off.

Make sure you run multiple long stress tests and restart the JVMs in between,
because a) query times and load tend to regress to the mean and b) HotSpot
needs to warm up, so short tests make less sense.
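
One way to keep the warm-up phase out of the reported numbers, sketched in
Python. It assumes you have recorded one latency per request, in the order
the requests were issued; the function name `summarize` and the choice of a
nearest-rank 95th percentile are illustrative assumptions, not something
Siege or Solr prescribes.

```python
import statistics

def summarize(latencies, warmup=0):
    """Drop the first `warmup` samples, then summarize the rest (in seconds)."""
    measured = sorted(latencies[warmup:])
    if not measured:
        raise ValueError("no samples left after discarding warm-up")
    # Nearest-rank 95th percentile, computed with integer arithmetic.
    rank = max(1, (95 * len(measured) + 99) // 100)
    return {
        "count": len(measured),
        "mean": statistics.mean(measured),
        "median": statistics.median(measured),
        "p95": measured[rank - 1],
    }

# Two slow warm-up requests followed by twenty steady-state requests.
samples = [0.9, 0.8] + [0.1] * 19 + [0.3]
print(summarize(samples, warmup=2))
```

Comparing these summaries across several long runs, with a JVM restart
between runs, shows how much the results vary from run to run.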

M.

 
 


Performance testing on SOLR cloud

2015-11-17 Thread Aswath Srinivasan (TMS)
Hi fellow developers,

Please share your experience of how you did performance testing on SOLR. What
I'm trying to do is run SOLR cloud on 3 Linux servers with 16 GB RAM and index
a total of 2.2 million documents. I have yet to decide how many shards and
replicas to have (any hint on this is welcome too; this is basically 'only'
performance testing, so suggest the number of shards and replicas if you
can). Ultimately, I'm trying to find the QPS that this SOLR cloud setup can
handle.

To summarize,

1.   Find the QPS that my solr cloud set up can support

2.   Using 5.3.1 version with external zookeeper

3.   3 Linux servers with 16 GB RAM and index a total of 2.2 million documents

4.   Yet to decide number of shards and replicas

5.   Not using any custom search application (performance testing for SOLR and 
not for Search portal)

Thank you