Re: How to size a SOLR Cloud

2013-01-07 Thread Otis Gospodnetic
Hello FF,

Something like SPM for Solr will help you understand what's making Solr
slow - CPU maxed? Disk IO? Swapping? Caches too small? ...

There are no general rules/recipes, but once you see what is going on we
can provide guidance.

Yes, you can have 1 or more replicas of a shard.
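To make that concrete: the replication factor is chosen when the collection is created through the Solr Collections API. A minimal sketch of building that CREATE call (the host, port, and collection name below are hypothetical placeholders; `action`, `name`, `numShards`, and `replicationFactor` are the Collections API parameter names):

```python
from urllib.parse import urlencode

# Hypothetical cluster and collection; adjust for your own setup.
params = {
    "action": "CREATE",
    "name": "mycollection",   # hypothetical collection name
    "numShards": 2,
    "replicationFactor": 3,   # 3 copies of each shard, i.e. more than 2 replicas
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(url)
# Issue this URL against any node of the running SolrCloud cluster.
```

Raising `replicationFactor` buys query throughput and fault tolerance, not lower latency for a single query; each added replica also multiplies indexing work.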

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Mon, Jan 7, 2013 at 12:14 PM, f.fourna...@gibmedia.fr wrote:

 Hello,
 I'm new to Solr and I have a collection with 25 million records.
 I want to run this collection on SolrCloud (Solr 4.0) on Amazon EC2
 instances.
 Currently I've configured 2 shards and 2 replicas per shard on Medium
 instances (4 GB RAM, 1 CPU core), and response times are very long.
 How should I size the cloud (sharding, replicas, memory, CPU, ...) to get
 acceptable response times in my situation? More memory? More CPU? More
 shards? Do rules for sizing a Solr cloud exist?
 Is it possible to have more than 2 replicas of one shard? Is it relevant?
 Best regards
 FF



Re: How to size a SOLR Cloud

2013-01-07 Thread Per Steffensen

Hi

I have some experience with practical limits. We have several setups that 
we have tried to run under high load for a long time:

1)
* 20 shards in one collection spread over 5 nodes (4 shards of the 
collection per node), no redundancy (only one replica per shard)

* Indexing 35-50 million documents per day, and searching a little along the way
* We do not have detailed measurements on searching, but my impression 
is that search response times are fairly OK (below 5 secs for 
uncomplicated searches) - at least for the first 15 days, up to about 500 
million documents
* We do have very detailed measurements on indexing times. They are 
good for the first 15-17 days, up to 500-600 million documents. Then we see a 
temporary slow-down in indexing times, because major merges 
happen at the same time across all shards. Indexing times speed up 
again once the merges finish. After about 20 days everything stops running 
- things just get too slow and eventually nothing happens.

2)
* Same as 1), except 40 shards in one collection spread over 10 nodes, 
no redundancy
* The slow-down points seem to scale linearly: slow-down around 1 billion 
docs, and a complete stop at 1.3-1.4 billion docs
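A quick back-of-the-envelope check, using only the figures above, supports the linear-scaling observation: both setups hit the slow-down at roughly the same per-shard document count.

```python
# Documents indexed at the observed slow-down point, per setup
# (figures taken from the two setups described above).
slowdown_docs = {"20 shards": (500e6, 600e6), "40 shards": (1.0e9, 1.0e9)}
shard_counts = {"20 shards": 20, "40 shards": 40}

per_shard = {}
for setup, (lo, hi) in slowdown_docs.items():
    n = shard_counts[setup]
    per_shard[setup] = (lo / n / 1e6, hi / n / 1e6)  # millions of docs per shard
    print(f"{setup}: {per_shard[setup][0]:.0f}-{per_shard[setup][1]:.0f} "
          f"million docs per shard at slow-down")
```

Both setups slow down at roughly 25-30 million documents per shard, which suggests the practical limit on that hardware is per-shard, so adding shards (and nodes) moves the cluster-wide limit out linearly.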


Therefore it seems a little strange to me that you have problems with 25 
million docs in two shards.
One major difference is the redundancy, though. We are running with only one 
replica per shard. We started out trying to run with redundancy (2 
replicas per shard), but that involved a lot of problems: recovery never 
completed successfully when recovery situations occurred, and we saw 
roughly 4x indexing times compared to running without redundancy (even 
though at most 2x should be expected).


Regards, Per Steffensen


On 1/7/13 6:14 PM, f.fourna...@gibmedia.fr wrote:

Hello,
I'm new to Solr and I have a collection with 25 million records.
I want to run this collection on SolrCloud (Solr 4.0) on Amazon EC2
instances.
Currently I've configured 2 shards and 2 replicas per shard on Medium
instances (4 GB RAM, 1 CPU core), and response times are very long.
How should I size the cloud (sharding, replicas, memory, CPU, ...) to get
acceptable response times in my situation? More memory? More CPU? More
shards? Do rules for sizing a Solr cloud exist?
Is it possible to have more than 2 replicas of one shard? Is it relevant?
Best regards
FF