Why limit shards to 5 GB?  Have you capacity-tested your hardware and data and 
determined that the maximum shard size is around 5 GB?  

If the answer is no, I would encourage you to perform some capacity 
planning first.  It may be that your system can handle 15 or 20 GB per 
shard, or put a different way, 50 million documents per shard.  It is 
impossible to know without testing, however.

So the first order of business is capacity planning.  Create a single index 
with a single shard, and start indexing data and running searches.  When 
the response time, indexing throughput, or <whatever requirement> no longer 
satisfies your SLA, you've hit your limit.  Write down the number of 
docs in the shard and its physical size.  *Now* you can do some planning 
and extrapolate from those values.  
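
To make the extrapolation step concrete, here is a rough sketch in Python.  
The function name, the headroom factor, and all of the numbers are made up 
for illustration; plug in whatever your own single-shard test gives you.  It 
only does the arithmetic and does not talk to Elasticsearch at all.

```python
import math


def shards_needed(total_docs, max_docs_per_shard, headroom=1.0):
    """Estimate how many primary shards a data set needs.

    total_docs          -- documents you expect to store (your estimate)
    max_docs_per_shard  -- doc count at which your single-shard test
                           stopped meeting the SLA (measured, not guessed)
    headroom            -- fraction of the measured limit you are willing
                           to fill, e.g. 0.5 to stay at half the limit
                           (hypothetical safety margin, not an ES setting)
    """
    if total_docs < 0:
        raise ValueError("total_docs must be non-negative")
    if max_docs_per_shard <= 0:
        raise ValueError("max_docs_per_shard must be positive")
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")

    usable_docs_per_shard = max_docs_per_shard * headroom
    # Always keep at least one shard, even for an empty data set.
    return max(1, math.ceil(total_docs / usable_docs_per_shard))


if __name__ == "__main__":
    # Illustrative numbers only: 1 billion docs, and a test that showed
    # a single shard holding up until about 50 million docs.
    print(shards_needed(1_000_000_000, 50_000_000))
    print(shards_needed(1_000_000_000, 50_000_000, headroom=0.5))
```

Run the same calculation with the measured physical size instead of the doc 
count and use whichever result is larger.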

I would be wary of a cluster with so many shards, no matter how they are 
partitioned.  Every shard carries its own overhead, so 14400 shards means a 
lot of memory wasted on raw overhead alone.  I think you may be 
underestimating the capacity of a single shard, but it's hard to say 
without some benchmarks.

I would not necessarily be opposed to multiple clusters.  Often they can 
make logistics simpler and prevent issues that crop up in extreme 
multi-tenant environments, such as enormous cluster states that stall the 
master.

-Zach



On Wednesday, March 19, 2014 5:16:01 PM UTC-5, Brad Jordan wrote:
>
> Hey all, 
>
> I have 30 machines to build my ES cluster. I am going to need to build 
> 2000 indexes with 14400 shards total. Should I build two clusters or one?
>
> I have a multi customer data set. 65% of the customers have very little 
> data, "small customers". The other 35% have 100x more data, 
> "big customers". My thought was to build one cluster where all the 
> "small customers" share a set of 60 "monthly" indexes each with 100 shards. 
> I would build a second cluster where each customer gets their own set of 60 
> "monthly" indexes each with 6 shards. This configuration is so that I can 
> keep shard sizes below 5gb across both clusters. The small cluster is 
> routed by customerId. The big cluster uses the default routing strategy.
>
> questions:
> 1) Should I make two clusters or just one?
> 2) Do I need to keep shard sizes below 5gb?
> 3) Is management of one cluster with 2000 indexes and 14400 shards more 
> difficult than 2 clusters where "small" cluster has 60 indexes and 600 
> shards and "big" cluster has 1900 indexes and 13000 shards?
>
> Thanks,
> Brad
>
>
