Why limit shards to 5 GB? Have you capacity-tested your hardware and data and determined that the maximum shard size is around 5 GB?
If the answer is no, I would encourage you to perform some capacity planning first. It may be that your system can handle 15 or 20 GB per shard or, put a different way, 50 million documents per shard. It is impossible to know without testing, however.

So the first order of business is capacity planning. Create a single index with a single shard, then start indexing data and running searches. When the response time/indexing throughput/<whatever requirement> no longer satisfies your SLA, you've hit your limit. Write down the number of docs in the shard and its physical size. *Now* you can do some planning and extrapolate from that value.

I would be wary of a cluster with so many shards, no matter how they are partitioned. 14400 shards is a lot of wasted memory in terms of just raw overhead. I think you may be underestimating the capacity of a single shard, but it's hard to say without some benchmarks.

I would not necessarily be opposed to multiple clusters. Often they can make logistics simpler and prevent issues that crop up in extreme multi-tenant environments, like enormous cluster states that stall the master.

-Zach

On Wednesday, March 19, 2014 5:16:01 PM UTC-5, Brad Jordan wrote:
>
> Hey all,
>
> I have 30 machines to build my ES cluster. I am going to need to build
> 2000 indexes with 14400 shards total. Should I build two clusters or one?
>
> I have a multi customer data set. 65% of the customers have very little
> data, "small customers". The other 35% have 100x more data,
> "big customers". My thought was to build one cluster where all the
> "small customers" share a set of 60 "monthly" indexes each with 100 shards.
> I would build a second cluster where each customer gets their own set of 60
> "monthly" indexes each with 6 shards. This configuration is so that I can
> keep shard sizes below 5gb across both clusters. The small cluster is
> routed by customerId. The big cluster uses the default routing strategy.
>
> questions:
> 1) Should I make two clusters or just one?
> 2) Do I need to keep shard sizes below 5gb?
> 3) Is management of one cluster with 2000 indexes and 14400 shards more
> difficult than 2 clusters where "small" cluster has 60 indexes and 600
> shards and "big" cluster has 1900 indexes and 13000 shards?
>
> Thanks,
> Brad
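
As a rough sketch of the capacity test described in the reply above (one index, one shard, keep indexing until the SLA breaks, then extrapolate), something like the following Python could serve as a starting point. It is not tied to any Elasticsearch client: `index_batch` and `measure_latency_ms` are hypothetical hooks you would implement against your own cluster, and `sla_ms` stands in for whatever requirement you actually care about.

```python
import math
from typing import Callable, Optional


def find_shard_limit(
    index_batch: Callable[[], int],
    measure_latency_ms: Callable[[], float],
    sla_ms: float,
    max_batches: int = 10_000,
) -> Optional[int]:
    """Fill one single-shard index until a probe search exceeds the SLA.

    index_batch() indexes one batch and returns the number of docs added.
    measure_latency_ms() runs a representative search and returns its latency.
    Both are hypothetical hooks, not part of any real client library.

    Returns the last document count that still met the SLA, or None if the
    SLA was never exceeded within max_batches.
    """
    docs = 0
    for _ in range(max_batches):
        added = index_batch()
        if measure_latency_ms() > sla_ms:
            return docs  # count before the batch that broke the SLA
        docs += added
    return None


def shards_needed(total_docs: int, docs_per_shard: int) -> int:
    """Extrapolate: primary shards required to hold total_docs."""
    if docs_per_shard <= 0:
        raise ValueError("docs_per_shard must be positive")
    return math.ceil(total_docs / docs_per_shard)
```

Once `find_shard_limit` gives a per-shard document count, `shards_needed` turns an expected total document count into a shard count. In practice you would probably leave some headroom below the measured limit, and record the on-disk size of the shard at that point as well.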
