Ahh ok, this extra info is good to know as it helps us help you :)

Logstash doesn't define how many shards to use, at least not that I can see in its index template (https://github.com/elasticsearch/logstash/blob/master/lib/logstash/outputs/elasticsearch/elasticsearch-template.json) or from some quick tests. This means the shard count comes from the ES config which, as was mentioned earlier, defaults to 5 shards per index (plus one replica).
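If you would rather not rely on the defaults, the shard and replica counts can be set explicitly in the settings passed when an index is created. Here is a minimal sketch in Python that only builds the request body and does the shard arithmetic; it doesn't talk to a cluster. The helper names (index_settings, total_shards) and the 1000-index figure are my own assumptions for illustration, and the index name in the comment is made up.

```python
import json


def index_settings(shards: int, replicas: int) -> dict:
    """Build a create-index request body with explicit shard settings."""
    if shards < 1 or replicas < 0:
        raise ValueError("need at least 1 primary shard and 0 or more replicas")
    return {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}


def total_shards(body: dict, index_count: int = 1) -> int:
    """Primaries plus replica copies across index_count identical indexes."""
    s = body["settings"]
    return index_count * s["number_of_shards"] * (1 + s["number_of_replicas"])


# Hypothetical per-customer index: 1 primary shard, 1 replica.
per_customer = index_settings(shards=1, replicas=1)
# The defaults mentioned above: 5 primary shards, 1 replica.
defaults = index_settings(shards=5, replicas=1)

if __name__ == "__main__":
    # This JSON would be the body of e.g. PUT /customer-acme (name is made up).
    print(json.dumps(per_customer, indent=2))
    print("1000 indexes, 1 shard + 1 replica:", total_shards(per_customer, 1000))
    print("1000 indexes, default settings:", total_shards(defaults, 1000))
```

With 1000 single-shard indexes that comes to 2000 shards including replicas, versus 10000 if every index were left on the defaults.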
Keep in mind that with only one shard your search throughput is limited to a single thread. So if you have 80 million records with parent+child relationships, chances are it will take a fair while to get a response to any query.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 June 2014 11:49, Drew Kutcharian <d...@venarc.com> wrote:

> Hi Mark,
>
> The problem that we have is that each “customer” could generate 60-80
> million docs/month on average. In addition, when a customer leaves, we
> would need to delete all their data. So hence it makes sense to have an
> index per customer (or even multiple indexes per customer). Another issue
> is that we are going to be needing to do a lot of “has_child” type of
> queries. And ES as it currently stands, loads up all the IDs of all the
> parent docs in index before running the query. So if we keep each customer
> on their own index, those has_child queries would only need to load up the
> ids for that specific client. In addition, one index with one shard per day
> is how Logstash works which is designed for ingesting a lot of data.
>
> - Drew
>
>
>
> On Jun 26, 2014, at 6:24 PM, Mark Walkom <ma...@campaignmonitor.com>
> wrote:
>
> Pretty sure he read it as I'd have offered the same advice :)
> You cannot change the sharding of an index after creation, you need to
> completely reindex the data to do so. This may not be a major issue for you
> but it's something to take into account when you have hundreds or thousands
> of customers, and hence indexes.
>
> You could also look at having a few indexes and use aliases and routing as
> this would be a much more efficient way of doing things.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 27 June 2014 11:21, Drew Kutcharian <d...@venarc.com> wrote:
>
>> Hi Andrew,
>>
>> Not sure if you read my original question. The question is about having a
>> separate index per customer since we are going to have < 1000 customers but
>> each would have a lot of data. Each shard comes with its own overhead
>> since it's an instance of Lucene. I was going with the 1 shard with 1
>> replica route because initially we can put 100 of these customers on the
>> same machine and as they grow larger we can allocate more machines and move
>> the indexes around. With this approach, our capacity for a single customer
>> would be the max a single machine can handle which I think should be enough
>> given our requirements. If a customer is really pushing a single machine to
>> its max, then we can move them to their own Elasticsearch cluster.
>>
>> - Drew
>>
>>
>> On Jun 26, 2014, at 1:57 PM, Andrew Selden <
>> andrew.sel...@elasticsearch.com> wrote:
>>
>> > Drew,
>> >
>> > The Elasticsearch default is to create 5 shards for each index. I would
>> > start with this. Typically it is best to actually over-shard, which is to
>> > say have more than 1 shard per node per index. There is not really any
>> > measurable cost to this and it gives you flexibility in your design as you
>> > scale out.
>> >
>> > For example, if you start with 5 shards on a single server and then
>> > later decide you want to add another machine, Elasticsearch will
>> > automatically transfer some of those shards over to the new server, giving
>> > you better scalability. If you start with only 1 shard you will not get
>> > this benefit.
>> >
>> > Andrew
>> >
>> > On Jun 26, 2014, at 8:29 PM, Drew Kutcharian <d...@venarc.com> wrote:
>> >
>> >> Hey Guys,
>> >>
>> >> I'm working on an analytics dashboard project where we collect events
>> >> into Elasticsearch for clients.
>> >> Each client could have millions of events per month. We are thinking of
>> >> using one index with one shard and one replica per client. Looking at
>> >> Logstash, it seems like Logstash creates 1 index, with 1 shard and 0
>> >> replicas per day, so that's where we got the inspiration. We don't
>> >> anticipate having more than 1000 "clients". Are there any issues with
>> >> this design pattern?
>> >>
>> >> Thanks,
>> >>
>> >> Drew
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> >> Groups "elasticsearch" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send
>> >> an email to elasticsearch+unsubscr...@googlegroups.com.
>> >> To view this discussion on the web visit
>> >> https://groups.google.com/d/msgid/elasticsearch/9DC88022-E37D-4C55-81E6-71A52EC5B466%40venarc.com
>> >> For more options, visit https://groups.google.com/d/optout.