Ahh ok, knowing this extra info is good as it helps us help you :)

Logstash doesn't define how many shards to use, at least not that I can see
in its index template
(https://github.com/elasticsearch/logstash/blob/master/lib/logstash/outputs/elasticsearch/elasticsearch-template.json)
or through some quick tests. This means the shard count it uses comes from
the ES config, which, as was mentioned earlier, has a default of 5 shards per
index (plus one replica).
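
If you would rather not depend on that default, the shard and replica counts
can be set explicitly when each index is created. Below is a minimal sketch,
not a definitive implementation: it only builds the settings body and does the
shard arithmetic for the numbers discussed in this thread. The helper names
are made up for illustration; `number_of_shards` and `number_of_replicas` are
the actual index settings.

```python
import json


def index_settings(shards: int, replicas: int) -> dict:
    """Build a create-index request body with explicit shard counts."""
    return {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }


def total_shards(indices: int, shards: int, replicas: int) -> int:
    """Count primaries plus replica copies across all indices."""
    return indices * shards * (1 + replicas)


if __name__ == "__main__":
    # Body for a one-shard, one-replica index (hypothetical helper, real setting names).
    print(json.dumps(index_settings(shards=1, replicas=1), indent=2))
    # 1000 per-customer indices with 1 shard + 1 replica each.
    print(total_shards(1000, shards=1, replicas=1))
    # The same 1000 indices left on the default of 5 shards + 1 replica.
    print(total_shards(1000, shards=5, replicas=1))
```

With 1000 customers this works out to 2000 shards for the one-shard layout
versus 10000 on the defaults, which is the overhead being weighed here.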

Keep in mind that with only one shard, each query is executed by a single
thread on that shard, so if you have 80 million records with parent+child
relationships, chances are it will take a fair while to get a response to
any query.
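
For the aliases-and-routing alternative suggested earlier in this thread, a
rough sketch of the request body for the index aliases API might look like the
following. The index name `events`, the `customer_id` field, and the
`customer_<id>` alias naming are all assumptions for illustration; only the
`actions` / `add` structure with `routing` and `filter` comes from the aliases
API itself.

```python
import json


def customer_alias_action(index: str, customer_id: str) -> dict:
    """Build one 'add' action: a filtered, routed alias for a single customer."""
    return {
        "add": {
            "index": index,
            "alias": f"customer_{customer_id}",  # hypothetical naming scheme
            "routing": customer_id,  # keeps one customer's docs on one shard
            "filter": {"term": {"customer_id": customer_id}},  # assumed field
        }
    }


def aliases_body(index: str, customer_ids: list) -> dict:
    """Combine per-customer actions into one aliases request body."""
    return {"actions": [customer_alias_action(index, cid) for cid in customer_ids]}


if __name__ == "__main__":
    print(json.dumps(aliases_body("events", ["42", "43"]), indent=2))
```

One trade-off with a shared index: removing a customer means deleting their
documents rather than simply dropping an index.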

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com


On 27 June 2014 11:49, Drew Kutcharian <d...@venarc.com> wrote:

> Hi Mark,
>
> The problem that we have is that each "customer" could generate 60-80
> million docs/month on average. In addition, when a customer leaves, we
> would need to delete all their data. Hence it makes sense to have an
> index per customer (or even multiple indexes per customer). Another issue
> is that we are going to need to run a lot of "has_child" queries, and ES,
> as it currently stands, loads the IDs of all the parent docs in the index
> before running the query. So if we keep each customer on their own index,
> those has_child queries would only need to load the IDs for that specific
> customer. In addition, one index with one shard per day is how Logstash
> works, and it is designed for ingesting a lot of data.
>
> - Drew
>
>
>
> On Jun 26, 2014, at 6:24 PM, Mark Walkom <ma...@campaignmonitor.com>
> wrote:
>
> Pretty sure he read it, as I'd have offered the same advice :)
> You cannot change the sharding of an index after creation; you need to
> completely reindex the data to do so. This may not be a major issue for you
> but it's something to take into account when you have hundreds or thousands
> of customers, and hence indexes.
>
> You could also look at having a few indexes and using aliases and routing,
> as this would be a much more efficient way of doing things.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 27 June 2014 11:21, Drew Kutcharian <d...@venarc.com> wrote:
>
>> Hi Andrew,
>>
>> Not sure if you read my original question. The question is about having a
>> separate index per customer since we are going to have < 1000 customers but
>> each would have a lot of data. Each shard comes with its own overhead
>> since it's an instance of Lucene. I was going with the 1 shard with 1
>> replica route because initially we can put 100 of these customers on the
>> same machine and as they grow larger we can allocate more machines and move
>> the indexes around. With this approach, our capacity for a single customer
>> would be the max a single machine can handle, which I think should be enough
>> given our requirements. If a customer is really pushing a single machine to
>> its max, then we can move them to their own Elasticsearch cluster.
>>
>> - Drew
>>
>>
>> On Jun 26, 2014, at 1:57 PM, Andrew Selden <
>> andrew.sel...@elasticsearch.com> wrote:
>>
>> > Drew,
>> >
>> > The Elasticsearch default is to create 5 shards for each index. I would
>> > start with this. Typically it is best to actually over-shard, which is to
>> > say have more than 1 shard per node per index. There is not really any
>> > measurable cost to this and it gives you flexibility in your design as you
>> > scale out.
>> >
>> > For example, if you start with 5 shards on a single server and then
>> > later decide you want to add another machine, Elasticsearch will
>> > automatically transfer some of those shards over to the new server, giving
>> > you better scalability. If you start with only 1 shard you will not get
>> > this benefit.
>> >
>> > Andrew
>> >
>> > On Jun 26, 2014, at 8:29 PM, Drew Kutcharian <d...@venarc.com> wrote:
>> >
>> >> Hey Guys,
>> >>
>> >> I'm working on an analytics dashboard project where we collect events
>> >> into Elasticsearch for clients. Each client could have millions of events
>> >> per month. We are thinking of using one index with one shard and one
>> >> replica per client. Looking at Logstash, it seems like Logstash creates 1
>> >> index, with 1 shard and 0 replicas per day, so that's where we got the
>> >> inspiration. We don't anticipate having more than 1000 "clients". Are there
>> >> any issues with this design pattern?
>> >>
>> >> Thanks,
>> >>
>> >> Drew
>> >>
>> >> --
>> >> You received this message because you are subscribed to the Google
>> Groups "elasticsearch" group.
>> >> To unsubscribe from this group and stop receiving emails from it, send
>> an email to elasticsearch+unsubscr...@googlegroups.com.
>> >> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/elasticsearch/9DC88022-E37D-4C55-81E6-71A52EC5B466%40venarc.com
>> .
>> >> For more options, visit https://groups.google.com/d/optout.
