That starts to argue for lots of smaller servers, maybe even with smaller
SSDs. Say, a low-power i3 with 16 or 32GB RAM and a 128GB SSD. Is that
right?

On Wed, Apr 22, 2015 at 2:56 PM, Mark Walkom <markwal...@gmail.com> wrote:

> If you are using time-series data then you should be using time-series
> indices. As Fred pointed out, routing an entire month's worth of data to a
> single shard is not going to scale.
>
> Also, we recommend that you keep shard size below 50GB; this helps with
> recovery and distribution. There is also a hard 2-billion-docs-per-shard
> limit in the underlying Lucene engine; if you hit it, you may lose
> data.
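As a rough back-of-the-envelope illustration (an editor's sketch, not from the thread) of what the 50GB guideline implies for Fred's projected 240TB total:

```python
# Minimum shard count to keep every shard under ~50GB, using Fred's
# projected 240TB total (with 1 replica). Illustrative numbers only,
# not a sizing recommendation.
TOTAL_TB = 240
MAX_SHARD_GB = 50

total_gb = TOTAL_TB * 1024
min_shards = -(-total_gb // MAX_SHARD_GB)  # ceiling division
print(min_shards)  # minimum number of shards needed
```

Compare that with the 128 shards in the original design: the guideline calls for well over an order of magnitude more shards, which is exactly what monthly indices with their own shard counts would give you.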
>
> On 23 April 2015 at 03:12, Kimbro Staken <ksta...@kstaken.com> wrote:
>
>> Hello Fred,
>>
>> I have clusters as large as 200 billion documents / 130TB. Sharing
>> experiences on that would require a book, but a couple of quick things
>> jumped out at me.
>>
>> 1. Do not go the huge-server route. Elasticsearch works best when you
>> scale it horizontally. The 64GB route is a much better option.
>>
>> 2. If I understand correctly, you're routing an entire month's data to a
>> single shard? By doing that you're directing all activity on that shard to
>> a single machine, or a small set of machines if you have replicas. That has
>> to be much slower than using something like a monthly index with a
>> reasonable number of shards to spread that load across the cluster. It
>> also creates shard sizes that are fairly large, and if you have
>> month-to-month variation in data rates you'll end up with "lumpy" shard
>> sizes, which will definitely cause issues if you ever run your cluster low
>> on disk space.
>>
>> 3. Get off of ES 1.3 as fast as you can. 8TB spread across 37 machines is
>> very low density; as you push more data in, you don't want to be on ES 1.3.
>>
>> 4. If you're not already using doc_values, start looking into it now.
>> Managing heap memory is, let's be nice and call it, "a challenge", and
>> fielddata can eat heap in ways that will make your head spin.
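For context, doc_values are enabled per field in the mapping, moving fielddata off the heap onto disk. A minimal sketch of such a mapping fragment (field names here are made up for illustration):

```python
import json

# Hypothetical ES 1.x-style mapping fragment enabling doc_values on the
# fields used for sorting and aggregations. Field names are illustrative.
mapping = {
    "properties": {
        "timestamp": {"type": "date", "doc_values": True},
        "client_id": {
            "type": "string",
            "index": "not_analyzed",  # doc_values need non-analyzed fields
            "doc_values": True,
        },
    }
}
print(json.dumps(mapping, indent=2))
```

Aggregating on `client_id` would then read column-oriented data from disk instead of loading fielddata into the heap.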
>>
>>
>>
>> Kimbro Staken
>>
>>
>> On Wed, Apr 22, 2015 at 1:14 AM, <fdevilla...@synthesio.com> wrote:
>>
>>> Hi list,
>>>
>>> I've been using ES in production since 0.17.6, with clusters up to 64
>>> virtual machines and 20TB of data (including 3 replicas). We're now thinking
>>> about pushing things a bit further, and I wondered if people here had
>>> similar experiences / needs to ours.
>>>
>>> Our current index is 1.1 billion unique documents and 8TB of data (including 1
>>> replica) on 37 physical machines (32 data nodes, 3 master nodes and 2 nodes
>>> dedicated to HTTP requests) with ES 1.3 (an upgrade to 1.5 is already planned).
>>> We're indexing about 2,500 new documents / second and everything's fine so
>>> far.
>>>
>>> Our goal is to index (and search) about 30 billion more documents (the
>>> backdata) + about 200 million new documents each month.
>>>
>>> Our company provides analytics dashboards to its clients, and they
>>> mostly browse their data on a monthly scale, so we're routing documents
>>> monthly. Each shard is between 200 and 250GB. The index is made of 128
>>> shards, which makes for about 10 years of data with 1 month per shard.
>>> Considering what we already have, we should reach 240TB of data (and
>>> counting) with a single replica after we index all our backdata.
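A quick sanity check of the figures above (an editor's arithmetic on the numbers stated in the email, nothing more):

```python
# 128 shards at one month per shard, and the projected 240TB total.
shards = 128
years = shards / 12.0          # months of capacity expressed in years
print(round(years, 1))         # ~10.7 years, matching "about 10 years"

projected_tb = 240
tb_per_shard = projected_tb / float(shards)
print(tb_per_shard)            # ~1.88TB per shard once backdata is loaded
```

That last number is the crux of the replies: each shard would eventually be some 35x larger than the ~50GB guideline.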
>>>
>>> So, my questions here:
>>>
>>> - Does anyone here have a similar use case / amount of data as we do?
>>>
>>> - Is ES the right technology to do realtime, light-speed queries
>>> (filtered queries and high-cardinality aggregations) on such an amount of
>>> data?
>>>
>>> - What are the traps to avoid? Is it better to add lots of medium
>>> machines (12-core Xeon E5-1650 v2, 64GB RAM, 1.8TB 15k SAS hard drives) or a
>>> few huge machines with petabytes of RAM, terabytes of SSD and multiple ES
>>> processes?
>>>
>>> Any feedback on similar situations is much appreciated.
>>>
>>> Have a nice day,
>>> Fred
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/6865703f-2302-4fe0-b929-eb9fbe55a84a%40googlegroups.com
>>> <https://groups.google.com/d/msgid/elasticsearch/6865703f-2302-4fe0-b929-eb9fbe55a84a%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>
>

