That starts to argue for lots of smaller servers, maybe even with smaller SSDs. Say, a low-power i3 with 16 or 32 GB of RAM and a 128 GB SSD. Is that right?
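For a rough sense of scale, here is a back-of-the-envelope sketch in Python using figures from the quoted thread: 240 TB in total with one replica, up to 250 GB of data per month, and the 50 GB shard guideline. The 75% usable-disk fraction, the candidate disk sizes, and the function names are my own assumptions for illustration, and I am treating the 250 GB monthly figure as primary data only.

```python
import math


def nodes_needed(total_data_gb, disk_gb_per_node, usable_fraction=0.75):
    """Minimum node count so the data fits in the usable part of each disk.

    usable_fraction is an assumed safety margin, not an Elasticsearch setting.
    """
    usable_gb = disk_gb_per_node * usable_fraction
    return math.ceil(total_data_gb / usable_gb)


def primary_shards_needed(index_size_gb, max_shard_gb=50):
    """Minimum primary shard count that keeps each shard at or below the cap."""
    return math.ceil(index_size_gb / max_shard_gb)


# 240 TB including one replica, taken from the original post (decimal units).
TOTAL_GB = 240 * 1000

# Candidate disk sizes per node: 128 GB SSD, a hypothetical 512 GB SSD,
# and the 1.8 TB drives mentioned in the original post.
for disk_gb in (128, 512, 1800):
    print(disk_gb, "GB disks ->", nodes_needed(TOTAL_GB, disk_gb), "nodes")

# Largest monthly volume quoted in the thread, split under the 50 GB guideline.
print(primary_shards_needed(250), "primary shards per monthly index")
```

Under these assumptions, 128 GB disks work out to roughly 2,500 data nodes for disk capacity alone, so the SSD size probably matters as much as the CPU and RAM choice.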
On Wed, Apr 22, 2015 at 2:56 PM, Mark Walkom <markwal...@gmail.com> wrote:

> If you are using time series data then you should be using time series
> indices. As Fred pointed out, routing an entire month's worth of data to a
> single shard is not going to scale.
>
> Also, we recommend that you keep shard size below 50 GB; this helps with
> recovery and distribution. There is also a hard limit of 2 billion docs per
> shard in the underlying Lucene engine. If you hit this then you may lose
> data.
>
> On 23 April 2015 at 03:12, Kimbro Staken <ksta...@kstaken.com> wrote:
>
>> Hello Fred,
>>
>> I have clusters as large as 200 billion documents / 130 TB. Sharing
>> experiences on that would require a book, but here are a couple of quick
>> things that jumped out at me.
>>
>> 1. Do not go the huge server route. Elasticsearch works best when you
>> scale it horizontally. The 64 GB route is a much better option.
>>
>> 2. If I understand correctly, you're routing an entire month's data to a
>> single shard? By doing that you're directing all activity on that shard to
>> a single machine, or a small set of machines if you have replicas. That has
>> to be much slower than if you were to use a monthly index with a reasonable
>> number of shards to spread that load across the cluster. It also creates
>> shard sizes that are fairly large, and if you have month-to-month variation
>> in data rates you'll end up with "lumpy" shard sizes, which will definitely
>> cause issues if you ever run your cluster low on disk space.
>>
>> 3. Get off of ES 1.3 as fast as you can. 8 TB spread across 37 machines is
>> very low density. As you push more data in, you don't want to be on ES 1.3.
>>
>> 4. If you're not already using doc_values, start looking into it now.
>> Managing heap memory is, let's be nice and call it, "a challenge", and
>> fielddata can eat heap in ways that will make your head spin.
>>
>> Kimbro Staken
>>
>> On Wed, Apr 22, 2015 at 1:14 AM, <fdevilla...@synthesio.com> wrote:
>>
>>> Hi list,
>>>
>>> I've been using ES in production since 0.17.6 with clusters of up to 64
>>> virtual machines and 20 TB of data (including 3 replicas). We're now
>>> thinking about pushing things a bit further and I wondered if people here
>>> had similar experience / needs as we do.
>>>
>>> Our current index is 1.1 billion unique documents, 8 TB of data (including
>>> 1 replica) on 37 physical machines (32 data nodes, 3 master nodes and 2
>>> nodes dedicated to HTTP requests) with ES 1.3 (upgrade to 1.5 already
>>> planned). We're indexing about 2500 new documents / second and everything's
>>> fine so far.
>>>
>>> Our goal is to index (and search) about 30 billion more documents (the
>>> backdata) + about 200 million new documents each month.
>>>
>>> Our company provides analytics dashboards to its clients, and they
>>> mostly browse their data on a monthly scale, so we're routing documents
>>> monthly. Each shard is between 200 and 250 GB. The index is made of 128
>>> shards, which makes about 10 years of data with 1 month per shard.
>>> Considering what we already have, we should reach 240 TB of data (and
>>> counting) with a single replica after we index all our backdata.
>>>
>>> So, my questions here:
>>>
>>> - Does anyone here have the same use case / amount of data as we do?
>>>
>>> - Is ES the right technology to do realtime, lightspeed queries
>>> (filtered queries and high cardinality aggregations) on such an amount of
>>> data?
>>>
>>> - What were the traps to avoid? Is it better to add lots of medium
>>> machines (12 core Xeon E5-1650 v2, 64 GB RAM, 1.8 TB SAS 15k hard drives)
>>> or a few huge machines with petabytes of RAM, terabytes of SSD and multiple
>>> ES processes?
>>>
>>> Any feedback on similar situations is much appreciated.
>>>
>>> Have a nice day,
>>> Fred
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to elasticsearch+unsubscr...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/elasticsearch/6865703f-2302-4fe0-b929-eb9fbe55a84a%40googlegroups.com.
>>> For more options, visit https://groups.google.com/d/optout.