Good to know, thanks for the tip!  

PS: per the prior conversations, I tried optimizing all the indexes down to 
1 segment per shard.  The operation didn't actually take long, but it seems 
to have had only a marginal effect on snapshot performance.  Does it matter 
that the new indexes are optimized while the indexes in the old snapshots 
aren't?  That is, will snapshotting get faster once those old snapshots 
have all been deleted and replaced with snapshots of the new, optimized 
indexes?
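For what it's worth, the commands I ran were along these lines (a sketch, assuming a local node on :9200, daily `logstash-*` indexes, and an S3 repository I've called `my_s3_repo` — names are mine):

```shell
# Force-merge every logstash daily index down to one segment per shard
# (ES 1.x calls this "optimize"; it is I/O-heavy but ran quickly here).
for index in $(curl -s 'localhost:9200/_cat/indices/logstash-*?h=index'); do
  curl -XPOST "localhost:9200/${index}/_optimize?max_num_segments=1"
done

# Snapshots are incremental per segment *file*, so the first snapshot taken
# after optimizing still uploads the newly merged segments; later snapshots
# of unchanged, optimized indexes should copy almost nothing new.
curl -XPUT 'localhost:9200/_snapshot/my_s3_repo/after_optimize?wait_for_completion=true'
```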

I guess the other two levers to try cranking are reducing the number of 
shards and reducing the number of indexes.  If I understand correctly, 
there's no way to do this without writing up some script?
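For future indexes, at least, it looks like the shard count can be dropped without a script via an index template — something like the following, if I'm reading the docs right (the template name is mine, and `order` is set so it overrides logstash's own default template):

```shell
# Make all future logstash-* indexes use a single shard (ES 1.x template
# syntax).  Existing indexes keep their shard count; merging old daily
# indexes into monthly ones would still need a reindexing script or tool.
curl -XPUT 'localhost:9200/_template/logstash_one_shard' -d '{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "number_of_shards": 1
  }
}'
```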


On Friday, March 13, 2015 at 11:52:59 AM UTC-5, Aaron Mefford wrote:
>
> Yes, it was m1.smalls that I first noticed the EBS throttling on.  Things 
> work well in bursts, but sustained EBS throughput does not.  It will work 
> substantially better on an m3.medium, especially if you are using the new 
> SSD-backed EBS volumes.
>
> On Thu, Mar 12, 2015 at 10:30 PM, Andy Nemzek <bitk...@gmail.com> wrote:
>
>> Thank you guys for your thoughts here.  This is really useful 
>> information.  Again, we're creating daily indexes because that's what 
>> logstash does out of the box with the elasticsearch plugin, and this kind 
>> of tuning info isn't included with that plugin.
>>
>> Minimizing both the number of indexes and shards now sounds like a great 
>> idea.
>>
>> We are indeed using EC2.  We're just using an m1.small that's EBS backed 
>> (non-SSD).  So, yes, it's not a very powerful machine, but again, we're not 
>> throwing a lot of data at it either.
>>
>>
>> On Thursday, March 12, 2015 at 12:50:22 PM UTC-5, aa...@definemg.com 
>> wrote:
>>>
>>> With the low volume of ingest and the long duration of history, I'd 
>>> suggest trimming back the number of shards per index from the default 
>>> 5.  Based on your 100 docs per day, I'd say 1 shard per daily index.  If 
>>> you combined this with the other suggestion to increase the time period 
>>> each index covers, you might increase the number of shards, but maybe 
>>> still not.  Running an optimize once a time period has completed is 
>>> great advice if you can afford the overhead; at one day at a time you 
>>> should be able to, and the overhead of not optimizing is likely costing 
>>> you more when you snapshot.
>>>
>>> An index is made of shards, and a shard is made of Lucene segments.  
>>> Segments are the actual files that get copied when you snapshot, so the 
>>> total segment count is (segments per shard) x (shards per index) x 
>>> (number of indexes).  Reducing the number of indexes by covering larger 
>>> time periods will significantly reduce the number of segments, as will 
>>> reducing the number of shards per index.  Optimizing an index will also 
>>> consolidate many segments into a single segment.
>>>
>>> Based on the use of S3, should we assume you are using AWS EC2?  What 
>>> instance size?  Your data volume seems very low, so it is concerning 
>>> that snapshots take so long; that points to a slow file system or a 
>>> significant number of segments (100 indexes, 5 shards per index, xx 
>>> segments per shard == many thousands of segments).  What does your 
>>> storage system look like?  If you are using EC2, are you using the newer 
>>> SSD-backed EBS volumes?  In my experience, some of the smaller instance 
>>> sizes significantly limit prolonged EBS throughput. 
>>>
>>> On Wednesday, March 11, 2015 at 1:12:01 AM UTC-6, Magnus Bäck wrote:
>>>>
>>>> On Monday, March 09, 2015 at 20:29 CET, 
>>>>      Andy Nemzek <bitk...@gmail.com> wrote: 
>>>>
>>>> > We've been using logstash for several months now and it creates a new 
>>>> > index each day, so I imagine there are over 100 indexes at this 
>>>> point. 
>>>>
>>>> Why create daily indexes if you only have a few hundred entries in 
>>>> each?  There's a constant overhead for each shard, so you don't want 
>>>> more indexes than you need.  It seems like you'd be fine with monthly 
>>>> indexes, and then your snapshot problems would disappear too. 
>>>>
>>>> > Elasticsearch is running on a single machine...I haven't done 
>>>> > anything with shards, so the defaults must be in use.  Haven't 
>>>> > optimized old indexes.  We're pretty much just running ELK out of 
>>>> > the box.  When you mention 'optimizing indexes', does this process 
>>>> > combine indexes? 
>>>>
>>>> No, but it can combine the segments in a Lucene index (Lucene indexes 
>>>> are what make up Elasticsearch indexes), and segments are what's being 
>>>> backed up.  So the more segments you have, the longer snapshots are 
>>>> going to take. 
>>>>
>>>> > Do you know if these performance problems are typical when 
>>>> > using ELK out of the box? 
>>>>
>>>> 100 indexes on a single box should be okay but it depends on 
>>>> the size of the JVM heap. 
>>>>
>>>> -- 
>>>> Magnus Bäck                | Software Engineer, Development Tools 
>>>> magnu...@sonymobile.com | Sony Mobile Communications 
>>>>
>>>  -- 
>> You received this message because you are subscribed to a topic in the 
>> Google Groups "elasticsearch" group.
>> To unsubscribe from this topic, visit 
>> https://groups.google.com/d/topic/elasticsearch/VEdqtEpc3ac/unsubscribe.
>> To unsubscribe from this group and all its topics, send an email to 
>> elasticsearc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/elasticsearch/58a94899-185c-4c4d-ad5f-ac2e0a5eed2d%40googlegroups.com.
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
