Re: What is the best practice for periodic snapshotting with awc-cloud+s3

Sally Ahn Wed, 12 Nov 2014 23:02:48 -0800

I am also interested in this topic.
We have been snapshotting our two-node cluster every 2 hours (invoked via a 
cron job) to an S3 repository. We were running ES 1.2.2 with 
cloud-aws-plugin version 2.2.0, then we upgraded to ES 1.4.0 with 
cloud-aws-plugin 2.4.0, but we are still seeing the issues described below.
Each snapshot has been taking longer to complete than the one before it. 
I see a thread 
<https://groups.google.com/forum/?fromgroups#!searchin/elasticsearch/snapshot/elasticsearch/bCKenCVFf2o/TFK-Es0wxSwJ> 
where someone else was seeing the same thing, but that thread seems to have 
died.
In my case, snapshots have gone from taking ~5 minutes to taking about an 
hour, even between snapshots where data does not seem to have changed.

For example, you can see below a list of the snapshots stored in my S3 
repo. Each snapshot is named with a timestamp of when my cron job invoked 
the snapshot process. The S3 timestamp on the left shows the completion 
time of that snapshot, and it's clear that it's steadily increasing:

2014-09-30 10:05       686   s3://<bucketname>/snapshot-2014.09.30-10:00:01
2014-09-30 12:05       686   s3://<bucketname>/snapshot-2014.09.30-12:00:01
2014-09-30 14:05       736   s3://<bucketname>/snapshot-2014.09.30-14:00:01
2014-09-30 16:05       736   s3://<bucketname>/snapshot-2014.09.30-16:00:01
...
2014-11-08 00:52      1488   s3://<bucketname>/snapshot-2014.11.08-00:00:01
2014-11-08 02:54      1488   s3://<bucketname>/snapshot-2014.11.08-02:00:01
...
2014-11-08 14:54      1488   s3://<bucketname>/snapshot-2014.11.08-14:00:01
2014-11-08 16:53      1488   s3://<bucketname>/snapshot-2014.11.08-16:00:01
...
2014-11-11 07:00      1638   s3://<bucketname>/snapshot-2014.11.11-06:00:01
2014-11-11 08:58      1638   s3://<bucketname>/snapshot-2014.11.11-08:00:01
2014-11-11 10:58      1638   s3://<bucketname>/snapshot-2014.11.11-10:00:01
2014-11-11 12:59      1638   s3://<bucketname>/snapshot-2014.11.11-12:00:01
2014-11-11 15:00      1638   s3://<bucketname>/snapshot-2014.11.11-14:00:01
2014-11-11 17:00      1638   s3://<bucketname>/snapshot-2014.11.11-16:00:01
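
To make the trend in the listing above concrete, here is a minimal Python 
sketch that estimates how long each snapshot took, as the gap between the 
start time embedded in the snapshot name and the S3 timestamp. It assumes the 
listing format shown above and that both timestamps are in the same time 
zone; the function name is hypothetical.

```python
from datetime import datetime

def snapshot_duration_minutes(listing_line):
    """Minutes between the start time in the snapshot name and the S3 timestamp."""
    # Expected fields: date, time, size, s3 URL (format assumed from the listing)
    parts = listing_line.split()
    completed = datetime.strptime(parts[0] + " " + parts[1], "%Y-%m-%d %H:%M")
    name = parts[3].rsplit("/", 1)[-1]  # e.g. snapshot-2014.09.30-10:00:01
    started = datetime.strptime(name, "snapshot-%Y.%m.%d-%H:%M:%S")
    return (completed - started).total_seconds() / 60.0

lines = [
    "2014-09-30 10:05       686   s3://<bucketname>/snapshot-2014.09.30-10:00:01",
    "2014-11-08 00:52      1488   s3://<bucketname>/snapshot-2014.11.08-00:00:01",
    "2014-11-11 17:00      1638   s3://<bucketname>/snapshot-2014.11.11-16:00:01",
]
for line in lines:
    name = line.split()[3].rsplit("/", 1)[-1]
    print(name, round(snapshot_duration_minutes(line)), "min")
```

For the three sample lines this gives roughly 5, 52, and 60 minutes.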

I suspected that this gradual increase was related to the accumulation of 
old snapshots after I tested the following:
1. I created a brand new cluster with the same hardware specs in the same 
datacenter and restored a snapshot of the problematic cluster taken a few 
days back (i.e. not the latest snapshot). 
2. I then backed up that restored data to a new empty bucket in the same S3 
region, and that was very fast...a minute or less. 
3. I then restored a later snapshot of the problematic cluster to the test 
cluster and tried backing it up again to the new bucket, and that also took 
about a minute or less.

However, when I tried deleting the repository full of old snapshots from 
the problematic cluster and registering a brand new empty bucket, I found 
that my first snapshot to the new repository was also hanging indefinitely. 
I finally had to kill my snapshot curl command. There were no errors in the 
logs (the snapshot logger is very terse...wondering if anyone knows how to 
increase the verbosity for it).

So my theory seems to have been debunked, and I am again at a loss. I am 
wondering whether the hanging snapshot is related to the slow snapshots I 
was seeing before I deleted that old repository. I have seen several issues 
in GitHub regarding hanging snapshots (#5958 
<https://github.com/elasticsearch/elasticsearch/issues/5958>, #7980 
<https://github.com/elasticsearch/elasticsearch/issues/7980>) and have 
tried using the elasticsearch-snapshot-cleanup 
<https://github.com/imotov/elasticsearch-snapshot-cleanup> utility on my 
cluster both before and after I upgraded from version 1.2.2 to 1.4.0 (I 
thought upgrading to 1.4.0, which included snapshot improvements, might fix 
my issues, but it did not). The utility is not finding any running 
snapshots:

[2014-11-13 05:37:45,451][INFO ][org.elasticsearch.node   ] [Golden Archer] started
[2014-11-13 05:37:45,451][INFO ][org.elasticsearch.org.motovs.elasticsearch.snapshots.AbortedSnapshotCleaner] No snapshots found
[2014-11-13 05:37:45,452][INFO ][org.elasticsearch.node   ] [Golden Archer] stopping ...

Curling to _snapshot/REPO/_status also returns no ongoing snapshots:

curl -XGET 'http://<hostname>:9200/_snapshot/s3_backup_repo/_status?pretty=true'
{
  "snapshots" : [ ]
}

I may try bouncing ES on each node to see if that kills whatever process is 
causing my requests to the snapshot module to hang. Requests to other 
modules like _cluster/health return fine; cluster health is green, and 
load is low on both nodes (0.00, 0.06).

I would really appreciate some help/guidance on how to debug/fix this issue 
and general recommendations on how to best achieve periodic snapshots. For 
example, cleaning up old snapshots seems rather difficult since we have to 
specify the snapshot name, which we would obtain by making a request to the 
snapshot module, which seems to hang often.
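
As a rough illustration of name-based cleanup, the following Python sketch 
picks the snapshots older than a retention window using only the timestamp 
embedded in each name, then builds the corresponding DELETE URLs for the 
snapshot API. It assumes names follow the snapshot-YYYY.MM.DD-HH:MM:SS scheme 
from the listing above; the helper names, the 7-day retention, and the sample 
inputs are hypothetical. The list of names still has to come from somewhere, 
such as the snapshot API or an S3 listing. Deletion goes through the snapshot 
API rather than removing files from S3 directly, because incremental 
snapshots share files with one another.

```python
from datetime import datetime, timedelta

NAME_FORMAT = "snapshot-%Y.%m.%d-%H:%M:%S"  # naming scheme assumed from the listing

def expired_snapshots(names, now, keep_days=7):
    """Return the names whose embedded timestamp is older than keep_days."""
    cutoff = now - timedelta(days=keep_days)
    expired = []
    for name in names:
        try:
            taken = datetime.strptime(name, NAME_FORMAT)
        except ValueError:
            continue  # ignore names that do not follow the scheme
        if taken < cutoff:
            expired.append(name)
    return sorted(expired)

def delete_url(host, repo, name):
    """Build the URL for a DELETE request against the snapshot API."""
    return "http://%s:9200/_snapshot/%s/%s" % (host, repo, name)

names = [
    "snapshot-2014.09.30-10:00:01",
    "snapshot-2014.11.11-16:00:01",
    "manual-backup",  # hypothetical name outside the scheme; left alone
]
now = datetime(2014, 11, 12, 23, 0, 0)
for name in expired_snapshots(names, now):
    print("DELETE", delete_url("<hostname>", "s3_backup_repo", name))
```

With these sample inputs only snapshot-2014.09.30-10:00:01 is selected. 
Actually issuing the DELETE requests is left out of the sketch.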

Thanks,
Sally

On Monday, November 10, 2014 12:27:10 AM UTC-8, Pradeep Reddy wrote:
>
> Hi Vineeth,
>
> Thanks for the reply.
> I am aware of how to create and delete snapshots using cloud-aws.
>
> What I wanted to know is what the workflow for periodic snapshots should 
> be, especially how to deal with old snapshots. Will having too many old 
> snapshots have any impact?
>
> On Friday, November 7, 2014 8:16:05 PM UTC+5:30, vineeth mohan wrote:
>>
>> Hi , 
>>
>> There is an S3 repository plugin - 
>> https://github.com/elasticsearch/elasticsearch-cloud-aws#s3-repository
>> Use this.
>> The snapshots are incremental, so it should fit your purpose perfectly.
>>
>> Thanks
>>              Vineeth
>>
>> On Fri, Nov 7, 2014 at 3:22 PM, Pradeep Reddy <pradeepreddy...@gmail.com> 
>> wrote:
>>
>>> I want to backup the data every 15-30 min. I will be storing the 
>>> snapshots in S3. 
>>>
>>> DELETE old and then PUT new snapshot may not be the best practice, as 
>>> you may end up with nothing if something goes wrong.
>>>
>>> Using timestamps for snapshot names may be one option, but how do I 
>>> delete old snapshots then?
>>> Does S3 lifecycle management help to delete old snapshots?
>>>
>>> Looking forward to getting some opinions on this.
>>>
>>> -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "elasticsearch" group.
>>> To unsubscribe from this group and stop receiving emails from it, send 
>>> an email to elasticsearc...@googlegroups.com.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/elasticsearch/0dd81d83-5066-4652-9703-dfce63b46993%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/elasticsearch/0dd81d83-5066-4652-9703-dfce63b46993%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>

