I've been implementing an ELK stack for the past year or so. I had thought 
that we would have plenty of space, but we recently added a log source that 
increased the number of log entries per day by around 30x. That prompted me 
to start looking into ways of managing ES's data storage to keep from 
running out of space, which led me to Curator and snapshots.

If I am reading the documentation[1] for both systems correctly, I think I 
can do the following:

   - Create a snapshot repository for old data.
   - Use a cron job and Curator to automatically take snapshots of data 
   older than a certain time period (say, 6 months).
      - Then have Curator delete the data older than that time period.
      - The result would be that all data older than the time period would 
      be stored in the repository. The data would be compressed (what kind of 
      compression?)
   - When I need data older than the time period, I could use 
   Curator to restore it to the ES cluster, or even a different ES cluster. 
      - After that I could do what I needed, before deleting it again.
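
To make the plan above concrete, here is a rough sketch of the REST requests I think it boils down to, written as plain Python that only builds the requests and does not talk to a cluster. The endpoints are the ones from the snapshot documentation[1]; the repository name `old_logs`, the path `/mnt/es_backups`, the `logstash-YYYY.MM.DD` index naming, and all function names are my own assumptions for illustration. This is not how Curator does it internally.

```python
import json
from datetime import date, timedelta

REPO = "old_logs"      # hypothetical repository name
PREFIX = "logstash-"   # assumes daily indices named logstash-YYYY.MM.DD


def repository_request(location="/mnt/es_backups"):
    """Request that registers a shared-filesystem snapshot repository."""
    body = {
        "type": "fs",
        # As I read the docs, "compress" applies to metadata files only.
        "settings": {"location": location, "compress": True},
    }
    return ("PUT", f"/_snapshot/{REPO}", json.dumps(body))


def indices_older_than(index_names, cutoff_days, today):
    """Return the daily indices whose date suffix is before the cutoff."""
    cutoff = today - timedelta(days=cutoff_days)
    old = []
    for name in index_names:
        if not name.startswith(PREFIX):
            continue
        try:
            year, month, day = (int(p) for p in name[len(PREFIX):].split("."))
            index_day = date(year, month, day)
        except ValueError:
            continue  # suffix is not a valid date, so skip this index
        if index_day < cutoff:
            old.append(name)
    return sorted(old)


def archive_requests(indices, snapshot_name):
    """Snapshot the given indices first, then delete them one by one."""
    if not indices:
        return []
    snapshot = (
        "PUT",
        f"/_snapshot/{REPO}/{snapshot_name}?wait_for_completion=true",
        json.dumps({"indices": ",".join(indices)}),
    )
    deletes = [("DELETE", f"/{name}", None) for name in indices]
    return [snapshot] + deletes


def restore_request(snapshot_name, indices):
    """Request that restores selected indices from an existing snapshot."""
    body = json.dumps({"indices": ",".join(indices)})
    return ("POST", f"/_snapshot/{REPO}/{snapshot_name}/_restore", body)
```

A cron job would list the indices, call `indices_older_than(names, 180, date.today())`, and send the resulting requests in order. The deletes should only be sent if the snapshot request reports success; otherwise the old data would be gone with no copy in the repository.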
   

I'd test all this myself, but I don't have the resources for a decent test 
environment yet. :( Still working on that. 

Am I missing anything? Are there better ways to keep from running out of 
storage space? Any general advice related to this kind of thing?

Thanks in advance!

[1] 
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-snapshots.html
https://github.com/elasticsearch/curator/wiki
http://www.elasticsearch.org/blog/elasticsearch-curator-version-1-1-0-released/
