Storage capacity (planning)?

2014-02-07 Thread Jeroen van Meeuwen (Kolab Systems)

Hi there,

My reason for looking at Elasticsearch is a totally different one ^1, 
but I set up a development environment with about 20 servers that use 
rsyslog to send their logs to a logstash server (input, you guessed 
it, syslog), which passes the syslog entries through Redis so that they 
ultimately end up in Elasticsearch. I suppose this is the 
next-next-finish setup documented on [1].


To my surprise, it only takes a day or so to get up to a storage volume 
of ~25 GB in /var/lib/elasticsearch/.


It is particularly surprising to me, because the environment is largely 
idle, other than some monitoring and some cron jobs -- there's not a lot 
of syslog messages compared to a production environment, not at all.


Furthermore, using this rsyslog - logstash collector - redis - 
logstash indexer - elasticsearch setup, I'm seeing the throughput on 
the logical volume for the root filesystem rise continuously -- it's now 
at about 4 MB/s. `iotop` merely suggests this is all Elasticsearch doing 
the I/O, but its payload is on the aforementioned logical volume mounted 
on /var/lib/elasticsearch/.
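
For scale, the two figures quoted above can be put in the same units. This 
is only a back-of-the-envelope sketch: it assumes decimal units (1 GB = 
1000 MB) and that both rates are sustained over a full day, neither of 
which is stated in the message. The helper names are made up for 
illustration.

```python
# Back-of-the-envelope comparison of the two figures from the message.
# Assumptions (not from the message): decimal units, rates sustained for 24 h.
SECONDS_PER_DAY = 24 * 60 * 60


def gb_per_day_to_mb_per_s(gb):
    """Average write rate in MB/s that produces `gb` gigabytes in one day."""
    return gb * 1000 / SECONDS_PER_DAY


def mb_per_s_to_gb_per_day(mb_s):
    """Volume in GB written in one day at a sustained `mb_s` MB/s."""
    return mb_s * SECONDS_PER_DAY / 1000


avg_growth = gb_per_day_to_mb_per_s(25)   # ~25 GB/day of index growth
sustained = mb_per_s_to_gb_per_day(4)     # ~4 MB/s observed throughput

print(f"25 GB/day is about {avg_growth:.2f} MB/s on average")
print(f"4 MB/s sustained is about {sustained:.0f} GB/day")
```

Under these assumptions the observed throughput is more than ten times the 
average rate needed to accumulate 25 GB in a day, which suggests the I/O is 
not just net index growth.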


I'm fairly certain I can tweak the number of log entries being sent off 
to the centralized log server, and it's not unlikely I'm doing something 
wrong, but I was wondering whether anybody out there has gone through 
such an exercise before, and whether my expectations are correct.


Thanks, in advance,

Kind regards,

Jeroen van Meeuwen

^1: Kolab Groupware is looking into developing a singular application 
suite for the topics of Archival, Backup/Restore and e-Discovery. Very 
much a work-in-progress, we're putting down some notes [2] and are doing 
the initial probing at potential storage backend solutions.


[1] http://logstash.net/docs/1.3.3/tutorials/getting-started-centralized
[2] http://docs.kolab.org/architecture-and-design/bonnie.html

--
Systems Architect, Kolab Systems AG

e: vanmeeuwen at kolabsys.com
m: +44 74 2516 3817
w: http://www.kolabsys.com

pgp: 9342 BF08

--
You received this message because you are subscribed to the Google Groups 
elasticsearch group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/elasticsearch/2fd3cb3bb2327950a8c1429e85949f3e%40kolabsys.com.
For more options, visit https://groups.google.com/groups/opt_out.


Re: Storage capacity (planning)?

2014-02-07 Thread Binh Ly

Jeroen,

If your objective is to keep the ES storage as minimal as possible, you'd 
probably want to understand first what your search requirements are and 
then optimize the ES indexes accordingly. For example, if you don't need 
replicas, then you can set the number of replicas to 0. If you don't need 
the _all field, you can disable it (using index templates for example). If 
you don't need every single field from your log event indexed, then you can 
direct your Logstash filters to only output the specific fields that you 
are interested in. Etc, etc...
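
As a rough illustration of the suggestions above, here is a minimal sketch 
of what such an index template body might look like, together with a small 
field whitelist. It assumes the Elasticsearch 1.x template format 
("template" pattern, "_default_" mapping); the "logstash-*" pattern matches 
Logstash's default index naming, while KEEP_FIELDS and prune_event are 
hypothetical names that only mimic what a Logstash filter would do. Nothing 
here is sent to a cluster.

```python
import json

# Index template body in the Elasticsearch 1.x style (an assumption about
# the version in use): no replicas, and the _all field disabled for every
# type in matching indexes.
index_template = {
    "template": "logstash-*",  # Logstash's default daily index pattern
    "settings": {"number_of_replicas": 0},
    "mappings": {"_default_": {"_all": {"enabled": False}}},
}

# Hypothetical whitelist of fields worth indexing; adjust to your searches.
KEEP_FIELDS = {"@timestamp", "host", "program", "message"}


def prune_event(event, keep=KEEP_FIELDS):
    """Return a copy of the event containing only the whitelisted fields."""
    return {key: value for key, value in event.items() if key in keep}


# Made-up sample event, for illustration only.
sample = {
    "@timestamp": "2014-02-07T12:00:00Z",
    "host": "web01",
    "program": "cron",
    "message": "job finished",
    "facility": 9,
    "severity": 6,
}

print(json.dumps(index_template, indent=2))
print(json.dumps(prune_event(sample), sort_keys=True))
```

The printed template body is what you would register as an index template 
so that it applies to each newly created daily index; in a real setup the 
pruning would be done in the Logstash filter stage rather than in Python.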
