My initial suggestion would be to set your templates to 3 shards, 1 replica. With three data nodes, that's two shards per index per node (3 primaries + 3 replicas spread across 3 nodes); at 5 indexes/day, that's 10 shards per node per day. Over a month, 10 shards/node/day x 30 days x 3 nodes = 900 shards cluster-wide (300 per node). I don't know of any hard cutoff per se, and 900 may be a bit much for ~10GB instances, but I've run 1500+ shards on 16GB instances.
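Something like this is all it takes (a rough sketch; the template name is a placeholder and I'm guessing at the pattern, so match the glob to however your five daily indexes actually end up named):

    curl -XPUT 'http://localhost:9200/_template/daily_logs' -d '
    {
      "template" : "*-*",
      "settings" : {
        "index.number_of_shards" : 3,
        "index.number_of_replicas" : 1
      }
    }'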
I set my shards/replicas via template to match my auto-index naming, which starts with the year (so "20*" matches), though you can do it via your YML config as well:

    {
      "template" : "20*",
      "settings" : {
        "index.number_of_shards" : 18,
        "index.number_of_replicas" : 1,
        "index.auto_expand_replicas" : false
      },
      "mappings" : {
        "_default_" : {
          "_source" : { "compress" : false },
          "properties" : {
            "priority" : { "type" : "string", "index" : "not_analyzed" },
            "facility" : { "type" : "string", "index" : "not_analyzed" },
            ...and so on.

The default without any settings is 5 shards/1 replica per index, which wouldn't distribute evenly across 3 data nodes, though it will balance out over multiple days. That's not necessarily a bad thing, as more CPUs can search faster, but the more shards, the more RAM used, etc.

I currently have one dedicated master node and one dedicated search node. In a prod environment, I'd have a small group of virtual masters (3-5?), but probably still only the one virtual search node (we do *far* more indexing than searching). Depending on how much searching you do, you may not need a dedicated search node: you can just hit any node on 9200, or run a dedicated search/master combo, or... really, there are lots of ways. This is where I'm weak, though; I'm not sure how to estimate needs, as I don't have my environment mapped out!
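For what it's worth, splitting out roles is just two booleans per node in elasticsearch.yml (a minimal sketch; these are the settings as they exist in 0.90/1.x):

    # dedicated master: coordinates cluster state, holds no data
    node.master: true
    node.data: false

    # data node: holds shards, never elected master
    node.master: false
    node.data: true

    # dedicated search/client node: no master duty, no data, just fans out queries
    node.master: false
    node.data: false

Each pair goes in a different node's config, obviously.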
Are some of your indexes much larger than others per day? If so, keep in mind that I believe nodes are balanced by shard count, not by shard disk usage, so a much smaller shard is the same for ES 'capacity planning' purposes as a larger one. Unless this changed recently in 1.0.x?

-Zachary

On Tuesday, March 4, 2014 9:51:47 AM UTC-6, Eric wrote:
> Zach,
>
> Thanks for the information. With my POC, I have 2 10-gig VMs and I'm keeping 7 days of logs with no issues, but that is a fairly large jump and I could see where it may pose an issue.
>
> As far as the 150 indexes, I'm not sure on the shards per index/replicas. That is the part that I'm weakest on in ES setup. I'm not exactly sure how I should set up the ES cluster as far as the shards, replicas, master node, data node, search node, etc.
>
> I fully agree with the logstash directly to ES. I have 1 logstash instance right now tailing 5 files and directly feeding into ES, and I've enjoyed not having another application to have to worry about.
>
> Eric
>
> On Tuesday, March 4, 2014 10:32:26 AM UTC-5, Zachary Lammers wrote:
>> Based on my experience, I think you may have an issue with OOM trying to keep a month of logs with ~10GB RAM per server.
>>
>> Say, for instance, 5 indexes a day for 30 days = 150 indexes. How many shards per index/replicas?
>>
>> I ran some tests with 8GB assigned to my 20x ES data nodes, and after ~7 days of a single index per day of all log data, my cluster would crash due to data nodes going OOM. I know I can't perfectly compare, and I'm somewhat new to ES myself, but as soon as I removed the 'older' servers with smaller RAM from the cluster and gave ES 16GB on each data node, I've not gone OOM since. I was working with higher data rates, but I'm not sure the volume mattered as much as my shard count per index per node.
>>
>> For reference, my current lab config is 36 data nodes running a single index per day (18 shards/1 replica), and I can index near 40,000 events per second at the beginning of the day, closer to 30,000 per second near the end of the day when the index is much larger. I used to run 36 shards/1 replica, but I wanted the shards per index per node to be minimal, as I'd really like to keep 60 days (except I'm running out of disk space on my old servers first!). To pipe the data in, I'm running 45 separate logstash instances, each monitoring a single FIFO that I have scripts simply catting data into. Each LS instance joins the ES cluster directly (no redis/etc.; I've had too many issues when not going direct to ES). I recently started over after holding steady at 25B log events over ~12 days (but I ran out of disk, so I had to delete old indexes). I tried updating to LS 1.4b2/ES 1.0.1, but it failed miserably; LS 1.4b2 was extremely slow at indexing, so I'm still on LS 1.3.3 and ES 0.90.9.
>>
>> As for the master question, I can't answer. I'm only running one right now for this lab cluster, which I know is not recommended, but I have zero idea how many I should truly have. Like I said, I'm new to this :)
>>
>> -Zachary
>>
>> On Tuesday, March 4, 2014 9:11:59 AM UTC-6, Eric Luellen wrote:
>>> Hello,
>>>
>>> I've been working on a POC for Logstash/ElasticSearch/Kibana for about 2 months now. Everything has worked out pretty well and we are ready to move it to production. Before building out the infrastructure, I want to make sure my shard/node/index setup is correct, as that is the main part I'm still a bit fuzzy on. Overall my setup is this:
>>>
>>>   Servers
>>>   Networking Gear                            syslog-ng server
>>>   End Points        ----> Load Balancer ---> syslog-ng server ---> Logs stored in 5 flat files on SAN storage
>>>   Security Devices                           syslog-ng server
>>>   Etc.
>>>
>>> I have logstash running on one of the syslog-ng servers; it basically reads the input of 5 different files and sends them to ElasticSearch. So within ElasticSearch, I am creating 5 different indexes a day so I can do granular user access control within Kibana:
>>>
>>>   unix-$date
>>>   windows-$date
>>>   networking-$date
>>>   security-$date
>>>   endpoint-$date
>>>
>>> My plan is to have 3 ElasticSearch servers with ~10 gig of RAM each. For my POC I have 2, and it's working fine for 2,000 events/second. My main concern is how I set up the ElasticSearch servers so they are as efficient as possible. With my 5 different indexes a day, and a plan to keep ~1 month of logs within ES, is 3 servers enough? Should I have 1 master node and have the other 2 be basic setups that do data and searching? Also, will 1 replica be sufficient for this setup, or should I do 2 to be safe? In my POC, I've had a few issues where I ran out of memory or something weird happened and I lost data for a while, so I want to limit that as much as possible. We'll also have quite a few users potentially querying the system, so I didn't know if I should set up a dedicated search node.
>>>
>>> Besides the ES cluster, I think everything else should be fine. I have had a few concerns about logstash keeping up with the number of entries coming into syslog-ng, but I haven't seen much in the way of load balancing logstash or verifying whether it's able to keep up. I've spot-checked the files quite a bit and everything seems correct, but if there is a better way to do this, I'm all ears.
>>>
>>> I'm going to have my Kibana instance installed on the master ES node, which shouldn't be a big deal.
>>> I've played with the idea of putting the ES servers on the syslog-ng servers and just having a separate NIC for the ES traffic, but I didn't want to bog those servers down a whole lot.
>>>
>>> Any thoughts or recommendations would be greatly appreciated.
>>>
>>> Thanks,
>>> Eric
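P.S. On the logstash side, a minimal LS 1.3-style config for your five-file, index-per-type setup might look something like the below; the file paths and cluster name are made up, so adjust to taste:

    input {
      file { path => "/var/log/feeds/unix.log"       type => "unix" }
      file { path => "/var/log/feeds/windows.log"    type => "windows" }
      file { path => "/var/log/feeds/networking.log" type => "networking" }
      file { path => "/var/log/feeds/security.log"   type => "security" }
      file { path => "/var/log/feeds/endpoint.log"   type => "endpoint" }
    }
    output {
      elasticsearch {
        # joins the ES cluster directly (node protocol), no redis in between
        cluster => "es-logging"
        # one index per log type per day, e.g. unix-2014.03.04
        index   => "%{type}-%{+YYYY.MM.dd}"
      }
    }

The index name is just sprintf'd from the event type, which gets you your unix-/windows-/etc. daily indexes without any extra plumbing.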