My initial suggestion would be to set your templates to 3 shards, 1 replica. With three data nodes, that's two shards per index per node (3 primaries + 3 replicas spread across 3 nodes); at 5 indexes/day, that's 10 shards per node per day. Over a month, 10 shards/node/day x 30 days x 3 nodes = 900 shards cluster-wide (300 per node). I don't know of any hard cutoff per se, and 900 may be a bit much for ~10GB instances, but I've run 1500+ shards on 16GB instances.
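Something like this is all it takes (a rough sketch; the template name is a placeholder and I'm guessing at the pattern, so match the glob to however your five daily indexes actually end up named):

    curl -XPUT 'http://localhost:9200/_template/daily_logs' -d '
    {
      "template" : "*-*",
      "settings" : {
        "index.number_of_shards" : 3,
        "index.number_of_replicas" : 1
      }
    }'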
I set my shards/replicas via template to match my auto-index naming, which starts with the year (so "20*" matches), though you can do it via your YML config as well:

    {
      "template" : "20*",
      "settings" : {
        "index.number_of_shards" : 18,
        "index.number_of_replicas" : 1,
        "index.auto_expand_replicas" : false
      },
      "mappings" : {
        "_default_" : {
          "_source" : { "compress" : false },
          "properties" : {
            "priority" : { "type" : "string", "index" : "not_analyzed" },
            "facility" : { "type" : "string", "index" : "not_analyzed" },
            ...and so on.

The default without any settings is 5 shards/1 replica per index, which wouldn't distribute evenly across 3 data nodes, though it will balance out over multiple days. That's not necessarily a bad thing, as more CPUs can search faster, but the more shards, the more RAM used, etc.

I currently have one dedicated master node and one dedicated search node. In a prod environment, I'd have a small group of virtual masters (3-5?), but probably still only the one virtual search node (we do *far* more indexing than searching). Depending on how much searching you do, you may not need a dedicated search node: you can just hit any node on 9200, or run a dedicated search/master combo, or... really, there are lots of ways. This is where I'm weak, though; I'm not sure how to estimate needs, as I don't have my environment mapped out!
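For what it's worth, splitting out roles is just two booleans per node in elasticsearch.yml (a minimal sketch; these are the settings as they exist in 0.90/1.x):

    # dedicated master: coordinates cluster state, holds no data
    node.master: true
    node.data: false

    # data node: holds shards, never elected master
    node.master: false
    node.data: true

    # dedicated search/client node: no master duty, no data, just fans out queries
    node.master: false
    node.data: false

Each pair goes in a different node's config, obviously.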
Are some of your indexes much larger than others per day? If so, keep in mind that I believe nodes are balanced by shard count, not by shard disk usage, so a much smaller shard is the same for ES 'capacity planning' purposes as a larger one. Unless this changed recently in 1.0.x?

-Zachary

On Tuesday, March 4, 2014 9:51:47 AM UTC-6, Eric wrote:
> Zach,
>
> Thanks for the information. With my POC, I have 2 10-gig VMs and I'm keeping 7 days of logs with no issues, but that is a fairly large jump and I could see where it may pose an issue.
>
> As far as the 150 indexes, I'm not sure on the shards per index/replicas. That is the part that I'm weakest on in ES setup. I'm not exactly sure how I should set up the ES cluster as far as the shards, replicas, master node, data node, search node, etc.
>
> I fully agree with the logstash directly to ES. I have 1 logstash instance right now tailing 5 files and directly feeding into ES, and I've enjoyed not having another application to have to worry about.
>
> Eric
>
> On Tuesday, March 4, 2014 10:32:26 AM UTC-5, Zachary Lammers wrote:
>> Based on my experience, I think you may have an issue with OOM trying to keep a month of logs with ~10GB RAM per server.
>>
>> Say, for instance, 5 indexes a day for 30 days = 150 indexes. How many shards per index/replicas?
>>
>> I ran some tests with 8GB assigned to my 20x ES data nodes, and after ~7 days of a single index per day of all log data, my cluster would crash due to data nodes going OOM. I know I can't perfectly compare, and I'm somewhat new to ES myself, but as soon as I removed the 'older' servers with smaller RAM from the cluster and gave ES 16GB on each data node, I've not gone OOM since. I was working with higher data rates, but I'm not sure the volume mattered as much as my shard count per index per node.
>>
>> For reference, my current lab config is 36 data nodes running a single index per day (18 shards/1 replica), and I can index near 40,000 events per second at the beginning of the day, closer to 30,000 per second near the end of the day when the index is much larger. I used to run 36 shards/1 replica, but I wanted the shards per index per node to be minimal, as I'd really like to keep 60 days (except I'm running out of disk space on my old servers first!). To pipe the data in, I'm running 45 separate logstash instances, each monitoring a single FIFO that I have scripts simply catting data into. Each LS instance joins the ES cluster directly (no redis/etc.; I've had too many issues when not going direct to ES). I recently started over after holding steady at 25B log events over ~12 days (but I ran out of disk, so I had to delete old indexes). I tried updating to LS 1.4b2/ES 1.0.1, but it failed miserably; LS 1.4b2 was extremely slow at indexing, so I'm still on LS 1.3.3 and ES 0.90.9.
>>
>> As for the master question, I can't answer. I'm only running one right now for this lab cluster, which I know is not recommended, but I have zero idea how many I should truly have. Like I said, I'm new to this :)
>>
>> -Zachary
>>
>> On Tuesday, March 4, 2014 9:11:59 AM UTC-6, Eric Luellen wrote:
>>> Hello,
>>>
>>> I've been working on a POC for Logstash/ElasticSearch/Kibana for about 2 months now. Everything has worked out pretty well and we are ready to move it to production. Before building out the infrastructure, I want to make sure my shard/node/index setup is correct, as that is the main part I'm still a bit fuzzy on. Overall my setup is this:
>>>
>>>   Servers
>>>   Networking Gear                            syslog-ng server
>>>   End Points        ----> Load Balancer ---> syslog-ng server ---> Logs stored in 5 flat files on SAN storage
>>>   Security Devices                           syslog-ng server
>>>   Etc.
>>>
>>> I have logstash running on one of the syslog-ng servers; it basically reads the input of 5 different files and sends them to ElasticSearch. So within ElasticSearch, I am creating 5 different indexes a day so I can do granular user access control within Kibana:
>>>
>>>   unix-$date
>>>   windows-$date
>>>   networking-$date
>>>   security-$date
>>>   endpoint-$date
>>>
>>> My plan is to have 3 ElasticSearch servers with ~10 gig of RAM each. For my POC I have 2, and it's working fine for 2,000 events/second. My main concern is how I set up the ElasticSearch servers so they are as efficient as possible. With my 5 different indexes a day, and a plan to keep ~1 month of logs within ES, is 3 servers enough? Should I have 1 master node and have the other 2 be basic setups that do data and searching? Also, will 1 replica be sufficient for this setup, or should I do 2 to be safe? In my POC, I've had a few issues where I ran out of memory or something weird happened and I lost data for a while, so I want to limit that as much as possible. We'll also have quite a few users potentially querying the system, so I didn't know if I should set up a dedicated search node.
>>>
>>> Besides the ES cluster, I think everything else should be fine. I have had a few concerns about logstash keeping up with the number of entries coming into syslog-ng, but I haven't seen much in the way of load balancing logstash or verifying whether it's able to keep up. I've spot-checked the files quite a bit and everything seems correct, but if there is a better way to do this, I'm all ears.
>>>
>>> I'm going to have my Kibana instance installed on the master ES node, which shouldn't be a big deal.
>>> I've played with the idea of putting the ES servers on the syslog-ng servers and just having a separate NIC for the ES traffic, but I didn't want to bog those servers down a whole lot.
>>>
>>> Any thoughts or recommendations would be greatly appreciated.
>>>
>>> Thanks,
>>> Eric
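P.S. On the logstash side, a minimal LS 1.3-style config for your five-file, index-per-type setup might look something like the below; the file paths and cluster name are made up, so adjust to taste:

    input {
      file { path => "/var/log/feeds/unix.log"       type => "unix" }
      file { path => "/var/log/feeds/windows.log"    type => "windows" }
      file { path => "/var/log/feeds/networking.log" type => "networking" }
      file { path => "/var/log/feeds/security.log"   type => "security" }
      file { path => "/var/log/feeds/endpoint.log"   type => "endpoint" }
    }
    output {
      elasticsearch {
        # joins the ES cluster directly (node protocol), no redis in between
        cluster => "es-logging"
        # one index per log type per day, e.g. unix-2014.03.04
        index   => "%{type}-%{+YYYY.MM.dd}"
      }
    }

The index name is just sprintf'd from the event type, which gets you your unix-/windows-/etc. daily indexes without any extra plumbing.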