You also definitely want an odd number of master-eligible nodes to prevent potential split-brain situations.
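On a three-node cluster that works out to a master quorum of two. As a minimal sketch of the relevant elasticsearch.yml line, assuming all three of your nodes stay master-eligible:

    # elasticsearch.yml on every node -- sketch, assuming 3 master-eligible nodes
    # quorum = (master-eligible nodes / 2) + 1 = 2
    discovery.zen.minimum_master_nodes: 2

With that set, a node cut off in a minority partition will refuse to elect itself master rather than split the cluster.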
Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 5 March 2014 07:08, Zachary Lammers <zlamm...@gmail.com> wrote:

> My initial suggestion would be to set your templates to 3 shards, 1 replica. With three data nodes, you'd have two shards per index per node (3 primaries plus 3 replicas spread across 3 nodes); at 5 indexes/day, that's 10 shards per day per node. 3 nodes x 10 shards per day x 30 days is 900 shards. I don't know of any 'cutoff' per se, and 900 may be a bit much for a ~10g instance, but I've run 1500+ shards on 16g instances.
>
> I set my shards/replicas via a template matching my auto-index-naming, which starts with the year (20* matches), though you can do it via your YML config as well.
>
> {
>   "template" : "20*",
>   "settings" : {
>     "index.number_of_shards" : 18,
>     "index.number_of_replicas" : 1,
>     "index.auto_expand_replicas" : false
>   },
>   "mappings" : {
>     "_default_" : {
>       "_source" : { "compress" : false },
>       "properties" : {
>         "priority" : { "type" : "string", "index" : "not_analyzed" },
>         "facility" : { "type" : "string", "index" : "not_analyzed" },
>
> ...and so on.
>
> The default without any settings is 5 shards/1 replica per index, which wouldn't distribute evenly across 3 data nodes. It will balance out over multiple days, though. That's not necessarily a bad thing, as more CPUs can search faster, but the more shards, the more RAM used, etc.
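A template like the one above is installed once with a PUT to the _template endpoint. As a minimal sketch against a local node, using the 3-shard/1-replica numbers suggested at the top (the template name "logstash_daily" is hypothetical):

    curl -XPUT 'http://localhost:9200/_template/logstash_daily' -d '{
      "template" : "20*",
      "settings" : {
        "index.number_of_shards" : 3,
        "index.number_of_replicas" : 1
      }
    }'

Note that Eric's unix-$date style index names would not match 20*; a broader pattern such as "*", or one template per feed, would be needed there.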
> I currently have one dedicated master node and one dedicated search node. In a prod environment, I'd have a small group of virtual masters (3-5?), but probably only the one virtual search node (we do *far* more indexing than searching). Depending on how much searching you do, you may not need a dedicated search node: you can just hit any node on 9200, or dedicate a combined search/master node, or... there are really lots of ways. This is where I'm weak, though; I'm not sure how to estimate needs, as I don't have my environment mapped out!
>
> Are some of your indexes much larger than others per day? If so, I believe nodes are balanced by shard count, not by shard disk usage -- so a much smaller shard counts the same for ES 'capacity planning' as a larger one. Unless this changed recently in 1.0.x?
>
> -Zachary
>
> On Tuesday, March 4, 2014 9:51:47 AM UTC-6, Eric wrote:
>>
>> Zach,
>>
>> Thanks for the information. With my POC, I have two 10-gig VMs and I'm keeping 7 days of logs with no issues, but a month is a fairly large jump and I could see where it may pose an issue.
>>
>> As far as the 150 indexes go, I'm not sure on the shards per index or replicas. That is the part of the ES setup I'm weakest on. I'm not exactly sure how I should set up the ES cluster as far as the shards, replicas, master node, data node, search node, etc.
>>
>> I fully agree on running logstash directly into ES. I have 1 logstash instance right now tailing 5 files and feeding directly into ES, and I've enjoyed not having another application to worry about.
>>
>> Eric
>>
>> On Tuesday, March 4, 2014 10:32:26 AM UTC-5, Zachary Lammers wrote:
>>>
>>> Based on my experience, I think you may have an issue with OOM trying to keep a month of logs with ~10gb ram / server.
>>>
>>> Say, for instance, 5 indexes a day for 30 days = 150 indexes. How many shards per index, and how many replicas?
>>>
>>> I ran some tests with 8GB assigned to each of my 20x ES data nodes, and after ~7 days of a single index per day of all log data, my cluster would crash due to data nodes going OOM. I know I can't perfectly compare, and I'm somewhat new to ES myself, but as soon as I removed the 'older' servers with smaller RAM from the cluster and gave ES 16GB on each data node, I've not gone OOM since. I was working with higher data rates, but I'm not sure the volume mattered as much as my shard count per index per node.
>>>
>>> For reference, my current lab config is 36 data nodes running a single index per day (18 shards/1 replica), and I can index near 40,000 events per second at the beginning of the day, closer to 30,000 per second near the end of the day when the index is much larger. I used to run 36 shards/1 replica, but I wanted the shards per index per node to be minimal, as I'd really like to keep 60 days (except I'm running out of disk space on my old servers first!). To pipe the data in, I'm running 45 separate logstash instances, each monitoring a single FIFO that I have scripts simply catting data into. Each LS instance joins the ES cluster directly (no redis/etc; I've had too many issues when not going direct to ES). I recently started over after holding steady at 25B log events over ~12 days (I ran out of disk, so had to delete old indexes). I tried updating to LS 1.4b2/ES 1.0.1, but it failed miserably; LS 1.4b2 was extremely, extremely slow at indexing, so I'm still on LS 1.3.3 and ES 0.90.9.
>>>
>>> As for the master question, I can't answer it. I'm only running one master right now for this lab cluster, which I know is not recommended, but I have zero idea how many I should truly have. Like I said, I'm new to this :)
>>>
>>> -Zachary
>>>
>>> On Tuesday, March 4, 2014 9:11:59 AM UTC-6, Eric Luellen wrote:
>>>>
>>>> Hello,
>>>>
>>>> I've been working on a POC for Logstash/ElasticSearch/Kibana for about 2 months now; everything has worked out pretty well and we are ready to move it to production. Before building out the infrastructure, I want to make sure my shard/node/index setup is correct, as that is the main part I'm still a bit fuzzy on. Overall my setup is this:
>>>>
>>>> Servers
>>>> Networking Gear                           syslog-ng server
>>>> End Points       ---> Load Balancer --->  syslog-ng server ---> Logs stored in 5 flat files on SAN storage
>>>> Security Devices                          syslog-ng server
>>>> Etc.
>>>>
>>>> I have logstash running on one of the syslog-ng servers; it basically reads the input of 5 different files and sends the events to ElasticSearch. Within ElasticSearch, I am creating 5 different indexes a day so I can do granular user access control within Kibana:
>>>>
>>>> unix-$date
>>>> windows-$date
>>>> networking-$date
>>>> security-$date
>>>> endpoint-$date
>>>>
>>>> My plan is to have 3 ElasticSearch servers with ~10 gig of RAM each. For my POC I have 2, and that works fine at 2,000 events/second. My main concern is setting up the ElasticSearch servers to be as efficient as possible. With my 5 different indexes a day, and a plan to keep ~1 month of logs within ES, are 3 servers enough? Should I have 1 master node and have the other 2 be basic setups that hold data and handle searches? Also, will 1 replica be sufficient for this setup, or should I do 2 to be safe? In my POC, I've had a few issues where I ran out of memory or something weird happened and I lost data for a while, so I want to limit that as much as possible. We'll also have quite a few users potentially querying the system, so I didn't know if I should set up one of these as a dedicated search node.
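The roles Eric asks about are set per node in elasticsearch.yml. As a minimal sketch for his 3-node layout, keeping all three nodes master-eligible (so the odd-number advice above applies) and holding data, with hypothetical cluster and node names:

    # elasticsearch.yml -- sketch; cluster and node names are hypothetical
    cluster.name: logstash-prod
    node.name: "es-node-1"
    node.master: true      # eligible to be elected master
    node.data: true        # stores and searches shards
    discovery.zen.minimum_master_nodes: 2

A dedicated search ("client") node, if one proves necessary later, would instead set node.master: false and node.data: false, so it only routes queries and merges results.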
>>>> Besides the ES cluster, I think everything else should be fine. I have had a few concerns about logstash keeping up with the number of entries coming into syslog-ng, but I haven't seen much in the way of load-balancing logstash or verifying whether it's able to keep up. I've spot-checked the files quite a bit and everything seems correct, but if there is a better way to do this, I'm all ears.
>>>>
>>>> I'm going to have my Kibana instance installed on the master ES node, which shouldn't be a big deal. I've played with the idea of putting the ES servers on the syslog-ng servers and just having a separate NIC for the ES traffic, but I didn't want to bog down those servers a whole lot.
>>>>
>>>> Any thoughts or recommendations would be greatly appreciated.
>>>>
>>>> Thanks,
>>>> Eric
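For the logstash side both posters describe (direct to ES, no redis), each feed needs only a file input and an elasticsearch output that names the daily index. As a minimal sketch of one of the five feeds in LS 1.3.x config syntax, with hypothetical path and cluster name:

    # logstash.conf -- sketch for the "unix" feed; path and cluster name are hypothetical
    input {
      file {
        path => "/san/logs/unix.log"
        type => "unix"
      }
    }
    output {
      elasticsearch {
        cluster => "logstash-prod"         # joins the ES cluster directly
        index   => "unix-%{+YYYY.MM.dd}"   # one index per feed per day
      }
    }

Running one such instance (or one such input block) per file mirrors the setups above, and the date in the index name gives the unix-$date daily indexes Eric wants.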