Ok thank you Mark, you've been extremely helpful and we now have a better idea about what we're doing!
-Alex

On Thursday, 31 July 2014 23:57:26 UTC+1, Mark Walkom wrote:
>
> 1 - Curator FTW.
> 2 - Masters handle cluster state, shard allocation and a whole bunch of
> other stuff around managing the cluster and its members and data. A node
> with both master and data set to false is considered a search node. But
> the role of being a master is not onerous, so it made sense for us to
> double up the roles. We then just round-robin any queries to these three
> masters.
> 3 - Yes, but... it's entirely dependent on your environment. If you're
> happy with that and you can get the go-ahead, then see where it takes you.
> 4 - Quorum is automatic, and having n/2+1 means that a majority of nodes
> will have to take part in an election, which reduces the possibility of
> split brain. If you set the discovery settings then you are also
> essentially setting the quorum settings.
>
> Regards,
> Mark Walkom
>
> Infrastructure Engineer
> Campaign Monitor
> email: ma...@campaignmonitor.com
> web: www.campaignmonitor.com
>
>
> On 31 July 2014 22:27, Alex <alex....@gmail.com> wrote:
>
>> Hello Mark,
>>
>> Thank you for your reply; it certainly helps to clarify many things.
>>
>> Of course I have some new questions for you!
>>
>> 1. I haven't looked into it much yet, but I'm guessing Curator can
>> handle different index naming schemes, e.g. logs-2014.06.30 and
>> stats-2014.06.30. We'd actually want to store the stats data for 2 years
>> and the logs for 90 days, so it would indeed be helpful to split the
>> data into different index sets. Do you use Curator?
>>
>> 2. You say that you have "3 masters that also handle queries"... but I
>> thought all masters did was handle queries? What is a master node that
>> *doesn't* handle queries? Should we have search load balancer nodes,
>> i.e. nodes that are neither master nor data nodes?
>>
>> 3. In the interests of reducing the number of node combinations for us
>> to test out, would you say, then, that 3 master (and query?) only nodes
>> plus the 6 1TB data-only nodes would be good?
>>
>> 4. Quorum and split brain are new to me. This webpage about split brain
>> <http://blog.trifork.com/2013/10/24/how-to-avoid-the-split-brain-problem-in-elasticsearch/>
>> recommends setting *discovery.zen.minimum_master_nodes* equal to
>> *N/2 + 1*. This formula is similar to the one given in the documentation
>> for quorum
>> <http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-index_.html#index-consistency>:
>> "index operations only succeed if a quorum (>replicas/2+1) of active
>> shards are available". I completely understand the split brain issue,
>> but not quorum. Is quorum handled automatically or should I change some
>> settings?
>>
>> Thanks again for your help, we appreciate your time and knowledge!
>> Regards,
>> Alex
>>
>>
>> On Thursday, 31 July 2014 05:57:35 UTC+1, Mark Walkom wrote:
>>
>>> 1 - Looks OK, but why two replicas? You're chewing up disk for what
>>> reason? Extra comments below.
>>> 2 - It's personal preference really, and depends on how your end points
>>> send to Redis.
>>> 3 - 4GB for Redis will cache quite a lot of data if you're only doing
>>> 50 events p/s (i.e. hours or even days based on what I've seen).
>>> 4 - No, spread it out to all the nodes. More on that below though.
>>> 5 - No, it will handle that itself. Again, more on that below though.
>>>
>>> Suggestions:
>>> Set your indexes to (factors of) 6 shards, i.e. one per node; it
>>> spreads query performance. I say "factors of" in that you can set it to
>>> 12 shards per index to start and easily scale the node count and still
>>> spread the load.
>>> Split your stats and your log data into different indexes; it'll make
>>> management and retention easier.
>>> You can consider a master-only node or (ideally) three that also handle
>>> queries.
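Alex's question 4 compares two different "quorums", and they really are separate things: one is a discovery setting you configure from the number of master-eligible nodes, the other is computed automatically per index operation from the number of shard copies. A minimal sketch of the arithmetic (plain Python for illustration; the function names are ours, not Elasticsearch API):

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Recommended discovery.zen.minimum_master_nodes: N/2 + 1,
    where N is the number of master-eligible nodes (split-brain guard)."""
    return master_eligible // 2 + 1

def write_quorum(replicas: int) -> int:
    """Active shard copies required for an index operation, per the ES
    reference docs: int((primary + replicas) / 2) + 1. Handled
    automatically; nothing to configure for the default behaviour."""
    return (1 + replicas) // 2 + 1

# With the 3 master-eligible nodes discussed in this thread:
print(minimum_master_nodes(3))  # 2 -> a majority of masters must agree
# With Alex's proposed 2 replicas (3 copies of each shard):
print(write_quorum(2))          # 2 -> 2 of the 3 copies must be active
```

So the formulas look alike because both are majority rules, but only the first one is a setting Alex needs to change.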
>>> Preferably have an uneven number of master-eligible nodes, whether you
>>> make them VMs or physicals; that way you can ensure quorum is reached
>>> with minimal fuss and stop split brain.
>>> If you use VMs for master + query nodes then you might want to look at
>>> load balancing the queries via an external service.
>>>
>>> To give you an idea, we have a 27 node cluster - 3 masters that also
>>> handle queries and 24 data nodes. Masters are 8GB with small disks;
>>> data nodes are 60GB (30 heap) and 512GB disk.
>>> We're running with one replica and have 11TB of logging data. At a high
>>> level we're running out of disk more than heap or CPU, and we're very
>>> write heavy, with an average of 1K events p/s and comparatively minimal
>>> reads.
>>>
>>> Regards,
>>> Mark Walkom
>>>
>>> Infrastructure Engineer
>>> Campaign Monitor
>>> email: ma...@campaignmonitor.com
>>> web: www.campaignmonitor.com
>>>
>>>
>>> On 31 July 2014 01:35, Alex <alex....@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> We wish to set up an entire ELK system with the following features:
>>>>
>>>> - Input from Logstash shippers located on 400 Linux VMs. Only a
>>>> handful of log sources on each VM.
>>>> - Data retention for 30 days, which is roughly 2TB of data in indexed
>>>> ES JSON form (not including replica shards).
>>>> - Estimated input data rate of 50 messages per second at peak hours.
>>>> Mostly short or medium-length one-line messages, but there will be
>>>> Java traces and very large service responses (in the form of XML) to
>>>> deal with too.
>>>> - The entire system would be on our company LAN.
>>>> - The stored data will be a mix of application logs (info, errors,
>>>> etc.) and server stats (CPU, memory usage, etc.) and would mostly be
>>>> accessed through Kibana.
>>>>
>>>> This is our current plan:
>>>>
>>>> - Have the LS shippers perform minimal parsing (but would do
>>>> multiline). Have them point to two load-balanced servers containing
>>>> Redis and the LS indexers (which would do all parsing).
>>>> - 2 replica shards for each index, which ramps the total data storage
>>>> up to 6TB.
>>>> - ES cluster spread over 6 nodes, each 1TB in size.
>>>> - LS indexers pointing to the cluster.
>>>>
>>>> So I have a couple of questions regarding the setup and would greatly
>>>> appreciate the advice of someone with experience!
>>>>
>>>> 1. Does the balance between the number of nodes, the number of replica
>>>> shards, and the storage size of each node seem about right? We use
>>>> high-performance equipment and would expect minimal downtime.
>>>>
>>>> 2. What is your recommendation for the system design of the LS
>>>> indexers and Redis? I've seen various designs with each indexer
>>>> assigned to a single Redis, or all indexers reading from all Redises.
>>>>
>>>> 3. Leading on from the previous question, what would your recommended
>>>> data size for the Redis servers be?
>>>>
>>>> 4. Not sure what to do about master/data nodes. Assuming all the nodes
>>>> are on identical hardware, would it be beneficial to have a node which
>>>> is only a master and would only handle requests?
>>>>
>>>> 5. Do we need to do any additional load balancing on the ES nodes?
>>>>
>>>> We are open to any and all suggestions. We have not yet committed to
>>>> any particular design, so can change if needed.
>>>>
>>>> Thank you for your time and responses,
>>>> Alex
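Mark's "why two replicas?" point can be checked against the numbers in Alex's plan with quick arithmetic (a back-of-envelope sketch, not a sizing tool; the figures are the ones quoted in the thread):

```python
# Back-of-envelope disk math for the proposed cluster.
primary_data_tb = 2.0   # ~2TB of indexed data at 30 days' retention
nodes = 6
node_capacity_tb = 1.0  # each node has 1TB of disk

def total_storage_tb(replicas: int) -> float:
    """Primary data plus `replicas` full copies of it."""
    return primary_data_tb * (1 + replicas)

# With 2 replicas the plan needs 6TB on 6TB of raw capacity: no headroom
# for growth, reallocation during node failures, or Lucene merges.
print(total_storage_tb(2) / (nodes * node_capacity_tb))  # 1.0 -> 100% full
# With 1 replica (as in Mark's own cluster), ~67% of the disk is used.
print(total_storage_tb(1) / (nodes * node_capacity_tb))
```

This is presumably what Mark means by "chewing up disk": the second replica buys tolerance of two simultaneous node failures at the cost of all remaining headroom.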
--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscr...@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/dc9ee1d5-c5c9-436b-a306-758e379fde92%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
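For reference, the per-prefix retention that Curator automates for the thread's two index sets (logs kept 90 days, stats kept 2 years, with names like logs-2014.06.30) boils down to date arithmetic on index names. A hypothetical sketch of that logic, not Curator's actual code:

```python
from datetime import date, timedelta

# Retention per index prefix, per the thread (exact windows assumed).
RETENTION = {"logs": timedelta(days=90), "stats": timedelta(days=730)}

def indices_to_delete(index_names, today):
    """Return indices whose embedded date is older than their prefix's
    retention window. Names are assumed to look like '<prefix>-YYYY.MM.DD'."""
    doomed = []
    for name in index_names:
        prefix, _, stamp = name.partition("-")
        if prefix not in RETENTION:
            continue  # leave unmanaged indices alone
        y, m, d = (int(part) for part in stamp.split("."))
        if today - date(y, m, d) > RETENTION[prefix]:
            doomed.append(name)
    return doomed

print(indices_to_delete(
    ["logs-2014.04.01", "logs-2014.07.15", "stats-2012.01.01"],
    today=date(2014, 7, 31)))
# ['logs-2014.04.01', 'stats-2012.01.01']
```

Curator handles the prefix/timestring matching itself, so running it once per index set (one invocation for logs-, one for stats-) from cron covers both schemes.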