On 20/06/2016 04:44 AM, Daniel Ylitalo wrote:
> Hi!
>
> I haven't found anything about this topic anywhere so I was hoping
> someone in the mailinglist has done this in the past :)
>
> We are at the size where we need to round-robin tcp balance our
> incoming web traffic with pf to two haproxy servers both running with
> nbproc 28 for http load balancing, however, this leads to 56
> healthchecks being done each second against our web nodes which
> hammers them quite hard.
Shall I assume you are using HTTPS as well? 28 processes would be way too
many for plain HTTP; even with 40GbE links it is too much.

> How exactly are you guys solving this issue? Because at this size, the
> healthchecks kind of starts eating more cpu than they are helpful.

Well, several things can be done:

- Centralize the check state and offload the execution to the target nodes.

  Several companies use Zookeeper or Consul to store the health-check
  state, and run a daemon on each backend server which performs the actual
  health check and updates the state in Zookeeper/Consul. They run another
  daemon on the HAProxy servers which reacts when the state of a server
  changes and enables or disables that server on HAProxy via the stats
  socket, updating the configuration file as well. The latter isn't needed
  anymore, as HAProxy preserves server state across reloads:
  http://cbonte.github.io/haproxy-dconv/configuration-1.6.html#3.1-server-state-file
  (Very rough sketches of both daemons are in the P.S. below.)

  In this setup HAProxy itself performs only a TCP check, to cover the
  network-partition case where HAProxy can't reach the servers even
  though the whole external health-checking chain still works.

  Another disadvantage of running health checks on HAProxy with nbproc > 1
  is that the processes don't always agree on a server's status. The
  problem gets bigger when you have more than one HAProxy server: all
  those processes act as different brains and never agree on anything at
  a given moment. Centralizing the state and having only one brain (the
  daemon on the backend server) do the check avoids this problem.

- Split the frontends into HTTPS and HTTP.

  If HTTPS is not mandatory towards the backend, you can have 20 processes
  which handle the HTTPS traffic and offload to 2 processes which forward
  traffic to the backend servers in clear text. (A config sketch is in
  the P.S. as well.)

- Disable hyper-threading on the Intel CPUs.

  We disabled it on our servers and went from 22 processes to 10, which
  dropped capacity by only ~8%.

Hope it helps,
Pavlos
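
P.S. To make the Consul variant a bit more concrete, here is a minimal
sketch of the daemon that would run on each backend server. It assumes a
local Consul agent on 127.0.0.1:8500 and an application health endpoint
on /health; every hostname, key and path in it is made up for
illustration, it is not our production code:

#!/usr/bin/env python
"""Illustrative health-check daemon for one backend server.

Checks the local application and publishes the result to a Consul KV
key. The Consul agent address, the app health URL and the key name are
made-up examples, not a real deployment.
"""
import socket
import time

import requests

CONSUL_KV = "http://127.0.0.1:8500/v1/kv/health/web/" + socket.gethostname()
APP_HEALTH = "http://127.0.0.1:8080/health"  # hypothetical app endpoint


def check_app():
    """Return 'up' or 'down' based on the local application's health URL."""
    try:
        r = requests.get(APP_HEALTH, timeout=2)
        return "up" if r.status_code == 200 else "down"
    except requests.RequestException:
        return "down"


def main():
    last = None
    while True:
        state = check_app()
        if state != last:  # write to Consul only on state transitions
            requests.put(CONSUL_KV, data=state, timeout=2)
            last = state
        time.sleep(1)


if __name__ == "__main__":
    main()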
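
The matching sketch for the HAProxy side long-polls the same key with a
Consul blocking query and drives the stats socket with the standard
"enable server"/"disable server" commands. Keep in mind that with
nbproc > 1 every process has its own stats socket, so a real version
would have to repeat each command on all of them. Again, the key, the
socket path and the backend/server names are invented:

#!/usr/bin/env python
"""Illustrative watcher for the HAProxy side.

Long-polls one Consul KV key and enables/disables the matching server
through the HAProxy stats socket. Key, socket path and backend/server
names are made-up examples.
"""
import base64
import json
import socket
import time

import requests

CONSUL_KEY = "http://127.0.0.1:8500/v1/kv/health/web/web1"
STATS_SOCKET = "/var/run/haproxy.sock"  # hypothetical stats socket path
SERVER = "be_web/web1"                  # backend/server as in haproxy.cfg


def haproxy_cmd(cmd):
    """Send one command to the HAProxy stats socket and return the reply."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(STATS_SOCKET)
    s.sendall(cmd.encode("ascii") + b"\n")
    reply = s.recv(4096)
    s.close()
    return reply


def main():
    index = 0
    while True:
        # Consul blocking query: returns when the key changes (or on timeout)
        r = requests.get(CONSUL_KEY, params={"index": index, "wait": "5m"})
        if r.status_code != 200:
            time.sleep(1)
            continue
        index = int(r.headers.get("X-Consul-Index", index))
        value = base64.b64decode(json.loads(r.text)[0]["Value"])
        if value == b"up":
            haproxy_cmd("enable server " + SERVER)
        else:
            haproxy_cmd("disable server " + SERVER)


if __name__ == "__main__":
    main()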
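
With that in place, the HAProxy side of the first point only needs a
bare TCP check plus the 1.6 server-state keywords; roughly like this
(addresses and paths invented):

global
    # state is loaded from this file on startup/reload; the file has to
    # be produced beforehand with "show servers state" on the stats socket
    server-state-file /var/lib/haproxy/server-state

defaults
    mode http
    load-server-state-from-file global

backend be_web
    # no "option httpchk": a plain TCP connect is enough here, it only
    # has to catch a network partition between HAProxy and the servers
    server web1 10.0.0.11:80 check inter 2s
    server web2 10.0.0.12:80 check inter 2s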
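
And the HTTPS/HTTP split could look roughly like this, with the TLS
processes handing decrypted traffic to the two HTTP processes over a
unix socket; the process counts, certificate and socket paths are again
just an illustration:

global
    nbproc 22

defaults
    mode http

# processes 3-22 do nothing but TLS termination
frontend fe_tls
    bind-process 3-22
    bind :443 ssl crt /etc/haproxy/site.pem
    default_backend be_offload

backend be_offload
    bind-process 3-22
    # hand the decrypted traffic over in clear text
    server local unix@/var/run/haproxy-clear.sock send-proxy-v2

# processes 1-2 do the actual HTTP load balancing
frontend fe_http
    bind-process 1-2
    bind unix@/var/run/haproxy-clear.sock accept-proxy
    bind :80
    default_backend be_web

backend be_web
    bind-process 1-2
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check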