On 20/06/2016 04:44 AM, Daniel Ylitalo wrote:
> Hi!
>
> I haven't found anything about this topic anywhere so I was hoping
> someone in the mailinglist has done this in the past :)
>
> We are at the size where we need to round-robin tcp balance our
> incoming web traffic with pf to two haproxy servers both running with
> nbproc 28 for http load balancing, however, this leads to 56
> healthchecks being done each second against our web nodes which
> hammers them quite hard.
Shall I assume you are using HTTPS as well? 28 processes would be way too
many for plain HTTP; even with 40GbE links it is too much.

> How exactly are you guys solving this issue? Because at this size, the
> healthchecks kind of starts eating more cpu than they are helpful.

Well, several things can be done:

- Centralize the check state and offload the execution to the target nodes.

  Several companies use Zookeeper or Consul to store the health-check
  state, and run a daemon on each backend server which performs the actual
  health check and updates the state in Zookeeper/Consul. They run another
  daemon on the HAProxy servers which reacts when the state of a server
  changes and enables or disables that server on HAProxy via the stats
  socket, updating the configuration file as well. The latter isn't needed
  anymore, as HAProxy preserves server state across reloads:
  http://cbonte.github.io/haproxy-dconv/configuration-1.6.html#3.1-server-state-file
  (Very rough sketches of both daemons are in the P.S. below.)

  In this setup HAProxy itself performs only a TCP check, to cover the
  network-partition case where HAProxy can't reach the servers even
  though the whole external health-checking chain still works.

  Another disadvantage of running health checks on HAProxy with nbproc > 1
  is that the processes don't always agree on a server's status. The
  problem gets bigger when you have more than one HAProxy server: all
  those processes act as different brains and never agree on anything at
  a given moment. Centralizing the state and having only one brain (the
  daemon on the backend server) do the check avoids this problem.

- Split the frontends into HTTPS and HTTP.

  If HTTPS is not mandatory towards the backend, you can have 20 processes
  which handle the HTTPS traffic and offload to 2 processes which forward
  traffic to the backend servers in clear text. (A config sketch is in
  the P.S. as well.)

- Disable hyper-threading on the Intel CPUs.

  We disabled it on our servers and went from 22 processes to 10, which
  dropped capacity by only ~8%.

Hope it helps,
Pavlos
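
P.S. To make the Consul variant a bit more concrete, here is a minimal
sketch of the daemon that would run on each backend server. It assumes a
local Consul agent on 127.0.0.1:8500 and an application health endpoint
on /health; every hostname, key and path in it is made up for
illustration, it is not our production code:

#!/usr/bin/env python
"""Illustrative health-check daemon for one backend server.

Checks the local application and publishes the result to a Consul KV
key. The Consul agent address, the app health URL and the key name are
made-up examples, not a real deployment.
"""
import socket
import time

import requests

CONSUL_KV = "http://127.0.0.1:8500/v1/kv/health/web/" + socket.gethostname()
APP_HEALTH = "http://127.0.0.1:8080/health"  # hypothetical app endpoint


def check_app():
    """Return 'up' or 'down' based on the local application's health URL."""
    try:
        r = requests.get(APP_HEALTH, timeout=2)
        return "up" if r.status_code == 200 else "down"
    except requests.RequestException:
        return "down"


def main():
    last = None
    while True:
        state = check_app()
        if state != last:  # write to Consul only on state transitions
            requests.put(CONSUL_KV, data=state, timeout=2)
            last = state
        time.sleep(1)


if __name__ == "__main__":
    main()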
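
The matching sketch for the HAProxy side long-polls the same key with a
Consul blocking query and drives the stats socket with the standard
"enable server"/"disable server" commands. Keep in mind that with
nbproc > 1 every process has its own stats socket, so a real version
would have to repeat each command on all of them. Again, the key, the
socket path and the backend/server names are invented:

#!/usr/bin/env python
"""Illustrative watcher for the HAProxy side.

Long-polls one Consul KV key and enables/disables the matching server
through the HAProxy stats socket. Key, socket path and backend/server
names are made-up examples.
"""
import base64
import json
import socket
import time

import requests

CONSUL_KEY = "http://127.0.0.1:8500/v1/kv/health/web/web1"
STATS_SOCKET = "/var/run/haproxy.sock"  # hypothetical stats socket path
SERVER = "be_web/web1"                  # backend/server as in haproxy.cfg


def haproxy_cmd(cmd):
    """Send one command to the HAProxy stats socket and return the reply."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(STATS_SOCKET)
    s.sendall(cmd.encode("ascii") + b"\n")
    reply = s.recv(4096)
    s.close()
    return reply


def main():
    index = 0
    while True:
        # Consul blocking query: returns when the key changes (or on timeout)
        r = requests.get(CONSUL_KEY, params={"index": index, "wait": "5m"})
        if r.status_code != 200:
            time.sleep(1)
            continue
        index = int(r.headers.get("X-Consul-Index", index))
        value = base64.b64decode(json.loads(r.text)[0]["Value"])
        if value == b"up":
            haproxy_cmd("enable server " + SERVER)
        else:
            haproxy_cmd("disable server " + SERVER)


if __name__ == "__main__":
    main()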
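
With that in place, the HAProxy side of the first point only needs a
bare TCP check plus the 1.6 server-state keywords; roughly like this
(addresses and paths invented):

global
    # state is loaded from this file on startup/reload; the file has to
    # be produced beforehand with "show servers state" on the stats socket
    server-state-file /var/lib/haproxy/server-state

defaults
    mode http
    load-server-state-from-file global

backend be_web
    # no "option httpchk": a plain TCP connect is enough here, it only
    # has to catch a network partition between HAProxy and the servers
    server web1 10.0.0.11:80 check inter 2s
    server web2 10.0.0.12:80 check inter 2s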
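
And the HTTPS/HTTP split could look roughly like this, with the TLS
processes handing decrypted traffic to the two HTTP processes over a
unix socket; the process counts, certificate and socket paths are again
just an illustration:

global
    nbproc 22

defaults
    mode http

# processes 3-22 do nothing but TLS termination
frontend fe_tls
    bind-process 3-22
    bind :443 ssl crt /etc/haproxy/site.pem
    default_backend be_offload

backend be_offload
    bind-process 3-22
    # hand the decrypted traffic over in clear text
    server local unix@/var/run/haproxy-clear.sock send-proxy-v2

# processes 1-2 do the actual HTTP load balancing
frontend fe_http
    bind-process 1-2
    bind unix@/var/run/haproxy-clear.sock accept-proxy
    bind :80
    default_backend be_web

backend be_web
    bind-process 1-2
    server web1 10.0.0.11:80 check
    server web2 10.0.0.12:80 check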