On 01/02/2015 03:03 PM, Willy Tarreau wrote:
> On Sun, Feb 01, 2015 at 08:25:24AM +0100, Pavlos Parissis wrote:
>> If I understood Bhaskar's suggestion correctly, we could delegate health
>> checks for backend servers to a single server which does all the health
>> checking. Am I right?
>
> Yes that was the idea.
>
>> If this is the case then the downside of multiple
>> health checks when nbproc > 1 is gone! But I would like to see a
>> fall-back mechanism, as we have with the agent check, in case that single
>> server is gone. Alternatively, we could have Bhaskar's suggestion
>> implemented in the agent check.
>
> ... or you can use a local proxy which load-balances between multiple
> servers.
>
Very interesting idea.

>> I am reviving the request to delegate health checks to a central
>> service with a fall-back mechanism in place because it:
>> * Reduces checks in setups where you have servers in multiple backends
>> * Reduces checks in setups where you have more than 1 active HAProxy
>>   server (HAProxy servers behind a Layer 4 load balancer, ECMP, etc.)
>> * Reduces checks when the multi-process model is used
>> * Reduces CPU stress on firewalls, when they are present between HAProxy
>>   and backend servers.
>
> Absolutely. And keeps state across reloads, and ensures that all LBs have
> the same view of the service when servers are flapping.
>

Exactly, another good reason to use this solution.

>> This assumes that there are enough resources on the 'health-checker'
>> server to sustain a huge amount of requests. Which is not a big deal if
>> the 'health-checker' solution is designed correctly, meaning that backend
>> servers push their availability to that 'health-checker' server, etc.
>> Furthermore, the 'health-checker' server should have a check in place to
>> detect backend servers not sending their health status and declare them
>> down after a certain period of inactivity.
>
> We used to work on exactly such a design a few years ago at HAPTech, and
> the principle for it was to be a cache for health checks. That provided
> all the benefits of what you mentioned above, including a more consistent
> state between LBs when servers are flapping. The idea is that each check
> result is associated with a maxage and that any check received while the
> last result's age has not maxed out would be returned from the cache. It
> happens that all the stuff added to health checks since then has complicated
> things significantly (eg: capture of last response, sending of the local info,
> etc). We've more or less abandoned that work for lack of time and need for
> a redesign.
> So I could say that the design is far from being obvious, but
> the gains to expect are very important. Also such a checker should be
> responsible for notifications, and possibly for aggregating states before
> returning composite statuses (that may be one point to reconsider in the
> future to limit complexity though).
>

Well, let's first see if we can get the basic functionality from HAProxy
to send health checks to a server. The design and the implementation
details of a centralized health-checker solution can be worked out by
combining available solutions (Zookeeper, etc.) with some custom parts as
well.

Cheers,
Pavlos
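
For illustration only, the "cache for health checks" principle quoted above (each result carries a maxage, and any check arriving before the last result's age has maxed out is answered from the cache) could be sketched roughly as below. This is a minimal Python sketch under assumptions, not HAProxy code and not the HAPTech design: the names `HealthCheckCache`, `CachedResult`, `probe`, and the "UP"/"DOWN" status strings are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CachedResult:
    status: str        # hypothetical status value, e.g. "UP" or "DOWN"
    checked_at: float  # clock reading when the real check was performed


class HealthCheckCache:
    """Answers repeated checks from a cache until the result's maxage expires."""

    def __init__(
        self,
        probe: Callable[[str], str],
        maxage: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._probe = probe      # performs the real check against a server
        self._maxage = maxage    # seconds a cached result stays valid
        self._clock = clock      # injectable so the cache can be tested
        self._results: Dict[str, CachedResult] = {}

    def check(self, server: str) -> str:
        now = self._clock()
        cached = self._results.get(server)
        # Serve from the cache while the last result's age has not maxed out.
        if cached is not None and now - cached.checked_at < self._maxage:
            return cached.status
        # Otherwise run the real check once and remember its result.
        status = self._probe(server)
        self._results[server] = CachedResult(status, now)
        return status
```

With such a cache in front of the backends, several load balancers or processes asking about the same server within one maxage window would trigger a single real check and receive the same answer. The sketch leaves out everything the thread says made the real design hard (captured responses, notifications, composite statuses).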