On 01/02/2015 03:03 PM, Willy Tarreau wrote:
> On Sun, Feb 01, 2015 at 08:25:24AM +0100, Pavlos Parissis wrote:
>> If I understood Bhaskar's suggestion correctly, we could delegate health
>> checks for backend servers to a single server which does all the health
>> checking. Am I right?
> 
> Yes that was the idea.
> 
>> If this is the case, then the downside of multiple
>> health checks when nbproc > 1 is gone! But I would like to see a
>> fall-back mechanism, as we have with the agent check, in case that single
>> server is gone. Alternatively, we could have Bhaskar's suggestion
>> implemented in the agent check.
> 
> ... or you can use a local proxy which load-balances between multiple
> servers.
> 

Very interesting idea.

>> I am reviving the request to delegate health checks to a central
>> service, with a fall-back mechanism in place, because it:
>> * Reduces checks in setups where you have servers in multiple backends
>> * Reduces checks in setups where you have more than one active HAProxy
>> server (HAProxy servers behind a Layer 4 load balancer, ECMP, etc.)
>> * Reduces checks when the multi-process model is used
>> * Reduces CPU stress on firewalls, when they are present between HAProxy
>> and backend servers.
> 
> Absolutely. And keeps state across reloads, and ensures that all LBs have
> the same view of the service when servers are flapping.
> 

Exactly, another good reason to use this solution.

>> This assumes that there are enough resources on the 'health-checker'
>> server to sustain a huge number of requests. That is not a big deal if
>> the 'health-checker' solution is designed correctly, meaning that backend
>> servers push their availability to that 'health-checker' server, etc.
>> Furthermore, the 'health-checker' server should have a check in place to
>> detect backend servers not sending their health status and declare them
>> down after a certain period of inactivity.
> 
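
To make the push model above concrete, here is a minimal Python sketch of
the inactivity check. It is only an illustration under my own assumptions:
the class name HeartbeatRegistry, its methods, and the timeout value are
all hypothetical and not part of HAProxy or any existing tool.

```python
import time


class HeartbeatRegistry:
    """Tracks the last status each backend pushed (hypothetical helper)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout  # seconds of silence before a server counts as down
        self.clock = clock      # injectable clock, which makes testing easier
        self._last = {}         # server name -> (reported_up, timestamp)

    def report(self, server, up):
        """Record a status pushed by a backend server."""
        self._last[server] = (up, self.clock())

    def is_up(self, server):
        """A server is up only if it reported 'up' recently enough."""
        entry = self._last.get(server)
        if entry is None:
            return False        # never heard from this server
        up, seen = entry
        if self.clock() - seen > self.timeout:
            return False        # silent for longer than the timeout
        return up
```

A backend would call report() periodically, and whatever answers the load
balancers would call is_up(). A server that stops reporting is declared
down once the timeout has elapsed, without any active probe.
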
> We used to work on exactly such a design a few years ago at HAPTech, and
> the principle for it was to be a cache for health checks. That provided
> all the benefits of what you mentioned above, including a more consistent
> state between LBs when servers are flapping. The idea is that each check
> result is associated with a maxage and that any check received while the
> last result's age has not maxed out would be answered from the cache. It
> happens that all the stuff added to health checks since then has complicated
> things significantly (e.g. capture of the last response, sending of local
> info, etc.). We more or less abandoned that work for lack of time and the
> need for a redesign. So I would say that the design is far from obvious, but
> the expected gains are very significant. Also, such a checker should be
> responsible for notifications, and possibly for aggregating states before
> returning composite statuses (that may be one point to reconsider in the
> future to limit complexity, though).
> 
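
The maxage idea described above could look roughly like the following
Python sketch. This is not the HAPTech design, only my reading of the
principle: the names CheckCache and probe are hypothetical, and probe
stands for whatever function performs the real health check.

```python
import time


class CheckCache:
    """Caches health check results for up to maxage seconds (hypothetical)."""

    def __init__(self, probe, maxage, clock=time.monotonic):
        self.probe = probe      # callable(server) -> result of a real check
        self.maxage = maxage    # how long a cached result stays valid, in seconds
        self.clock = clock      # injectable clock, which makes testing easier
        self._results = {}      # server name -> (result, timestamp)

    def check(self, server):
        """Return the cached result if it is still fresh, else probe again."""
        now = self.clock()
        cached = self._results.get(server)
        if cached is not None and now - cached[1] < self.maxage:
            return cached[0]    # answered from the cache, the server is not hit
        result = self.probe(server)
        self._results[server] = (result, now)
        return result
```

With this, any number of HAProxy processes or load balancers asking within
the maxage window get the same answer, and the backend sees a single check.
It deliberately ignores the harder parts mentioned above (captured
responses, notifications, composite statuses).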

Well, let's first see if we can get the basic functionality in HAProxy
to send health checks to a server. The design and implementation
details of a centralized health-checker solution can be worked out by
combining available solutions (ZooKeeper, etc.) with some custom parts as
well.

Cheers,
Pavlos
