Re: [PATCH 5/5] dynamic health check

Willy Tarreau Sun, 23 Dec 2012 23:13:41 -0800

Hi Simon,

CCing Malcolm, who posted the specs for the check.

On Mon, Dec 24, 2012 at 10:33:57AM +0900, Simon Horman wrote:
> Support a dynamic health check performed by opening a TCP socket to a
> pre-defined port and reading an ascii string. The string should have one of
> the following forms:
> 
> i. An ascii representation of a positive integer percentage.
>    e.g. "75%"
> 
>    Values in this format will set the weight proportional to the initial
>    weight of a server as configured when haproxy starts.
> 
> ii. The string "drain".
> 
>    This will cause the weight of a server to be set to 0, and thus it will
>    not accept any new connections other than those that are accepted via
>    persistence.
> 
> iii. The string "disable".
> 
>    Put the server into maintenance mode. The server must be re-enabled
>    before any further health checks will be performed.
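
As an illustration of the three reply forms quoted above, here is a minimal C sketch of how a checker might interpret one agent reply. It is not the code from the patch: the names `parse_agent_reply` and `enum agent_action` are hypothetical, and stripping a trailing CR/LF, accepting values above 100%, and the overflow cap are assumptions the spec does not settle.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcome of interpreting one agent reply. */
enum agent_action {
    AGENT_INVALID = 0,  /* reply not understood: leave the server as-is */
    AGENT_SET_WEIGHT,   /* "NN%": scale the initial weight */
    AGENT_DRAIN,        /* "drain": weight 0, persistence still honoured */
    AGENT_DISABLE       /* "disable": put the server into maintenance */
};

/* Parse a reply such as "75%", "drain" or "disable".
 * `iweight` is the weight configured when haproxy starts; for
 * AGENT_SET_WEIGHT and AGENT_DRAIN, *new_weight receives the weight
 * to apply. Assumption: a trailing "\r\n" or "\n" is ignored. */
static enum agent_action parse_agent_reply(const char *reply, int iweight,
                                           int *new_weight)
{
    size_t len = strcspn(reply, "\r\n");

    if (len == 5 && strncmp(reply, "drain", 5) == 0) {
        *new_weight = 0;
        return AGENT_DRAIN;
    }
    if (len == 7 && strncmp(reply, "disable", 7) == 0)
        return AGENT_DISABLE;

    /* "NN%": one or more decimal digits immediately followed by '%'. */
    if (len >= 2 && reply[len - 1] == '%') {
        long long pct = 0;

        for (size_t i = 0; i < len - 1; i++) {
            if (!isdigit((unsigned char)reply[i]))
                return AGENT_INVALID;
            pct = pct * 10 + (reply[i] - '0');
            if (pct > 1000000)  /* arbitrary cap, only to avoid overflow */
                return AGENT_INVALID;
        }
        /* Weight proportional to the initial (configured) weight. */
        *new_weight = (int)((long long)iweight * pct / 100);
        return AGENT_SET_WEIGHT;
    }
    return AGENT_INVALID;
}
```

With an initial weight of 100, "75%" yields a weight of 75; with an initial weight of 10, "50%" yields 5, because integer division truncates.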

This is more for Malcolm: I'm realizing that there is no way for the agent
to report a failure. I would love to see a "down" statement here. The first
goal obviously is to immediately stop using a temporarily faulty server. One
of the benefits is that a down state raises an alert. Another benefit is that
the reason can be stored, logged and reported on the stats page. For example,
seeing a server marked down with "full length check failed at database"
would be very useful. As you can see, I would like the reason to come at the
end of the string. So, for example, the response for down would be the string:

    "down File system full"
or
    "down Service not running"

The first word, "down", indicates the status; the rest of the string is the
reason. This seems compatible with your protocol, don't you think?
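
The proposed "down <reason>" form splits into the first word (the status) and the remaining free text (the reason). Below is a minimal C sketch of that split. It assumes the words are separated by spaces and the reply may end in CR/LF, which the proposal does not specify. `split_status_reason` is a hypothetical helper, not part of haproxy.

```c
#include <stddef.h>
#include <string.h>

/* Split "<status> <reason>" into its first word and the free-form rest.
 * The status word is copied into `status` (NUL-terminated). *reason
 * points into `reply`, and *reason_len excludes any trailing CR/LF
 * (it is 0 when no reason was given). Returns 0 on success, -1 if the
 * reply is empty or the status word does not fit in `status_len`. */
static int split_status_reason(const char *reply, char *status,
                               size_t status_len, const char **reason,
                               size_t *reason_len)
{
    size_t word = strcspn(reply, " \r\n");

    if (word == 0 || word >= status_len)
        return -1;
    memcpy(status, reply, word);
    status[word] = '\0';

    const char *p = reply + word;
    while (*p == ' ')  /* skip the separator(s) before the reason */
        p++;
    *reason = p;
    *reason_len = strcspn(p, "\r\n");
    return 0;
}
```

For "down File system full", this yields the status "down" and the reason "File system full". A bare "drain" still parses, with an empty reason, so the split stays compatible with the existing one-word replies.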

> A dynamic health check may be configured using "option dynamic-chk".
> Using an alternate check-port, to obtain the dynamic health check
> information described above rather than probing the port of the service,
> may be useful in conjunction with this option.
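
The alternate check-port is served by the agent running on the server. Its per-connection job is small: write the status string and close the socket, so the checker sees end-of-stream. The following is a minimal POSIX sketch under stated assumptions. The helper names `agent_reply` and `agent_serve` are hypothetical, the newline terminator is an assumption (the patch only says "an ascii string"), and the 128-byte buffer is arbitrary.

```c
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical agent-side helper: send `status` (e.g. "75%", "drain")
 * plus an assumed trailing newline on an accepted connection, then close
 * it. Returns 0 on success, -1 on error. */
static int agent_reply(int fd, const char *status)
{
    char buf[128];                     /* arbitrary size for this sketch */
    size_t len = strlen(status);

    if (len + 1 > sizeof(buf)) {
        close(fd);
        return -1;
    }
    memcpy(buf, status, len);
    buf[len++] = '\n';

    size_t off = 0;
    while (off < len) {                /* handle short writes */
        ssize_t n = write(fd, buf + off, len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        off += (size_t)n;
    }
    return close(fd);                  /* EOF tells the checker we're done */
}

/* Answer every connection on an already bound, listening socket
 * (e.g. the port given as check-port) with the current status. */
void agent_serve(int listen_fd, const char *status)
{
    for (;;) {
        int fd = accept(listen_fd, NULL, NULL);
        if (fd < 0) {
            if (errno == EINTR)
                continue;
            return;
        }
        agent_reply(fd, status);
    }
}
```

A real agent would recompute `status` (load, free disk, service state) before each reply rather than serving a fixed string.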

I'm realizing that the name "dynamic" is probably not the most appropriate,
as I initially understood it as a modifier for other checks. For example,
if we implement exactly the same thing within an HTTP header, "dynamic"
could be an option combined with "http-chk". After all, we're relying on a
clearly specified agent. Why not name it after the agent (e.g.
"lb-agent-chk")?

> +#define PR_O2_FEEDBACK_CHK 0x80000000   /* use a TCP connection to obtain a metric of server health */

Then once we agree on a name, let's have the same one in this option.

Otherwise it looks good to me. I'm about to issue dev16 today (in a few
hours); if we can quickly decide on the points above, I could even include
it there.

Cheers,
Willy
