Hi Bryan,
On Wed, Mar 23, 2011 at 09:27:01PM +, Cassidy, Bryan wrote:
> Hi all,
>
> I've noticed an odd (lack of) interaction between "maxconn" and "option
> httpchk"...
>
> If a server's maxconn limit has been reached, it appears that HTTP health
> checks are still dispatched. If I've configured the maxconn limit to match
> the number of requests the backend server can concurrently dispatch, and all
> these connections are busy with slow requests, HAProxy will assume the server
> is down; once the server completes a request, HAProxy waits until "rise"
> health checks have succeeded (as expected if the server was really down, but
> it was only busy). This makes overly busy times even worse.
Yes, that's a known situation. Maxconn should always leave some room
for health checks. When you have two haproxies, you might have to leave
at least 2 connections for the health checks. In practice, 1 should be
OK because they're supposed to be fast and it generally is not an issue
if one waits a little bit to get a connection slot.
This issue is sometimes encountered on mongrel servers where only one
connection at a time is possible. The usual workaround for this case
is to set a check timeout larger than what you consider a long request
should be.
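
As a rough sketch of that workaround (the backend name, address, check URI
and all durations below are made up for illustration, so adjust them to
your setup):

```
backend mongrels
    option httpchk GET /health
    timeout connect 5s
    # additional read timeout for checks, set above the longest
    # request you expect the server to process
    timeout check 60s
    server m1 192.0.2.10:8000 maxconn 1 check inter 5s rise 2 fall 3
```

When "timeout check" is not set, the "inter" value is also used as the
check timeout, so raising "inter" is another option, at the cost of less
frequent checks.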
Even if that can sound frustrating at first, you have to realize that
if the server is failing to respond to health checks, there is no way
to know whether it's too busy or if it's dead. So there's nothing
wrong with the current approach. If you pointed your browser to the
server, you'd observe the same behaviour. If you think that you'd tell
the difference because you'd wait longer, then it means you should
adjust your check timeout.
(...)
> I know I can work around this by setting maxconn to one less than a server's
> maximum capacity (perhaps this would be a good idea for other reasons).
Yes, that's the way to do it, and it will permit you to connect to the
server without passing through haproxy.
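
For example, assuming a server that can process 10 concurrent requests
(the figure, the names and the address are only illustrative):

```
backend app
    option httpchk GET /health
    # assumed server capacity: 10 concurrent requests
    # 9 slots for traffic, 1 left for health checks or direct access
    server app1 192.0.2.10:8000 maxconn 9 check
```

With two haproxies checking the same server, you would leave 2 slots free
instead of 1.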
> I suspect I could work around this by using TCP status checks instead of HTTP
> status checks, though I haven't tried this as I like the flexibility HTTP
> health checks offer (like "disable-on-404").
You're right, but relying on TCP only will also not tell you when your
servers are really dead if they're just frozen.
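
To illustrate the difference (names and check URI are hypothetical): a
plain "check" without "option httpchk" only verifies that a TCP connection
can be established, while the HTTP variant needs a real response from the
application:

```
# TCP-only check: succeeds as soon as the connection is accepted
backend app_tcpchk
    server app1 192.0.2.10:8000 maxconn 9 check

# HTTP check: requires a valid response, and allows disable-on-404
backend app_httpchk
    option httpchk GET /health
    http-check disable-on-404
    server app1 192.0.2.10:8000 maxconn 9 check
```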
> Is this behavior a bug or a feature? Intuitively I would have expected the
> HTTP health checks to respect maxconn limits, but perhaps there was a
> conscious decision to not do so (for instance, maybe it was considered
> unacceptable for a server's health to be unknown when it is fully loaded).
We have a task on the TODO list to make health checks pass through the queue
and respect the maxconn too. This is especially important for mongrel. But
still, doing so does not cover the situation where you have multiple LBs or
when you need to check the server by yourself.
Regards,
Willy