Why is this wrong?

We do use health-checking, but we can generate a lot of 503s in 2s.  Of course 
the best answer is to fix the app server code so we don’t see this, but the 
developers have been working on this for a while and we’re trying to mitigate 
in the meantime.
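
A rough way to size that exposure, as a sketch only: the request rate below is
a made-up placeholder, and it assumes the 2s is roughly how long a bad server
keeps receiving traffic before it is taken out.

```python
# Rough count of requests sent to a bad server before it is taken out.
# The request rate is a hypothetical placeholder, not a measurement.

def requests_at_risk(rps_to_server: float, window_s: float) -> float:
    """Requests routed to one server during the detection window."""
    return rps_to_server * window_s


# e.g. 500 req/s to a single server, with a 2s window before removal
at_risk = requests_at_risk(rps_to_server=500, window_s=2.0)
print(f"{at_risk:.0f} requests could hit the bad server")
```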

More frequent health checks aren’t a good answer in this case.  We do a
relatively extensive health check, and we have enough traffic that we need to
split the load across multiple HAProxy servers, each of which runs its own
checks, so more frequent health checks end up being a significant load issue.
A simpler health check would be inferior and would miss some real-world
failure cases we want to check for.  (At one point we did an HAProxy upgrade
that introduced a configuration bug, which caused us to run 7x the usual
number of health checks, and we essentially DoSed ourselves.  So we know that
a significant increase in health check frequency would actually be a problem.)
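
As a back-of-the-envelope sketch of why check frequency matters here: the
proxy counts and intervals below are made-up placeholders, not our real
numbers, and the model assumes every HAProxy process checks every server
independently, once per interval.

```python
# Rough estimate of health-check load on one app server.
# All numbers are hypothetical placeholders, not measurements.

def checks_per_second(num_proxies: int, procs_per_proxy: int,
                      interval_s: float) -> float:
    """Checks/sec hitting a single backend server.

    Assumes each HAProxy process on each proxy checks the server
    independently, once per interval.
    """
    return num_proxies * procs_per_proxy / interval_s


baseline = checks_per_second(num_proxies=4, procs_per_proxy=1, interval_s=2.0)
faster = checks_per_second(num_proxies=4, procs_per_proxy=1, interval_s=0.5)

print(f"2s interval:   {baseline:.1f} checks/s per server")
print(f"0.5s interval: {faster:.1f} checks/s per server "
      f"({faster / baseline:.0f}x)")
```

The load scales inversely with the interval and linearly with the number of
proxies, so an expensive check gets costly quickly.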

-Daniel

> On Dec 5, 2014, at 3:09 AM, Lukas Tribus <luky...@hotmail.com> wrote:
> 
> Hi Daniel,
> 
> 
>> We have a situation where our app servers sometimes get into a bad
>> state, and hitting a working server is more important than enforcing
>> persistence. Generally the number of connections to a bad server
>> grows rapidly, so we’ve set a maxconn value on the server line which
>> effectively takes a server out of the pool when the bad state occurs.
>> 
>> If we fill up the connection slots, the server is almost definitely
>> bad, so we’d rather not queue at all. Since maxqueue 0 means unlimited,
>> it looks like the minimum queue size is 1. Is that right? Is there
>> any way to enforce a redispatch whenever we’re at maxconn, without any
>> connections getting queued?
> 
> No, this is not possible, but you really don't want to work in that
> direction, it's very wrong.
> 
> You should work on your health checking and implement it in a way
> that HAProxy can detect bad servers. That is the proper way to
> deal with such problems.
> 
> Sometimes, providing a helper script on the backend that detects any
> problems can help with this.
> 
> 
> 
> Regards,
> 
> Lukas

