Hi Cyril,

On Wed, Jan 06, 2010 at 08:58:17PM +0100, Cyril Bonté wrote:
> On Tuesday 5 January 2010 23:42:46, Willy Tarreau wrote:
> > On Tue, Jan 05, 2010 at 11:14:32PM +0100, Cyril Bonté wrote:
> > > Well, eventually after several different tests, that's OK for me.
> > > A short http-request timeout (some seconds max) will prevent the 
> > > accumulation of connections "ESTABLISHED" in the haproxy->client side 
> > > (which use sessions in haproxy that will never read anything) but 
> > > nonexistent on the client->haproxy side.
> > 
> > indeed, and against this you can also use "option abortonclose" which
> > will simply abort the requests if their input channel is empty before
> > a connection is established.
> 
> Sadly not, for this specific case. After 20 hours (no timeouts were set), the
> connection remained established.

This morning I found one good reason for those issues. I cannot
reproduce them in the lab, but they slowly accumulate on one of
the two prod servers (about 10-20 per day).

The cause lies in the way the analysers are re-enabled to parse
a second request. My assumption was that an analyser enabled by
another one would automatically be called, but that's not the
case if the target one was already called in the same round.
And unfortunately, by the time it is enabled, there is no more I/O
on the socket, so it never gets woken up again.
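To make the failure mode concrete, here is a minimal sketch in C of a
single-pass round. It is not the haproxy code: the AN_* values, the
struct and the function names are all made up for illustration. The
first analyser runs and clears itself, the second one re-enables the
first, and the round ends with the first one enabled but never called
again.

```c
/* Hypothetical analyser bits, NOT haproxy's real values. */
#define AN_WAIT_REQ 0x1u  /* stands in for the request-waiting analyser */
#define AN_PROCESS  0x2u  /* a later analyser that re-arms AN_WAIT_REQ */

struct chan {
    unsigned analysers;   /* bitmask of enabled analysers */
    int wait_req_calls;   /* how many times AN_WAIT_REQ was run */
};

static void run_wait_req(struct chan *c)
{
    c->wait_req_calls++;
    c->analysers &= ~AN_WAIT_REQ;  /* done with the first request */
}

static void run_process(struct chan *c)
{
    c->analysers &= ~AN_PROCESS;
    c->analysers |= AN_WAIT_REQ;   /* get ready to parse a second request */
}

/* One round: each analyser is visited at most once, in bit order. */
static void single_pass_round(struct chan *c)
{
    if (c->analysers & AN_WAIT_REQ)
        run_wait_req(c);
    if (c->analysers & AN_PROCESS)
        run_process(c);
    /* AN_WAIT_REQ is set again here, but its turn has already passed.
     * Without new I/O on the socket, nothing triggers another round. */
}
```

Starting with both bits set, one call to single_pass_round() leaves
AN_WAIT_REQ enabled with wait_req_calls still at 1.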

So I have started to look at how to correctly call an analyser once,
then again only if it is re-enabled by another analyser. I think I
have now found the right logic for this; I just need to run it by
hand first to ensure it's OK, then implement it.
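One possible shape for such a loop, as a rough sketch only: the actual
fix may well look different, and every name below (struct chan, the
table, the guard value) is hypothetical. The idea is to keep a "todo"
mask, and after each call add back only the bits that this call has
switched on. An analyser that merely leaves its own bit set is not
called twice.

```c
#define AN_WAIT_REQ  0x1u   /* hypothetical bits, not haproxy's values */
#define AN_PROCESS   0x2u
#define AN_ALL       (AN_WAIT_REQ | AN_PROCESS)
#define NB_ANALYSERS 2

struct chan {
    unsigned analysers;        /* bitmask of enabled analysers */
    int requests_buffered;     /* complete requests available to parse */
    int calls[NB_ANALYSERS];   /* per-analyser call counters */
};

static void wait_req(struct chan *c)
{
    c->calls[0]++;
    if (c->requests_buffered > 0) {
        c->requests_buffered--;
        c->analysers &= ~AN_WAIT_REQ;  /* got a request, done */
    }
    /* otherwise stay enabled and wait for more data */
}

static void process(struct chan *c)
{
    c->calls[1]++;
    c->analysers &= ~AN_PROCESS;
    c->analysers |= AN_WAIT_REQ;       /* re-arm for a second request */
}

static void (*const table[NB_ANALYSERS])(struct chan *) = { wait_req, process };

static void run_round(struct chan *c)
{
    unsigned todo = c->analysers & AN_ALL;
    int guard = 16;  /* arbitrary cap so two analysers cannot ping-pong forever */

    while (todo && guard-- > 0) {
        int i = 0;
        while (!(todo & (1u << i)))
            i++;                       /* lowest pending analyser */
        todo &= ~(1u << i);

        unsigned before = c->analysers;
        table[i](c);

        todo |= c->analysers & ~before & AN_ALL;  /* bits this call enabled */
        todo &= c->analysers;                     /* drop bits it disabled */
    }
}
```

With both analysers enabled and one request buffered, run_round() calls
wait_req() twice and process() once, and the second wait_req() call
happens in the same round instead of waiting for I/O that never comes.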

Additionally, I have noticed some dangerous changes to the
BF_DONT_READ flag under some circumstances, which sometimes
disable any further reading on a socket. That, combined with
the issue above, can theoretically freeze a socket for good.

I have looked at the session status by connecting to the
stats socket, and here's what I got:

> show sess
0x816f068: proto=tcpv4 src=XX.XX.XXX.XXX:44578 fe=public be=public srv=<none> 
ts=08 age=11h21m calls=12 rq[f=1501000h,l=783,an=0eh,rx=,wx=,ax=] 
rp[f=2001000h,l=0,an=00h,rx=,wx=,ax=] s0=[7,18h,fd=2,ex=] s1=[0,0h,fd=-1,ex=] 
exp=

rq.f  = 1501000h => the leading 1 is BF_DONT_READ
rq.an = 0eh => 3 analysers enabled, including http_wait_request(),
        which would have cleared BF_DONT_READ had it been called again.
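For anyone wanting to decode such a dump by hand, here is a small
sketch in C. The BF_DONT_READ value below is only inferred from the
leading "1" of 1501000h in this dump; check it against the headers of
the version you are running. The helper names are made up.

```c
/* Assumption: bit position inferred from the dump above, not copied
 * from the haproxy headers. */
#define BF_DONT_READ 0x1000000u

/* Returns non-zero if reading is currently disabled on the buffer. */
static int dont_read_set(unsigned flags)
{
    return (flags & BF_DONT_READ) != 0;
}

/* Counts the enabled analysers in an "an=" bitmask. */
static int count_analysers(unsigned an)
{
    int n = 0;

    while (an) {
        an &= an - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}
```

Applied to the dump, dont_read_set(0x1501000) is true for the request
buffer and false for the response buffer (2001000h), and
count_analysers(0x0e) gives 3.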

So I have also made the rules for using BF_DONT_READ stricter so
that we can't leave it set when leaving an analyser. I already
have the patch for that, but without the former fix it will not
help, so I want to fix the other one first, and will
keep you updated.

I will not release -dev6 until it can run for one day on both
prod servers without leaving *any* stuck session, otherwise
that's plain unacceptable. And I want to fix the issues at
their root, not just the symptoms.

Thanks for your tests and feedback,
Willy

