Hi Cyril,

On Wed, Jan 06, 2010 at 08:58:17PM +0100, Cyril Bonté wrote:
> On Tuesday 5 January 2010 23:42:46, Willy Tarreau wrote:
> > On Tue, Jan 05, 2010 at 11:14:32PM +0100, Cyril Bonté wrote:
> > > Well, eventually, after several different tests, that's OK for me.
> > > A short http-request timeout (a few seconds max) will prevent the
> > > accumulation of connections that are "ESTABLISHED" on the
> > > haproxy->client side (which use sessions in haproxy that will never
> > > read anything) but nonexistent on the client->haproxy side.
> >
> > Indeed, and against this you can also use "option abortonclose", which
> > will simply abort the requests if their input channel is empty before
> > a connection is established.
>
> Sadly not, in this specific case. After 20 hours (no timeout was set), the
> connection remained established.
This morning I found one good reason for those issues. I cannot reproduce
them in the lab, but they slowly accumulate on one of the two prod servers
(about 10-20 per day). The cause lies in the way the analysers are
re-enabled to parse a second request. My assumption was that when one
analyser enables another one, the latter would automatically be called,
but that's not the case if the target one was already called in the same
round. And unfortunately, by the time it is enabled, there is no more I/O
on the socket, so it never gets woken up again.

So I have started to look at how to correctly call an analyser once, and
then again only if it is re-enabled by another analyser. I think I have now
found the right logic for this; I just need to run it by hand first to
ensure it's OK, then implement it.

Additionally, I have noticed some dangerous changes to the BF_DONT_READ
flag under some circumstances, which sometimes disable any further reading
on a socket. Combined with the issue above, that can, in theory, freeze a
socket for good. I have looked at the sessions' status by connecting to the
stats socket; here's what I got:

  > show sess
  0x816f068: proto=tcpv4 src=XX.XX.XXX.XXX:44578 fe=public be=public srv=<none> ts=08 age=11h21m calls=12 rq[f=1501000h,l=783,an=0eh,rx=,wx=,ax=] rp[f=2001000h,l=0,an=00h,rx=,wx=,ax=] s0=[7,18h,fd=2,ex=] s1=[0,0h,fd=-1,ex=] exp=

  rq.f  = 1501000h => 1=BF_DONT_READ
  rq.an = 0eh      => 3 analysers, including http_wait_request(), which
                      would have cleared BF_DONT_READ if it were called again.

So I have also made the rules for using BF_DONT_READ stricter, so that we
can't leave it set when leaving an analyser. I already have the patch for
that, but without the former fix it will not bring anything, so I want to
fix the other one first, and will keep you updated.

I will not release -dev6 until it can run for one day on both prod servers
without leaving *any* stuck session; otherwise that's plain unacceptable.
And I want to fix the issues at their root, not just the symptoms.
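To make the scheduling problem above concrete, here is a minimal C sketch under a simplified model. It is not haproxy's code and not the actual patch. Every name in it (struct sess, AN_WAIT_REQ, AN_PROC_REQ, wait_request(), process_request(), run_naive(), run_fixed(), count_analysers()) is hypothetical, and dont_read only models the role of BF_DONT_READ. Analysers are bits in a mask, as in rq.an. run_naive() makes a single ordered pass, so an analyser re-enabled by a later one after it already ran in that round is skipped. run_fixed() calls each enabled analyser once and calls it again only if another analyser turns its bit back on.

```c
#include <assert.h>

/* Hypothetical analyser bits modelled on the rq.an mask (illustrative only). */
#define AN_WAIT_REQ  0x02u  /* plays the role of http_wait_request() */
#define AN_PROC_REQ  0x04u  /* plays the role of a later request analyser */
#define NB_AN        8
#define MAX_PASSES   16     /* sanity cap for this sketch only */

struct sess {
    unsigned int an;  /* enabled analysers */
    int pending;      /* complete requests waiting in the buffer */
    int dont_read;    /* models BF_DONT_READ */
    int calls_wait;
    int calls_proc;
};

typedef void (*analyser_fn)(struct sess *s);

/* Lifts the read block, then parses one request if one is available and
 * hands over to the next analyser; otherwise stays enabled and waits. */
static void wait_request(struct sess *s)
{
    s->calls_wait++;
    s->dont_read = 0;
    if (s->pending > 0) {
        s->pending--;
        s->an &= ~AN_WAIT_REQ;
        s->an |= AN_PROC_REQ;
    }
}

/* Blocks reads, processes the request, then re-enables the request
 * parser for the next keep-alive request. */
static void process_request(struct sess *s)
{
    s->calls_proc++;
    s->dont_read = 1;
    s->an &= ~AN_PROC_REQ;
    s->an |= AN_WAIT_REQ;
}

static analyser_fn analysers[NB_AN] = {
    [1] = wait_request,    /* bit 0x02 */
    [2] = process_request, /* bit 0x04 */
};

/* Single ordered pass: a lower bit enabled by a higher one is missed. */
static void run_naive(struct sess *s)
{
    for (int i = 0; i < NB_AN; i++) {
        unsigned int bit = 1u << i;
        if ((s->an & bit) && analysers[i])
            analysers[i](s);
    }
}

/* Call each enabled analyser once; call it again only if some analyser
 * turned its bit on after it had already run. */
static void run_fixed(struct sess *s)
{
    unsigned int called = 0;
    int again = 1;
    int passes = 0;

    while (again) {
        again = 0;
        passes++;
        assert(passes <= MAX_PASSES); /* analysers keep re-enabling each other */
        for (int i = 0; i < NB_AN; i++) {
            unsigned int bit = 1u << i;
            if (!(s->an & bit) || !analysers[i] || (called & bit))
                continue;
            unsigned int before = s->an;
            called |= bit;
            analysers[i](s);
            unsigned int enabled = s->an & ~before; /* bits just turned on */
            if (enabled) {
                called &= ~enabled; /* these must run (again) */
                again = 1;
            }
        }
    }
}

/* Number of analysers set in a mask, e.g. rq.an = 0eh gives 3. */
static int count_analysers(unsigned int an)
{
    int n = 0;
    for (; an; an &= an - 1) /* clear the lowest set bit */
        n++;
    return n;
}
```

Starting from a session with AN_WAIT_REQ enabled and one queued request, run_naive() ends with AN_WAIT_REQ still enabled, dont_read set and wait_request() called only once. That matches the stuck state in the dump above: nothing will read, so nothing will wake the analyser up. run_fixed() calls wait_request() a second time, which clears dont_read, and then stops because no analyser enabled anything new.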
Thanks for your tests and feedback,
Willy