Hi Willy,

I'm back at work after 2 weeks on the beach in Dalmatia. I've patched my
Haproxy 1.8.11 with all three patches discussed here in last two weeks. It
didn't help. Then tried to run Haproxy with debug enabled. The last logs
from FD hanging in CLOSE_WAIT looks like this:

00020535:frntend.accept(0006)=000b from [ip:58321] ALPN=h2
00020535:frntend.clireq[000b:ffffffff]: POST /some/uri HTTP/1.1
00020535:backend.srvrep[000b:0015]: HTTP/1.1 200 OK
00020535:backend.srvcls[000b:adfd]
00020536:frntend.clicls[000b:ffffffff]
00020536:frntend.closed[000b:ffffffff]
00020537:frntend.clicls[000b:ffffffff]
00020537:frntend.closed[000b:ffffffff]
00020538:frntend.clicls[000b:ffffffff]
00020538:frntend.closed[000b:ffffffff]
00020516:backend.srvcls[000b:adfd]
0002051a:backend.srvcls[000b:adfd]
00020514:backend.srvcls[000b:adfd]
00020514:backend.clicls[000b:adfd]
00020514:backend.closed[000b:adfd]
00020516:backend.clicls[000b:adfd]
00020516:backend.closed[000b:adfd]
0002051a:backend.clicls[000b:adfd]
0002051a:backend.closed[000b:adfd]
0002051d:backend.srvcls[000b:adfd]
0002051d:backend.clicls[000b:adfd]
0002051d:backend.closed[000b:adfd]
0002051e:backend.clicls[000b:adfd]
0002051e:backend.closed[000b:adfd]
0002051f:backend.srvcls[000b:adfd]
0002051f:backend.clicls[000b:adfd]
0002051f:backend.closed[000b:adfd]
00020528:backend.srvcls[000b:adfd]
00020528:backend.clicls[000b:adfd]
00020528:backend.closed[000b:adfd]
00020529:backend.srvcls[000b:adfd]
00020529:backend.clicls[000b:adfd]
00020529:backend.closed[000b:adfd]
0002052a:backend.srvcls[000b:adfd]
0002052a:backend.clicls[000b:adfd]
0002052a:backend.closed[000b:adfd]
0002052b:backend.srvcls[000b:adfd]
0002052b:backend.clicls[000b:adfd]
0002052b:backend.closed[000b:adfd]
00020535:backend.clicls[000b:adfd]
00020535:backend.closed[000b:adfd]

I decided to add some quick'n'dirty debug messages into mux_h2.c to see
more details. I modified send_log function to write to console and used it
for digging inside h2 mux. I'm not attaching the full patch, because it's
ugly and nothing to be proud of. Just an example of patched function to get
the feeling.

static inline void h2s_close(struct h2s *h2s)
{
send_log(NULL, LOG_NOTICE, "mpeh2 fd%d h2s_close - id%08x st%02x fl%08x
streams:%d\n", mpeh2_h2s_fd(h2s), mpeh2_h2s_id(h2s), mpeh2_h2s_st(h2s),
mpeh2_h2s_flags(h2s), h2s->h2c->nb_streams);

if (h2s->st != H2_SS_CLOSED)
h2s->h2c->nb_streams--;
h2s->st = H2_SS_CLOSED;
}

This is how successful connection looks like:

20180628.1614 mpeh2 fd17 h2c_frt_init
20180628.1614 mpeh2 fd17 h2c_stream_new - id00000001 st00 fl00000000
20180628.1614 0000004c:frntend.accept(0006)=0011 from [some_ip:5059] ALPN=h2
20180628.1614 0000004c:frntend.clireq[0011:ffffffff]: GET /some/uri HTTP/1.1
20180628.1614 0000004c:backend.srvrep[0011:0013]: HTTP/1.1 200 OK
20180628.1614 0000004c:backend.srvcls[0011:adfd]
20180628.1614 mpeh2 fd17 h2s_close - id00000001 st04 fl00003001 streams:1
20180628.1614 0000004f:frntend.clicls[0011:ffffffff]
20180628.1614 0000004f:frntend.closed[0011:ffffffff]
20180628.1614 mpeh2 fd17 h2s_destroy - id00000001 st07 fl00003003
20180628.1614 mpeh2 fd17 h2s_close - id00000001 st07 fl00003003 streams:0
20180628.1614 mpeh2 fd17 h2c_stream_new - id00000003 st00 fl00000000
20180628.1614 00000056:frntend.accept(0006)=0011 from [some_ip:5059] ALPN=h2
20180628.1614 00000056:frntend.clireq[0011:ffffffff]: GET /some/uri HTTP/1.1
20180628.1614 00000056:backend.srvrep[0011:0013]: HTTP/1.1 200 OK
20180628.1614 00000056:backend.srvcls[0011:adfd]
20180628.1614 mpeh2 fd17 h2s_close - id00000003 st04 fl00003001 streams:1
20180628.1614 00000057:frntend.clicls[0011:ffffffff]
20180628.1614 00000057:frntend.closed[0011:ffffffff]
20180628.1614 mpeh2 fd17 h2s_destroy - id00000003 st07 fl00003003
20180628.1614 mpeh2 fd17 h2s_close - id00000003 st07 fl00003003 streams:0
20180628.1614 mpeh2 fd17 h2c_stream_new - id00000005 st00 fl00000000
20180628.1614 00000059:frntend.accept(0006)=0011 from [some_ip:5059] ALPN=h2
20180628.1614 00000059:frntend.clireq[0011:ffffffff]: POST /some/uri
HTTP/1.1
20180628.1614 00000059:backend.srvrep[0011:0013]: HTTP/1.1 200 OK
20180628.1614 00000059:backend.srvcls[0011:adfd]
20180628.1614 mpeh2 fd17 h2s_close - id00000005 st04 fl00003101 streams:1
20180628.1614 0000005b:frntend.clicls[0011:ffffffff]
20180628.1614 0000005b:frntend.closed[0011:ffffffff]
20180628.1614 mpeh2 fd17 h2s_destroy - id00000005 st07 fl00003103
20180628.1614 mpeh2 fd17 h2s_close - id00000005 st07 fl00003103 streams:0
20180628.1614 mpeh2 fd17 h2c_stream_new - id00000007 st00 fl00000000
20180628.1614 0000005d:frntend.accept(0006)=0011 from [some_ip:5059] ALPN=h2
20180628.1614 0000005d:frntend.clireq[0011:ffffffff]: GET /some/uri HTTP/1.1
20180628.1614 0000005d:backend.srvrep[0011:0013]: HTTP/1.1 200 OK
20180628.1614 0000005d:backend.srvcls[0011:adfd]
20180628.1614 mpeh2 fd17 h2s_close - id00000007 st04 fl00003001 streams:1
20180628.1614 0000005e:frntend.clicls[0011:ffffffff]
20180628.1614 0000005e:frntend.closed[0011:ffffffff]
20180628.1614 mpeh2 fd17 h2s_destroy - id00000007 st07 fl00003003
20180628.1614 mpeh2 fd17 h2s_close - id00000007 st07 fl00003003 streams:0
20180628.1615 mpeh2 fd17 h2_timeout_task/expired - st02 fl00000000
20180628.1615 mpeh2 fd17 h2c_error - st02 fl00000000 err00
20180628.1615 mpeh2 fd17 h2_release - st06 fl00000100


And this is last h2 connection in CLOSE_WAIT hanging file descriptor:

20180628.2201 mpeh2 fd17 h2c_frt_init
20180628.2201 mpeh2 fd17 h2c_stream_new - id00000001 st00 fl00000000
20180628.2201 0002b064:frntend.accept(0006)=0011 from [ip:57964] ALPN=h2
20180628.2201 0002b064:frntend.clireq[0011:ffffffff]: GET /some/uri HTTP/1.1
20180628.2201 0002b064:backend.srvrep[0011:001d]: HTTP/1.1 200 OK
20180628.2201 0002b064:backend.srvcls[0011:adfd]
20180628.2201 mpeh2 fd17 h2s_close - id00000001 st04 fl00003001 streams:1
20180628.2201 0002b065:frntend.clicls[0011:ffffffff]
20180628.2201 0002b065:frntend.closed[0011:ffffffff]
20180628.2201 mpeh2 fd17 h2s_destroy - id00000001 st07 fl00003003
20180628.2201 mpeh2 fd17 h2s_close - id00000001 st07 fl00003003 streams:0
20180628.2201 mpeh2 fd17 h2c_stream_new - id00000003 st00 fl00000000
20180628.2201 0002b066:frntend.accept(0006)=0011 from [ip:57964] ALPN=h2
20180628.2201 mpeh2 fd17 h2c_stream_new - id00000005 st00 fl00000000
20180628.2201 0002b067:frntend.accept(0006)=0011 from [ip:57964] ALPN=h2
20180628.2201 mpeh2 fd17 h2c_stream_new - id00000007 st00 fl00000000
20180628.2201 0002b068:frntend.accept(0006)=0011 from [ip:57964] ALPN=h2
20180628.2201 mpeh2 fd17 h2c_stream_new - id00000009 st00 fl00000000
20180628.2201 0002b069:frntend.accept(0006)=0011 from [ip:57964] ALPN=h2
20180628.2201 0002b066:frntend.clireq[0011:ffffffff]: GET /some/uri HTTP/1.1
20180628.2201 0002b067:frntend.clireq[0011:ffffffff]: GET /some/uri HTTP/1.1
20180628.2201 0002b068:frntend.clireq[0011:ffffffff]: GET /some/uri HTTP/1.1
20180628.2201 0002b069:frntend.clireq[0011:ffffffff]: GET /some/uri HTTP/1.1
20180628.2201 mpeh2 fd17 h2c_stream_new - id0000000b st00 fl00000000
20180628.2201 0002b06a:frntend.accept(0006)=0011 from [ip:57964] ALPN=h2
20180628.2201 mpeh2 fd17 h2c_stream_new - id0000000d st00 fl00000000
20180628.2201 0002b06b:frntend.accept(0006)=0011 from [ip:57964] ALPN=h2
20180628.2201 0002b06a:frntend.clireq[0011:ffffffff]: GET /some/uri HTTP/1.1
20180628.2201 0002b06b:frntend.clireq[0011:ffffffff]: GET /some/uri HTTP/1.1
20180628.2201 0002b066:backend.srvrep[0011:001d]: HTTP/1.1 200 OK
20180628.2201 0002b069:backend.srvrep[0011:0020]: HTTP/1.1 200 OK
20180628.2201 0002b068:backend.srvrep[0011:001f]: HTTP/1.1 200 OK
20180628.2201 0002b06b:backend.srvrep[0011:0022]: HTTP/1.1 200 OK
20180628.2201 0002b067:backend.srvrep[0011:001e]: HTTP/1.1 200 OK
20180628.2201 0002b06a:backend.srvrep[0011:0021]: HTTP/1.1 200 OK
20180628.2201 mpeh2 fd17 h2c_stream_new - id0000000f st00 fl00000000
20180628.2201 0002b06c:frntend.accept(0006)=0011 from [ip:57964] ALPN=h2
20180628.2201 0002b06c:frntend.clireq[0011:ffffffff]: GET /some/uri HTTP/1.1
20180628.2201 0002b06c:backend.srvrep[0011:0023]: HTTP/1.1 200 OK
20180628.2201 0002b06c:backend.srvcls[0011:adfd]
20180628.2201 mpeh2 fd17 h2c_stream_new - id00000011 st00 fl00000000
20180628.2201 0002b06d:frntend.accept(0006)=0011 from [ip:57964] ALPN=h2
20180628.2201 0002b06d:frntend.clireq[0011:ffffffff]: POST /some/uri
HTTP/1.1
20180628.2201 0002b06d:backend.srvrep[0011:0023]: HTTP/1.1 200 OK
20180628.2201 0002b06d:backend.srvcls[0011:adfd]
20180628.2201 mpeh2 fd17 h2s_close - id0000000f st04 fl00003001 streams:8
20180628.2201 mpeh2 fd17 h2s_close - id00000011 st04 fl00003101 streams:7
20180628.2201 mpeh2 fd17 h2c_error - st04 fl00000000 err05
20180628.2201 0002b06e:frntend.clicls[0011:ffffffff]
20180628.2201 0002b06e:frntend.closed[0011:ffffffff]
20180628.2201 mpeh2 fd17 h2s_destroy - id0000000f st07 fl00003003
20180628.2201 mpeh2 fd17 h2s_close - id0000000f st07 fl00003003 streams:6
20180628.2201 0002b06f:frntend.clicls[0011:ffffffff]
20180628.2201 0002b06f:frntend.closed[0011:ffffffff]
20180628.2201 mpeh2 fd17 h2s_destroy - id00000011 st07 fl00003103
20180628.2201 mpeh2 fd17 h2s_close - id00000011 st07 fl00003103 streams:6
20180628.2201 0002b068:backend.srvcls[0011:adfd]
20180628.2201 0002b068:backend.clicls[0011:adfd]
20180628.2201 0002b068:backend.closed[0011:adfd]
20180628.2201 0002b066:backend.srvcls[0011:adfd]
20180628.2201 0002b066:backend.clicls[0011:adfd]
20180628.2201 0002b066:backend.closed[0011:adfd]
20180628.2201 mpeh2 fd17 h2s_destroy - id00000003 st06 fl00003081
20180628.2201 mpeh2 fd17 h2s_close - id00000003 st06 fl00003081 streams:6
20180628.2201 0002b067:backend.srvcls[0011:adfd]
20180628.2201 0002b067:backend.clicls[0011:adfd]
20180628.2201 0002b067:backend.closed[0011:adfd]
20180628.2201 0002b069:backend.srvcls[0011:adfd]
20180628.2201 0002b069:backend.clicls[0011:adfd]
20180628.2201 0002b069:backend.closed[0011:adfd]
20180628.2201 mpeh2 fd17 h2s_destroy - id00000009 st06 fl00003081
20180628.2201 mpeh2 fd17 h2s_close - id00000009 st06 fl00003081 streams:5
20180628.2201 0002b06a:backend.srvcls[0011:adfd]
20180628.2201 0002b06a:backend.clicls[0011:adfd]
20180628.2201 0002b06a:backend.closed[0011:adfd]
20180628.2201 0002b06b:backend.srvcls[0011:adfd]
20180628.2201 0002b06b:backend.clicls[0011:adfd]
20180628.2201 0002b06b:backend.closed[0011:adfd]

H2 conection first handles some requests in h2 streams. After destroying
some it hits h2c_error with H2_ERR_STREAM_CLOSED. After that it destroys a
few more, but stops with non-zero h2c.nb_streams. It doesn't close the rest
of h2 streams and doesn't reach the h2_release.

I've added more debug into h2s_close to see not only h2s state and flags
but also h2c state and flags. My only way to reproduce the bug is to let
Haproxy run until some of its FD falls into CLOSE_WAIT. After I catch some,
I'll report back.

Please feel free to suggest where else in mux_h2.c should I add debugs to
see why h2_release is not reached.

Best Regards,
Milan


On Fri, 15 Jun 2018 at 11:21, Willy Tarreau <w...@1wt.eu> wrote:

> On Fri, Jun 15, 2018 at 11:19:04AM +0200, Janusz Dziemidowicz wrote:
> > 2018-06-14 19:49 GMT+02:00 Willy Tarreau <w...@1wt.eu>:
> > > On Thu, Jun 14, 2018 at 07:22:34PM +0200, Janusz Dziemidowicz wrote:
> > >> 2018-06-14 18:56 GMT+02:00 Willy Tarreau <w...@1wt.eu>:
> > >>
> > >> > If you'd like to run a test, I'm attaching the patch.
> > >>
> > >> Sure, but you forgot to attach it :)
> > >
> > > Ah, that's because I'm stupid :-)
> > >
> > > Here it comes this time.
>

Reply via email to