Hi Willy,

> Do you *think* that you got fewer CLOSE_WAITs, or that the latest fixes
> didn't change anything? I suspect that for some reason you might be
> hit by several bugs, which is what has complicated the diagnosis, but
> that's just a pure guess.
>
>
I'm not sure. I left the patched haproxy running over last weekend. I have a
slight feeling that there were fewer hanging connections than over the previous
week, but it could be because of lower weekend traffic. Now I'm running the
latest git version (1.8.12-5e100b-15) and I'll compare the rate at which
blocked connections accumulate against vanilla 1.8.12 from last week.

Oh, I just saw that you already did that in the next e-mail. Thank you :-)

I'll send more extended "show fd" dumps as soon as I catch some more.
Please let me know if you add more H2 state info to the 1.8 git version and
I'll run it in production to get more info.
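
By the way, when comparing several dumps, it helps me to pull the mux fields
out of a "show fd" line programmatically. A minimal sketch (my own helper, not
part of haproxy; it just splits the key=value pairs as they appear in the dump
below):

```python
import re

def parse_mux_fields(line):
    """Extract the key=value pairs (st0, flg, nbst, nbcs, ...)
    from the mux part of a 'show fd' dump line."""
    return dict(re.findall(r'(\w+)=(\S+)', line))

# Example line, taken from the dump quoted below.
dump = ("fe=fe-http mux=H2 mux_ctx=0x258a880 st0=7 flg=0x00001000 "
        "nbst=8 nbcs=0 fctl_cnt=0 send_cnt=8 tree_cnt=8 orph_cnt=8 "
        "dbuf=0/0 mbuf=0/16384")

fields = parse_mux_fields(dump)
print(fields['st0'], fields['flg'], fields['nbst'])  # 7 0x00001000 8
```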

> So we have this:
>
>      25 : st=0x20(R:pra W:pRa) ev=0x00(heopi) [nlc] cache=0
> owner=0x24f0a70
> iocb=0x4d34c0(conn_fd_handler) tmask=0x1 umask=0x0 cflg=0x80203300
>
> fe=fe-http mux=H2 mux_ctx=0x258a880 st0=7 flg=0x00001000 nbst=8 nbcs=0
>
> fctl_cnt=0 send_cnt=8 tree_cnt=8 orph_cnt=8 dbuf=0/0 mbuf=0/16384
>
>
>   - st0=7 => H2_CS_ERROR2 : an error was sent, either it succeeded or
>     could not be sent and had to be aborted nonetheless ;
>
>   - flg=1000 => H2_CF_GOAWAY_SENT : the GOAWAY frame was sent to the mux
>     buffer.
>
>   - nbst=8 => 8 streams still attached
>
>   - nbcs=0 => 0 conn_streams found (application layer detached or not
>     attached yet)
>
>   - send_cnt=8 => 8 streams still in the send_list, waiting for the mux
>     to pick their contents.
>
>   - tree_cnt=8 => 8 streams known in the tree (hence they are still valid
>     from the H2 protocol perspective)
>
>   - orph_cnt=8 => 8 streams are orphaned: these streams have quit at the
>     application layer (very likely a timeout).
>
>   - mbuf=0/16384: the mux buffer is empty but allocated. It's not very
>     common.
>
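
(For my own reference while reading these dumps: the two decoded values above
can be mapped back to their symbolic names with a tiny helper. Only the entries
explained in this mail are filled in; the full enums live in haproxy's
src/mux_h2.c.)

```python
# Only the values decoded in this thread; the complete enums are in
# haproxy's src/mux_h2.c.
H2_CS = {7: "H2_CS_ERROR2"}                 # connection state (st0)
H2_CF = {0x00001000: "H2_CF_GOAWAY_SENT"}   # connection flag bits (flg)

def decode(st0, flg):
    state = H2_CS.get(st0, "unknown")
    flags = [name for bit, name in H2_CF.items() if flg & bit]
    return state, flags

print(decode(7, 0x00001000))  # ('H2_CS_ERROR2', ['H2_CF_GOAWAY_SENT'])
```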
> At this point, what it indicates is that:
>   - 8 streams were active on this connection and a response was sent (at
>     least partially) and probably waited for the mux buffer to be empty
>     due to data from other previous streams. I'm realising it would be
>     nice to also report the highest stream index to get an idea of the
>     number of past streams on the connection.
>
>   - an error happened (protocol error, network issue, etc.; no more info
>     at the moment) and caused haproxy to emit a GOAWAY frame. While doing
>     so, the pending streams in the send_list were not destroyed.
>
>   - then, for an unknown reason, the situation doesn't move anymore. I'm
>     realising that one case I figured out in the past, with an error
>     possibly blocking the connection, at least partially covers one point
>     here: it causes the mux buffer to remain allocated, so this patch
>     would have caused it to be released, but it's still incomplete.
>
> Now I have some elements to dig through, I'll try to mentally reproduce
> the complex sequence of a blocked response with a GOAWAY being sent at
> the same time to see what happens.
>
>
Thanks a lot for the detailed description.
Milan
