So after Kirill suggestion to modify h2s struct in a way that tasklet "shut_tl" is before recv_wait I verified if in 2.2.4 the same crash will occur aaaand it did not!
After the patch that merges recv_wait and send_wait: http://git.haproxy.org/?p=haproxy-2.2.git;a=commit;h=f96508aae6b49277dcf142caa35042678cf8e2ca and witch such h2s (tasklet shut_tl before wait_event subs) the crashes are gone: struct h2s { [...] struct buffer rxbuf; /* receive buffer, always valid (buf_empty or real buffer) */ struct tasklet *shut_tl; /* deferred shutdown tasklet, to retry to send an RST after we failed to, * in case there's no other subscription to do it */ struct wait_event *subs; /* recv wait_event the conn_stream associated is waiting on (via h2_subscribe) */ struct list list; /* To be used when adding in h2c->send_list or h2c->fctl_lsit */ }; pon., 2 lis 2020 o 12:42 Maciej Zdeb <mac...@zdeb.pl> napisał(a): > Great idea Kirill, > > With such modification: > > struct h2s { > [...] > struct tasklet *shut_tl; > struct wait_event *recv_wait; /* recv wait_event the conn_stream > associated is waiting on (via h2_subscribe) */ > struct wait_event *send_wait; /* send wait_event the conn_stream > associated is waiting on (via h2_subscribe) */ > struct list list; /* To be used when adding in h2c->send_list or > h2c->fctl_lsit */ > }; > > it crashed just like before. > > pon., 2 lis 2020 o 11:12 Kirill A. Korinsky <kir...@korins.ky> napisał(a): > >> Hi, >> >> Thanks for update. >> >> After read Wully's recommendation and provided commit that fixed an issue >> I'm curious can you "edit" a bit this commit and move `shut_tl` before >> `recv_wait` instead of removed `wait_event`? >> >> It is a quiet dummy way to confirm that memory corruption had gone, and >> not just moved to somewhere else. >> >> -- >> wbr, Kirill >> >> On 2. Nov 2020, at 10:58, Maciej Zdeb <mac...@zdeb.pl> wrote: >> >> Hi, >> >> Update for people on the list that might be interested in the issue, >> because part of discussion was private. >> >> I wanted to check Willy suggestion and modified h2s struct (added dummy >> fields): >> >> struct h2s { >> [...] >> uint16_t status; /* HTTP response status */ >> unsigned long long body_len; /* remaining body length according >> to content-length if H2_SF_DATA_CLEN */ >> struct buffer rxbuf; /* receive buffer, always valid (buf_empty >> or real buffer) */ >> int dummy0; >> struct wait_event wait_event; /* Wait list, when we're attempting >> to send a RST but we can't send */ >> int dummy1; >> struct wait_event *recv_wait; /* recv wait_event the conn_stream >> associated is waiting on (via h2_subscribe) */ >> int dummy2; >> struct wait_event *send_wait; /* send wait_event the conn_stream >> associated is waiting on (via h2_subscribe) */ >> int dummy3; >> struct list list; /* To be used when adding in h2c->send_list or >> h2c->fctl_lsit */ >> struct list sending_list; /* To be used when adding in >> h2c->sending_list */ >> }; >> >> With such modified h2s struct, the crash did not occur. >> >> I've checked HAProxy 2.1, it crashes like 2.0. >> >> I've also checked 2.2, bisection showed that this commit: >> http://git.haproxy.org/?p=haproxy-2.2.git;a=commitdiff;h=5723f295d85febf5505f8aef6afabb6b23d6fdec;hp=f11be0ea1e8e571234cb41a2fcdde2cf2161df37 >> fixed the crashes we experienced. I'm not sure how/if it fixed the memory >> corruption, it is possible that memory is still corrupted but not causing >> the crash. >> >> >> >> pt., 25 wrz 2020 o 16:25 Kirill A. Korinsky <kir...@korins.ky> >> napisał(a): >> >>> Very interesting. >>> >>> Anyway, I can see that this pice of code was refactored some time ago: >>> https://github.com/haproxy/haproxy/commit/f96508aae6b49277dcf142caa35042678cf8e2ca >>> >>> Maybe it is worth to try 2.2 or 2.3 branch? >>> >>> Yes, it is a blind shot and just a guess. >>> >>> -- >>> wbr, Kirill >>> >>> On 25. Sep 2020, at 16:01, Maciej Zdeb <mac...@zdeb.pl> wrote: >>> >>> Yes at the same place with same value: >>> >>> (gdb) bt full >>> #0 0x0000559ce98b964b in h2s_notify_recv (h2s=0x559cef94e7e0) at >>> src/mux_h2.c:783 >>> sw = 0xffffffff >>> >>> >>> >>> pt., 25 wrz 2020 o 15:42 Kirill A. Korinsky <kir...@korins.ky> >>> napisał(a): >>> >>>> > On 25. Sep 2020, at 15:26, Maciej Zdeb <mac...@zdeb.pl> wrote: >>>> > >>>> > I was mailing outside the list with Willy and Christopher but it's >>>> worth sharing that the problem occurs even with nbthread = 1. I've managed >>>> to confirm it today. >>>> >>>> >>>> I'm curious is it crashed at the same place with the same value? >>>> >>>> -- >>>> wbr, Kirill >>>> >>>> >>>> >>> >>