Hi Christopher,

Thanks! I'm building a patched version and will return with feedback!
Kind regards,

On Fri, 19 Mar 2021 at 16:40, Christopher Faulet <cfau...@haproxy.com> wrote:

> On 16/03/2021 at 13:46, Maciej Zdeb wrote:
> > Sorry for the spam. In the last message I said that the old process
> > (after reload) is consuming CPU for Lua processing, and that's not
> > true: it is processing other things as well.
> >
> > I'll take a break. ;) Then I'll verify whether the issue exists on
> > the 2.3 and maybe the 2.4 branch. For each version I need a week or
> > two to be sure the issue does not occur. :(
> >
> > If 2.3 and 2.4 behave the same way 2.2 does, I'll try to confirm
> > whether there is any relation between the infinite loops and our
> > custom configuration:
> > - Lua scripts (mainly used for header generation/manipulation),
> > - SPOE (used for sending metadata about each request to an external
> >   service),
> > - peers (we have a cluster of 12 HAProxy servers connected to each
> >   other).
>
> Hi Maciej,
>
> I've read your backtraces more carefully, and indeed, it seems to be
> related to Lua processing. I don't know if the watchdog is triggered
> because of the Lua code or if it is just a side effect. But the Lua
> execution is interrupted inside the memory allocator, and
> malloc/realloc are not async-signal-safe. Unfortunately, when the Lua
> stack is dumped, the same allocator is also used. At this stage,
> because a lock was not properly released, HAProxy enters a deadlock.
>
> On the other threads, we loop in the watchdog, waiting for their turn
> to dump the thread information, and that explains the 100% CPU usage
> you observe.
>
> So, to prevent this situation, the Lua stack must not be dumped if it
> was interrupted inside an unsafe part. It is the easiest way we found
> to work around this bug. And because it is pretty rare, it should be
> good enough.
>
> However, I'm unable to reproduce the bug. Could you test the attached
> patches, please? I attached patches for 2.4, 2.3 and 2.2. Because you
> experienced this bug on 2.2, it is probably easier to test the patches
> for this version.
>
> If it is fixed, it would be good to figure out why the watchdog is
> triggered on your old processes.
>
> --
> Christopher Faulet