Hi Willy,

>> >>>> There are very few abort() calls in the code :
>> >>>> - some in the thread debugging code to detect recursive locks ;
>> >>>> - one in the cache applet which triggers on an impossible case very
>> >>>> likely resulting from cache corruption (hence a bug)
>> >>>> - a few inside the Lua library
>> >>>> - a few in the HPACK decompressor, detecting a few possible bugs there
>>
>> After playing around with some config changes we managed to not have haproxy
>> throw the "worker <pid> exited with code 134" error for at least a day. Which
>> is a long time as before we had this error at least 5 times a day...
>
> Great!
>
>> The line we removed from our config to get this result was:
>> compression algo gzip
>
> Hmmm interesting.
>
>> Could it be a locking issue in the compression code? I'm going to run a few
>> more days without compression enabled, but for now this looks promising!
>
> In fact, the locking is totally disabled when not using compression, so
> it cannot be an option. Also, most of the recently fixed bugs may only
> be triggered with H2 or threads, none of which you're using. I rechecked
> the compression code to try to spot anything obvious, but nothing popped
> out :-/
>
> All I can strongly recommend if you retry with compression enabled is to
> do it with latest 1.8 release. I'm currently checking that I didn't miss
> anything to issue 1.8.6 hopefully today. If it still dies, this will at
> least rule out the possible side effects of a few of the bugs we've fixed
> since, all of which were really tricky.

We tested haproxy 1.8.6 with compression enabled today, and within the first
few hours it had already failed:

[ALERT] 095/120526 (12989) : Current worker 5241 exited with code 134

Our other balancer, running haproxy 1.8.5 with compression disabled, is still
running fine after 2 days with the same workload. So there still seems to be
an issue when compression is enabled.

Thanks,
Frank