Hi Willy,

>> >>>> There are very few abort() calls in the code :
>> >>>>   - some in the thread debugging code to detect recursive locks ;
>> >>>>   - one in the cache applet which triggers on an impossible case very
>> >>>>     likely resulting from cache corruption (hence a bug)
>> >>>>   - a few inside the Lua library
>> >>>>   - a few in the HPACK decompressor, detecting a few possible bugs there
>> 
>> After playing around with some config changes we managed to not have haproxy
>> throw the "worker <pid> exited with code 134" error for at least a day. Which
>> is a long time as before we had this error at least 5 times a day...
>
> Great!
>
>> The line we removed from our config to get this result was:
>> compression algo gzip
>
> Hmmm interesting.
>
>> Could it be a locking issue in the compression code? I'm going to run a few
>> more days without compression enabled, but for now this looks promising!
>
> In fact, the locking is totally disabled when not using compression, so
> it cannot be an option. Also, most of the recently fixed bugs may only
> be triggered with H2 or threads, none of which you're using. I rechecked
> the compression code to try to spot anything obvious, but nothing popped
> out :-/
>
> All I can strongly recommend if you retry with compression enabled is to
> do it with latest 1.8 release. I'm currently checking that I didn't miss
> anything to issue 1.8.6 hopefully today. If it still dies, this will at
> least rule out the possible side effects of a few of the bugs we've fixed
> since, all of which were really tricky.

We tested haproxy 1.8.6 with compression enabled today, and within the first
few hours it already went wrong:
[ALERT] 095/120526 (12989) : Current worker 5241 exited with code 134
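
For reference, code 134 fits the abort() theory discussed above, assuming the
master reports a worker killed by a signal using the usual shell convention of
128 + signal number: 134 - 128 = 6, which is SIGABRT on Linux. A small sketch
of that decoding follows (describe_exit_code is a hypothetical helper for
illustration, not part of haproxy):

```python
import signal


def describe_exit_code(code: int) -> str:
    """Interpret a shell-style exit status.

    Assumption: values above 128 mean "killed by signal (code - 128)",
    as in the common shell convention.
    """
    if code > 128:
        signum = code - 128
        try:
            # Map the signal number to its symbolic name on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # Unknown signal number on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited normally with status {code}"


print(describe_exit_code(134))
```

If that convention holds here, the worker died in abort() rather than exiting
by itself, so a core dump with a backtrace would show which abort() was hit.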

Our other balancer, running haproxy 1.8.5 with compression disabled, is still
running fine after 2 days with the same workload.
So the crash does seem to be tied to compression being enabled, possibly a
locking issue in that code.

Thanks,
Frank
