On 10/04/2021 at 00:34, Robin H. Johnson wrote:
On Fri, Apr 09, 2021 at 10:14:26PM +0200, Christopher Faulet wrote:
It seems you have a blocking call in one of your Lua scripts. The thread dump
shows many threads blocked in hlua_ctx_init. Many others are executing Lua.
Unfortunately, for an unknown reason, there is no stack traceback.
All of our Lua is string handling. Parsing headers, and then building
TXN variables as well as a set of applets that return responses in cases
where we don't want to go to a backend server as the response is simple
enough to generate inside the LB.

In 2.3 and earlier, Lua scripts are executed under a global lock. Thus
blocking calls in a Lua script are awful, because they block not just one
thread but all the others too. I guess the same issue exists on 1.8, but there
is no watchdog in that version. Thus, from time to time HAProxy hangs and may
report huge latencies but, in the end, it recovers and continues to process
data. That is exactly the purpose of the watchdog: reporting hidden bugs
related to spinning loops and deadlocks.
Nothing in this Lua code should have blocking calls at all.
The Lua code has zero calls to external services, no sockets, no sleeps,
no print, no Map.new (single call in the Lua startup, not inside any
applet or fetch), no usage of other packages, no file handling, no other
IO.

I'm hoping I can get $work to agree to fully open-source the Lua, so you
can see this fact and review the code to confirm that it SHOULD be
non-blocking.

I trust you on this point. If you are not using any external components, it should indeed be ok. So it is probably a contention issue on the global Lua lock. If you are able to generate and inspect a core file, it should help you figure out what really happens.


However, I may be wrong. It may just be a contention problem because you are
executing Lua with 64 threads and a huge workload. In this case, you may give
2.4 (under development) a try. There is a way to have a separate Lua context
for each thread, by loading the scripts with the "lua-load-per-thread"
directive. Out of curiosity, on 1.8, are you running HAProxy with several
threads or are you spawning several processes?
nbthread=64, nbproc=1 on both 1.8/2.x

It is thus surprising, if it is really a contention issue, that you never observed slowdowns on 1.8. There is no watchdog, but the thread implementation is a bit awkward on 1.8. The 2.x versions are better on this point, 2.4 being the best.
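
For reference, here is a minimal sketch of what per-thread loading could look like on 2.4. The script path is hypothetical and only there for illustration; adapt it to your setup.

```
global
    nbthread 64
    # Hypothetical path. With lua-load-per-thread (2.4+), each thread
    # loads the script into its own Lua state, so script execution no
    # longer goes through the global Lua lock.
    lua-load-per-thread /etc/haproxy/lua/handlers.lua
```

Keep in mind that the states are independent: Lua globals set in one thread are not visible from the others, so any state shared between requests has to live elsewhere (TXN variables, for instance).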

Yes, we're hoping to try 2.4.x, just working on some parts to get there.


--
Christopher Faulet
