On 08/27/2018 03:09 PM, Olivier Houchard wrote:
On Mon, Aug 27, 2018 at 02:29:42PM +0200, Frederic Lecaille wrote:
On 08/27/2018 01:33 PM, Olivier Houchard wrote:
Hi Pieter,

On Sat, Aug 25, 2018 at 10:00:04PM +0200, PiBa-NL wrote:
Hi List, Thierry, Olivier,

Using a lua-socket with connect_ssl and haproxy running with nbthread 3..
results in haproxy hanging with 3 threads for me.

This while using both 1.9-7/30 version (with the 2 extra patches from
Olivier avoiding 100% on a single thread.) and also a build of today's
snapshot: HA-Proxy version 1.9-dev1-e3faf02 2018/08/25

Below info is at the bottom of the mail:
- haproxy -vv
- gdb backtraces

This one is easy to reproduce after just a few calls to the lua function
with the lua code i'm writing on a test-box.. So if a 'simple' config that
makes a reproduction is desired i can likely come up with one.
Same lua code with nbthread 1 seems to work properly.

Is below info (the stack traces) enough to come up with a fix? If not lemme
know and ill try and make a small reproduction of it.


root@freebsd11:~ # haproxy -vv
HA-Proxy version 1.9-dev1-e3faf02 2018/08/25
Copyright 2000-2018 Willy Tarreau <wi...@haproxy.org>

Build options :
    TARGET  = freebsd
    CPU     = generic
    CC      = cc
    CFLAGS  = -DDEBUG_THREAD -DDEBUG_MEMORY -pipe -g -fstack-protector
-fno-strict-aliasing -fno-strict-aliasing -Wdeclaration-after-statement
-fwrapv -fno-strict-overflow -Wno-address-of-packed-member
-Wno-null-dereference -Wno-unused-label -DFREEBSD_PORTS -DFREEBSD_PORTS
    OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_CPU_AFFINITY=1 USE_ACCEPT4=1
USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1 USE_PCRE_JIT=1

Default settings :
    maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with network namespace support.
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"),
raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 8.40 2017-01-11
Running on PCRE version : 8.40 2017-01-11
PCRE library supports JIT : yes
Built with multi-threading support.
Encrypted password support via crypt(3): yes
Built with transparent proxy support using: IP_BINDANY IPV6_BINDANY
Built with Lua version : Lua 5.3.4
Built with OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-freebsd  26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2

Available polling systems :
       kqueue : pref=300,  test result OK
         poll : pref=200,  test result OK
       select : pref=150,  test result OK
Total: 3 (3 usable), will use kqueue.

Available multiplexer protocols :
(protocols markes as <default> cannot be specified using 'proto' keyword)
         <default> : mode=TCP|HTTP   side=FE|BE
                h2 : mode=HTTP       side=FE

Available filters :
          [TRACE] trace
          [COMP] compression
          [SPOE] spoe

root@freebsd11:~ # /usr/local/bin/gdb81 --pid 39649
GNU gdb (GDB) 8.1 [GDB v8.1 for FreeBSD]
Copyright (C) 2018 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-portbld-freebsd11.1".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 39649
Reading symbols from /usr/local/sbin/haproxy...done.
[New LWP 101651 of process 39649]
[New LWP 101652 of process 39649]
Reading symbols from /lib/libcrypt.so.5...(no debugging symbols
found)...done.
Reading symbols from /lib/libz.so.6...(no debugging symbols found)...done.
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libssl.so.8...(no debugging symbols
found)...done.
Reading symbols from /lib/libcrypto.so.8...(no debugging symbols
found)...done.
Reading symbols from /usr/local/lib/liblua-5.3.so...(no debugging symbols
found)...done.
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols
found)...done.
[Switching to LWP 101650 of process 39649]
0x0000000801e11e3a in _kevent () from /lib/libc.so.7
(gdb) info thread
    Id   Target Id         Frame
* 1    LWP 101650 of process 39649 0x0000000801e11e3a in _kevent () from
/lib/libc.so.7
    2    LWP 101651 of process 39649 0x0000000000437b92 in __spin_lock
(lbl=LUA_LOCK, l=0x8cf1d8 <hlua_global_lock>, func=0x62a781
"hlua_ctx_resume",
      file=0x62a328 "src/hlua.c", line=1070) at include/common/hathreads.h:731
    3    LWP 101652 of process 39649 0x000000080187a70c in ?? () from
/usr/local/lib/liblua-5.3.so
(gdb) bt full
#0  0x0000000801e11e3a in _kevent () from /lib/libc.so.7
No symbol table info available.
#1  0x0000000800f15ca2 in ?? () from /lib/libthr.so.3
No symbol table info available.
#2  0x000000000041dc4a in _do_poll (p=0x8d0548 <cur_poller>, exp=1923638009)
at src/ev_kqueue.c:151
          status = 37937304
          count = 1923638009
          fd = 200
          delta_ms = 1000
          timeout = {tv_sec = 1, tv_nsec = 0}
          updt_idx = 0
          changes = 0
          old_fd = -1
#3  0x000000000051a975 in run_poll_loop () at src/haproxy.c:2419
          next = 1923638009
          exp = 1923638009
#4  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f0) at
src/haproxy.c:2451
          start_lock = {lock = 0, info = {owner = 0, waiters = 0,
last_location = {function = 0x0, file = 0x0, line = 0}}}
          ptif = 0x8c1980 <per_thread_init_list>
          ptdf = 0x800f1841e <pthread_sigmask+46>
#5  0x000000000051423e in main (argc=3, argv=0x7fffffffeb28) at
src/haproxy.c:3053
          tids = 0x8024849f0
          threads = 0x802510ac0
          i = 3
          old_sig = {__bits = {1073741824, 0, 0, 0}}
          blocked_sig = {__bits = {4227856759, 4294967295, 4294967295,
4294967295}}
          err = 0
          retry = 200
          limit = {rlim_cur = 4033, rlim_max = 4033}
          errmsg = 
"\000\353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\063\207Z\306\357O5\347\240\211\214\000\000\000\000\000
 
\353\377\377\377\177\000\000H\353\377\377\377\177\000\000(\353\377\377\377\177\000\000\003\000\000\000\000\000\000\000\300\352\377\377\377\177\000\000R\201\340\001\b\000\000\000\001\000\000"
          pidfd = -1
(gdb) thread 2
[Switching to thread 2 (LWP 101651 of process 39649)]
#0  0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8
<hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328
"src/hlua.c", line=1070)
      at include/common/hathreads.h:731
731             __SPIN_LOCK(&l->lock);
(gdb) bt full
#0  0x0000000000437b92 in __spin_lock (lbl=LUA_LOCK, l=0x8cf1d8
<hlua_global_lock>, func=0x62a781 "hlua_ctx_resume", file=0x62a328
"src/hlua.c", line=1070)
      at include/common/hathreads.h:731
          __pl_l = 0x8cf1d8 <hlua_global_lock>
          __pl_r = 4294967300
          start_time = 7797095086642562
#1  0x0000000000438a5e in hlua_ctx_resume (lua=0x8024f4500, yield_allowed=1)
at src/hlua.c:1070
          ret = 9240032
          msg = 0x456171 <eb32sc_next_with_parent+97> "H\203\370"
          trace = 0x7fffdfffdcb0 ""
#2  0x000000000044236d in hlua_process_task (task=0x80242d960,
context=0x8024f4500, state=513) at src/hlua.c:5641
          hlua = 0x8024f4500
          status = HLUA_E_OK
#3  0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384
          t = 0x80242d960
          state = 513
          ctx = 0x8024f4500
          process = 0x4422f0 <hlua_process_task>
          t = 0x80242df00
          max_processed = 200
#4  0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386
          next = 1923609561
          exp = 1923608777
#5  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f4) at
src/haproxy.c:2451
          start_lock = {lock = 0, info = {owner = 0, waiters = 0,
last_location = {function = 0x0, file = 0x0, line = 0}}}
          ptif = 0x8c1980 <per_thread_init_list>
          ptdf = 0x800f177cc
#6  0x0000000800f12bc5 in ?? () from /lib/libthr.so.3
No symbol table info available.
#7  0x0000000000000000 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x7fffdfffe000
(gdb) thread 3
[Switching to thread 3 (LWP 101652 of process 39649)]
#0  0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so
(gdb) bt full
#0  0x000000080187a70c in ?? () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#1  0x000000080187acd7 in ?? () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#2  0x000000080187b108 in ?? () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#3  0x0000000801873e30 in lua_gc () from /usr/local/lib/liblua-5.3.so
No symbol table info available.
#4  0x0000000000438e45 in hlua_ctx_resume (lua=0x8024f4f80, yield_allowed=1)
at src/hlua.c:1186
          ret = 0
          msg = 0x5a8a24 <b_alloc_margin+324> "\351\200"
          trace = 0x7fffdfdfcc60 "\240\314\337\337\377\177"
#5  0x000000000044887a in hlua_applet_http_fct (ctx=0x8024c5880) at
src/hlua.c:6716
          si = 0x802419540
          strm = 0x802419200
          res = 0x802419270
          rule = 0x80242e360
          px = 0x802512c00
          hlua = 0x8024f4f80
          blk1 = 0x7fffdfdfcca0 ""
          len1 = 34397588737
          blk2 = 0x802419278 ""
          len2 = 34397590136
          ret = 0
#6  0x00000000005a78a7 in task_run_applet (t=0x80242efe0,
context=0x8024c5880, state=16385) at src/applet.c:49
          app = 0x8024c5880
          si = 0x802419540
#7  0x00000000005a49a6 in process_runnable_tasks () at src/task.c:384
          t = 0x80242efe0
          state = 16385
          ctx = 0x8024c5880
          process = 0x5a77f0 <task_run_applet>
          t = 0x80242efe0
          max_processed = 200
#8  0x000000000051a6b2 in run_poll_loop () at src/haproxy.c:2386
          next = 1923608730
          exp = 1923608070
#9  0x0000000000517672 in run_thread_poll_loop (data=0x8024849f8) at
src/haproxy.c:2451
          start_lock = {lock = 0, info = {owner = 0, waiters = 0,
last_location = {function = 0x0, file = 0x0, line = 0}}}
          ptif = 0x8c1980 <per_thread_init_list>
          ptdf = 0x800f177cc
---Type <return> to continue, or q <return> to quit---
#10 0x0000000800f12bc5 in ?? () from /lib/libthr.so.3
No symbol table info available.
#11 0x0000000000000000 in ?? ()
No symbol table info available.
Backtrace stopped: Cannot access memory at address 0x7fffdfdfd000
(gdb)



I found one minor problem with the LUA code (patch attached) but I kinda
doubt this is your problem here.

Interesting patch, because I have noticed that reg-test/lua/b00001.vtc may
sometimes fail (with a haproxy 100% CPU usage) in the TCP mode part (when
using c2 client), both when *thread support is enabled or not.*

So I think your patch may have fixed the same issue I can try to reproduce.

Well, I have just tried to reproduce it running a loop of
reg-test/lua/b00001.vtc script, and your patch does not solve it. Obviously,
this does not mean your patch is not correct.


Hmm this is interesting, but I'm not sure it's the same problem as Pieter', as
he specifically said it worked fine with only 1 thread.

According to Pieter traces, haproxy has registered HTTP service mode lua applets in HTTP mode. Your patch fixes a TCP service mode issue. reg-test/lua/b00001.vtc script runs both HTTP and TCP lua applets. But this is the TCP mode one which makes sometimes fail this script.

It seems one thread is stuck in lua_gc() while holding the global LUA lock,
but I don't know enough about LUA to guess what is going on.

What is suspect is that the HTTP and TCP applet functions
hlua_applet_(http|tcp)_fct() are called several times even when the applet
is done, or when the streams are disconnected or closed:

  /* If the stream is disconnect or closed, ldo nothing. */
     if (unlikely(si->state == SI_ST_DIS || si->state == SI_ST_CLO))
         return;

this leads to call hlua_ctx_resume() several times from the same thread I
guess.


But if hlua_applet_(http|tcp)_fct() just returns, who calls
hlua_ctx_resume() ? :)

hlua_applet_(http|tcp)_fct() functions. If your run the script previously mentioned, when it fails this is because hlua_applet_*tcp*_fct() is infinitely called.


This may be exposed running reg-tests/lua/b00001.vtc continuously:

     while HAPROXY_PROGRAM=~/src/haproxy/haproxy
~/src/varnish-cache-trunk/bin/varnishtest/varnishtest -t5 -lv
reg-tests/lua/b00001.vtc; do echo
--------------------------------------------------------------------; done

Each test must take about 120ms. So, if varnistest stops running this reg
test during about 1s you will have to interrupt it to prevent it to kill
haproxy, and to attach gdb to the haproxy processes.



Very interesting, time for me to setup varnishtest.


Regards,

Olivier



Reply via email to