Dear David,
the crash looks like a problem in the OpenSSL memory management.
In general, I would not expect this to be a problem in the NaviServer
code itself, but rather in the interplay of the various memory management
options of OpenSSL, NaviServer and Tcl. We use these functions under heavy
load on many servers, but we are careful to use the same malloc
implementation everywhere (actually Google's TCMalloc).
OpenSSL:
======
In general, OpenSSL supports configuring its memory management routines.
However, the memory management interface of OpenSSL changed with the
release of OpenSSL 1.1.0. As a consequence, when NaviServer is compiled
with newer versions of OpenSSL, the native OpenSSL memory routines are
used. The commit [1] says: "Registering our own functions does not seem
necessary". So, if one compiles a version of NaviServer between 4.99.15
and 4.99.20 with a newer version of OpenSSL, a problem might arise
when the native OpenSSL malloc implementation is not fully thread-safe,
or when different malloc implementations get mixed.
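For illustration, here is a rough sketch of what registering allocator hooks looks like with the interface of OpenSSL 1.1.0 and later, where the hooks additionally receive the caller's file name and line number. This is not the NaviServer code: the demo_* names and the DEMO_WITH_OPENSSL macro are made up for this example, and the hooks simply forward to the C library, where a real server would presumably forward to the allocator used by the rest of the process. Without -DDEMO_WITH_OPENSSL the file compiles standalone and the registration does nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifdef DEMO_WITH_OPENSSL   /* hypothetical macro: build with -DDEMO_WITH_OPENSSL -lcrypto */
# include <openssl/crypto.h>
# include <openssl/opensslv.h>
#endif

/*
 * Hooks with the signatures OpenSSL >= 1.1.0 expects. Here they forward
 * to the C library (assumption for this sketch); file and line are unused.
 */
void *demo_malloc(size_t num, const char *file, int line)
{
    (void)file; (void)line;
    return malloc(num);
}

void *demo_realloc(void *addr, size_t num, const char *file, int line)
{
    (void)file; (void)line;
    return realloc(addr, num);
}

void demo_free(void *addr, const char *file, int line)
{
    (void)file; (void)line;
    free(addr);
}

/*
 * Register the hooks with OpenSSL. Returns 1 on success, 0 if OpenSSL
 * refused or if this file was built without DEMO_WITH_OPENSSL.
 */
int demo_register_openssl_hooks(void)
{
#if defined(DEMO_WITH_OPENSSL) && OPENSSL_VERSION_NUMBER >= 0x10100000L
    return CRYPTO_set_mem_functions(demo_malloc, demo_realloc, demo_free);
#else
    return 0;
#endif
}

/* Exercise the hooks directly, without OpenSSL, as a sanity check. */
int demo_hooks_roundtrip(void)
{
    char *p = demo_malloc(8, __FILE__, __LINE__);

    assert(p != NULL);
    memcpy(p, "abcdefg", 8);                      /* 7 characters plus NUL */
    p = demo_realloc(p, 64, __FILE__, __LINE__);  /* contents must survive */
    assert(p != NULL && strcmp(p, "abcdefg") == 0);
    demo_free(p, __FILE__, __LINE__);
    return 1;
}
```

Note that CRYPTO_set_mem_functions() only succeeds while OpenSSL has not yet allocated any memory, so such a call has to happen very early during startup.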
NaviServer:
=======
When NaviServer is compiled with -DSYSTEM_MALLOC, ns_malloc() uses
malloc() etc., otherwise it uses Tcl's ckalloc() and friends.
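The compile-time choice described above can be pictured roughly as follows. This is a simplified sketch, not the actual ns_malloc() source: demo_ns_malloc(), demo_ns_free() and the demo_tcl_* stand-ins are hypothetical names, and the stand-in only mimics an allocator that keeps its own header in front of each block (real code would call ckalloc() and ckfree()).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-block header of the stand-in allocator; the union keeps user memory aligned. */
typedef union {
    size_t      size;
    max_align_t align;
} demo_hdr;

/* Hypothetical stand-in for ckalloc(): the result is NOT the pointer malloc() returned. */
void *demo_tcl_alloc(size_t n)
{
    demo_hdr *h;

    if (n > SIZE_MAX - sizeof(demo_hdr)) {
        return NULL;                 /* size computation would overflow */
    }
    h = malloc(sizeof(demo_hdr) + n);
    if (h == NULL) {
        return NULL;
    }
    h->size = n;
    return h + 1;                    /* user memory starts after the header */
}

/* Hypothetical stand-in for ckfree(): step back to the header before releasing. */
void demo_tcl_free(void *p)
{
    if (p != NULL) {
        free((demo_hdr *)p - 1);
    }
}

#ifdef SYSTEM_MALLOC
/* Built with -DSYSTEM_MALLOC: use whatever malloc the process is linked against. */
void *demo_ns_malloc(size_t n) { return malloc(n); }
void  demo_ns_free(void *p)    { free(p); }
const char *demo_ns_allocator(void) { return "system"; }
#else
/* Default build: go through the Tcl-style allocator. */
void *demo_ns_malloc(size_t n) { return demo_tcl_alloc(n); }
void  demo_ns_free(void *p)    { demo_tcl_free(p); }
const char *demo_ns_allocator(void) { return "tcl-style"; }
#endif
```

A pointer obtained from demo_tcl_alloc() must never be passed to plain free(), and vice versa. This is the kind of mismatch that can corrupt the heap when subsystems are built with different allocators.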
Tcl:
===
There is also a patch [2] that makes Tcl use the system malloc
internally instead of Tcl's own multi-threaded allocator.
In October there was also a small patch for NaviServer for cases where
Tcl and NaviServer are compiled with different memory allocators [3].
My first attempt would be to compile NaviServer with SYSTEM_MALLOC and
check whether you still experience the problem. The next recommendation
would be to check which malloc implementations are used by which
subsystems and align these if necessary.
I will look into reviving the configuration of OpenSSL, so that its
malloc implementation can be configured as was possible before OpenSSL 1.1.0.
-gn
[1] https://bitbucket.org/naviserver/naviserver/commits/896a4e3765f91b048ccbf570e5afe21b1bb1a41f
[2] https://github.com/gustafn/install-ns
[3] https://bitbucket.org/naviserver/naviserver/commits/caab40365f0429a44740db1927e9f459d733db3f
On 14.12.20 18:07, David Osborne wrote:
Hi,
We're building some NaviServer instances (4.99.19) on Debian Buster
(v10.7).
One of the instances is a revproxy instance which uses connchans to
speak to a back end.
We're seeing very frequent signal 11 crashes of NaviServer with this
combination.
(We also see this infrequently with 4.99.18 running on Debian Stretch
(v9))
Because of the increased frequency, I've managed to take a core dump,
and the issue appears to occur when SSL_CTX_new is called
from Ns_TLS_CtxClientCreate.
I realise I don't have gdb properly configured, but I'm wondering whether
the backtrace as it is could shed any light on what's going on, or is it
still too opaque?
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/naviserver/bin/nsd -u nsd -g nsd -b
0.0.0.0:80,0.0.0.0:443 -i -t
/etc/'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f4405ddf700 (LWP 13613))]
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f4407936535 in __GI_abort () at abort.c:79
#2 0x00007f440847cfe6 in Panic (fmt=<optimized out>) at log.c:928
#3 0x00007f44080fbc4a in Tcl_PanicVA () from
/lib/x86_64-linux-gnu/libtcl8.6.so
#4 0x00007f44080fbdb9 in Tcl_Panic () from
/lib/x86_64-linux-gnu/libtcl8.6.so
#5 0x00007f44084bbc74 in Abort (signal=<optimized out>) at unix.c:1115
#6 <signal handler called>
#7 malloc_consolidate (av=av@entry=0x7f43bc000020) at malloc.c:4486
#8 0x00007f4407996a58 in _int_malloc (av=av@entry=0x7f43bc000020,
bytes=bytes@entry=1024) at malloc.c:3695
#9 0x00007f440799856a in __GI___libc_malloc (bytes=1024) at malloc.c:3057
#10 0x00007f4407c63559 in CRYPTO_zalloc () from
/lib/x86_64-linux-gnu/libcrypto.so.1.1
#11 0x00007f4407df7699 in SSL_CTX_new () from
/lib/x86_64-linux-gnu/libssl.so.1.1
#12 0x00007f44084b4d85 in Ns_TLS_CtxClientCreate
(interp=interp@entry=0x7f43bc009ee0, cert=cert@entry=0x0,
caFile=caFile@entry=0x0, caPath=caPath@entry=0x0,
verify=verify@entry=false, ctxPtr=ctxPtr@entry=0x7f4405dde7c0) at
tls.c:116
#13 0x00007f44084687a4 in ConnChanOpenObjCmd (clientData=<optimized
out>, interp=0x7f43bc009ee0, objc=<optimized out>, objv=<optimized out>)
at connchan.c:1010
#14 0x00007f44084a7eb8 in Ns_SubcmdObjv
(subcmdSpec=subcmdSpec@entry=0x7f4405dde990,
clientData=0x7f43bc047870, interp=0x7f43bc009ee0, objc=13,
objv=0x7f43bc017ff8) at tclobjv.c:1849
#15 0x00007f4408469d45 in NsTclConnChanObjCmd (clientData=<optimized
out>, interp=<optimized out>, objc=<optimized out>, objv=<optimized out>)
at connchan.c:1761
#16 0x00007f440802ffb7 in TclNRRunCallbacks () from
/lib/x86_64-linux-gnu/libtcl8.6.so
#17 0x00007f44080313af in ?? () from
/lib/x86_64-linux-gnu/libtcl8.6.so
#18 0x00007f4408030d13 in Tcl_EvalEx () from
/lib/x86_64-linux-gnu/libtcl8.6.so
#19 0x00007f44084a9164 in NsTclFilterProc (arg=0x55af6a3e9880,
conn=0x55af6a502480, why=NS_FILTER_PRE_AUTH) at tclrequest.c:535
#20 0x00007f4408478370 in NsRunFilters
(conn=conn@entry=0x55af6a502480, why=why@entry=NS_FILTER_PRE_AUTH) at
filter.c:160
#21 0x00007f440848654d in ConnRun
(connPtr=connPtr@entry=0x55af6a502480) at queue.c:2450
#22 0x00007f4408485b33 in NsConnThread (arg=0x55af6a4a0090) at
queue.c:2157
#23 0x00007f44081b2bb1 in NsThreadMain (arg=0x55af6a354f50) at
thread.c:230
#24 0x00007f44081b3af9 in ThreadMain (arg=<optimized out>) at
pthread.c:836
#25 0x00007f44078f5fa3 in start_thread (arg=<optimized out>) at
pthread_create.c:486
#26 0x00007f4407a0d4cf in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
--
Regards,
David