Thanks very much, Gustaf. Looking in the build output, I see that "-DSYSTEM_MALLOC" was already in place during the build of the binary that is now crashing.
So I have *removed* -DSYSTEM_MALLOC from the default build flags and created a new build. So far I've not been able to get that to crash in testing, whereas it was crashing every 3-4 requests when built with -DSYSTEM_MALLOC.

I don't fully understand the implications of this. Is it a suitable solution that we could use in production?

On Mon, 14 Dec 2020 at 20:36, Gustaf Neumann <neum...@wu.ac.at> wrote:
> Dear David,
>
> The crash looks like a problem in OpenSSL's memory management.
>
> In general, I would not believe that this is a problem in the NaviServer
> code, but rather in the interplay of the various memory management
> options of OpenSSL, NaviServer and Tcl. We use these functions under
> heavy load on many servers, but we are careful to use the same malloc
> implementation everywhere (actually Google's TCMalloc).
>
> OpenSSL:
> ========
>
> In general, OpenSSL supports configuring its memory management routines.
> However, the memory management interface of OpenSSL changed with the
> release of OpenSSL 1.1.0. As a consequence, when NaviServer is compiled
> against newer versions of OpenSSL, the native OpenSSL memory routines
> are used. The commit [1] says: "Registering our own functions does not
> seem necessary". So, if one compiles a version of NaviServer between
> 4.99.15 and 4.99.20 against a newer version of OpenSSL, a problem might
> arise when the native OpenSSL malloc implementation is not fully
> thread-safe, or when different malloc implementations are mixed.
>
> NaviServer:
> ===========
>
> When NaviServer is compiled with -DSYSTEM_MALLOC, ns_malloc() uses
> malloc() etc.; otherwise it uses Tcl's ckalloc() and friends.
>
> Tcl:
> ====
>
> There is also a patch [2] for making Tcl use the system malloc
> internally instead of its own multi-threaded allocator.
>
> In October there was also a small patch for NaviServer for cases where
> Tcl and NaviServer are compiled with different memory allocators [3].
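As a rough illustration of the "same malloc implementation everywhere" point, here is a minimal C sketch, assuming OpenSSL 1.1.0 or later. The three ossl_cb_* callbacks have the shapes that CRYPTO_set_mem_functions() expects since 1.1.0 (with file/line arguments) and forward every request to one hypothetical application allocator, app_malloc() and friends, which in this sketch simply wrap the C library. All of these names are made up for illustration; this is not NaviServer's ns_malloc() code.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical application-wide allocator (illustrative names only).
 * Here it wraps the C library's malloc family; in a real deployment it
 * would be whichever allocator the server and Tcl are built with.
 * The atomic counter is demo bookkeeping, safe to update from any thread. */
static atomic_long app_live_blocks;   /* zero-initialised: static storage */

void *app_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL) {
        atomic_fetch_add(&app_live_blocks, 1);
    }
    return p;
}

void app_free(void *p)
{
    if (p != NULL) {
        atomic_fetch_sub(&app_live_blocks, 1);
        free(p);
    }
}

void *app_realloc(void *p, size_t n)
{
    if (p == NULL) {
        return app_malloc(n);   /* realloc(NULL, n) behaves like malloc */
    }
    if (n == 0) {
        app_free(p);            /* sidestep implementation-defined realloc(p, 0) */
        return NULL;
    }
    return realloc(p, n);       /* on failure p stays valid and stays counted */
}

/* Callbacks shaped like the ones CRYPTO_set_mem_functions() takes in
 * OpenSSL >= 1.1.0; the file/line debugging arguments are ignored here. */
void *ossl_cb_malloc(size_t num, const char *file, int line)
{
    (void)file;
    (void)line;
    return app_malloc(num);
}

void *ossl_cb_realloc(void *addr, size_t num, const char *file, int line)
{
    (void)file;
    (void)line;
    return app_realloc(addr, num);
}

void ossl_cb_free(void *addr, const char *file, int line)
{
    (void)file;
    (void)line;
    app_free(addr);
}
```

In a real program the callbacks would be passed as CRYPTO_set_mem_functions(ossl_cb_malloc, ossl_cb_realloc, ossl_cb_free), declared in <openssl/crypto.h>, before any other OpenSSL call, because it returns 0 once OpenSSL has already allocated memory. Before 1.1.0 the callbacks took no file/line arguments, which is the interface change mentioned above. Swapping the bodies of app_malloc() and friends for another allocator would move OpenSSL along with it.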
>
> My first attempt would be to compile NaviServer with SYSTEM_MALLOC and
> check whether you still experience a problem. The next recommendation
> would be to check which malloc implementations are used by which
> subsystems and align them if necessary.
>
> I will look into reviving the OpenSSL configuration so that its malloc
> implementation can be configured, as was possible before OpenSSL 1.1.0.
>
> -gn
>
> [1] https://bitbucket.org/naviserver/naviserver/commits/896a4e3765f91b048ccbf570e5afe21b1bb1a41f
> [2] https://github.com/gustafn/install-ns
> [3] https://bitbucket.org/naviserver/naviserver/commits/caab40365f0429a44740db1927e9f459d733db3f
>
> On 14.12.20 18:07, David Osborne wrote:
>
> Hi,
>
> We're building some NaviServer instances (4.99.19) on Debian Buster
> (v10.7). One of the instances is a revproxy instance which uses
> connchans to speak to a back end.
>
> We're seeing very frequent signal 11 crashes of NaviServer with this
> combination. (We also see this infrequently with 4.99.18 running on
> Debian Stretch (v9).)
>
> Because of the increased frequency, I've managed to take a core dump,
> and the issue appears to occur when SSL_CTX_new is called from
> Ns_TLS_CtxClientCreate.
>
> I realise I don't have gdb properly configured, but I'm wondering
> whether the backtrace as it is could shed any light on what's going on,
> or whether it is still too opaque.
>
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Core was generated by `/usr/lib/naviserver/bin/nsd -u nsd -g nsd -b 0.0.0.0:80,0.0.0.0:443 -i -t /etc/'.
> Program terminated with signal SIGABRT, Aborted.
> #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> [Current thread is 1 (Thread 0x7f4405ddf700 (LWP 13613))]
> (gdb) bt
> #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1 0x00007f4407936535 in __GI_abort () at abort.c:79
> #2 0x00007f440847cfe6 in Panic (fmt=<optimized out>) at log.c:928
> #3 0x00007f44080fbc4a in Tcl_PanicVA () from /lib/x86_64-linux-gnu/libtcl8.6.so
> #4 0x00007f44080fbdb9 in Tcl_Panic () from /lib/x86_64-linux-gnu/libtcl8.6.so
> #5 0x00007f44084bbc74 in Abort (signal=<optimized out>) at unix.c:1115
> #6 <signal handler called>
> #7 malloc_consolidate (av=av@entry=0x7f43bc000020) at malloc.c:4486
> #8 0x00007f4407996a58 in _int_malloc (av=av@entry=0x7f43bc000020,
>     bytes=bytes@entry=1024) at malloc.c:3695
> #9 0x00007f440799856a in __GI___libc_malloc (bytes=1024) at malloc.c:3057
> #10 0x00007f4407c63559 in CRYPTO_zalloc () from /lib/x86_64-linux-gnu/libcrypto.so.1.1
> #11 0x00007f4407df7699 in SSL_CTX_new () from /lib/x86_64-linux-gnu/libssl.so.1.1
> #12 0x00007f44084b4d85 in Ns_TLS_CtxClientCreate (interp=interp@entry=0x7f43bc009ee0,
>     cert=cert@entry=0x0, caFile=caFile@entry=0x0, caPath=caPath@entry=0x0,
>     verify=verify@entry=false, ctxPtr=ctxPtr@entry=0x7f4405dde7c0) at tls.c:116
> #13 0x00007f44084687a4 in ConnChanOpenObjCmd (clientData=<optimized out>,
>     interp=0x7f43bc009ee0, objc=<optimized out>, objv=<optimized out>)
>     at connchan.c:1010
> #14 0x00007f44084a7eb8 in Ns_SubcmdObjv (subcmdSpec=subcmdSpec@entry=0x7f4405dde990,
>     clientData=0x7f43bc047870, interp=0x7f43bc009ee0, objc=13,
>     objv=0x7f43bc017ff8) at tclobjv.c:1849
> #15 0x00007f4408469d45 in NsTclConnChanObjCmd (clientData=<optimized out>,
>     interp=<optimized out>, objc=<optimized out>, objv=<optimized out>)
>     at connchan.c:1761
> #16 0x00007f440802ffb7 in TclNRRunCallbacks () from /lib/x86_64-linux-gnu/libtcl8.6.so
> #17 0x00007f44080313af in ?? () from /lib/x86_64-linux-gnu/libtcl8.6.so
> #18 0x00007f4408030d13 in Tcl_EvalEx () from /lib/x86_64-linux-gnu/libtcl8.6.so
> #19 0x00007f44084a9164 in NsTclFilterProc (arg=0x55af6a3e9880,
>     conn=0x55af6a502480, why=NS_FILTER_PRE_AUTH) at tclrequest.c:535
> #20 0x00007f4408478370 in NsRunFilters (conn=conn@entry=0x55af6a502480,
>     why=why@entry=NS_FILTER_PRE_AUTH) at filter.c:160
> #21 0x00007f440848654d in ConnRun (connPtr=connPtr@entry=0x55af6a502480)
>     at queue.c:2450
> #22 0x00007f4408485b33 in NsConnThread (arg=0x55af6a4a0090) at queue.c:2157
> #23 0x00007f44081b2bb1 in NsThreadMain (arg=0x55af6a354f50) at thread.c:230
> #24 0x00007f44081b3af9 in ThreadMain (arg=<optimized out>) at pthread.c:836
> #25 0x00007f44078f5fa3 in start_thread (arg=<optimized out>) at pthread_create.c:486
> #26 0x00007f4407a0d4cf in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
>
> --
> Regards,
> David
>
> _______________________________________________
> naviserver-devel mailing list
> naviserver-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/naviserver-devel

--
*David Osborne | Software Engineer*
Qcode Software, Castle House, Fairways Business Park, Inverness, IV2 6AA
*Email:* da...@qcode.co.uk | *Phone:* 01463 896 484
www.qcode.co.uk