On Tue, May 30, 2006, Steffen Weinreich wrote:

> [...]
> > What you can try is the combination of debug symbols _PLUS_ optimization
> > flags? Yes, you hear correctly: GCC is smart enough to allow things like
> > "-O2 -g". You have to hack this into the specfile manually. And the
> > result is often a still rather obscure debugging experience, but better
> > than nothing. It at least hopefully allows to find the _function_ inside
> > Apache/mod_ssl/OpenSSL which causes the problem plus a stack backtrace.
> > With this information one then can splice a few printf()'s into the
> > sources to find the real problem even without the debugger...
>
> OK, I have tried it on rm9 and the problem is still reproducible. Now I
> got a SIGSEGV instead of a SIGBUS. Here is the backtrace I got using the
> gdb:
>
> bash-3.00# /usr/opkg/bin/gdb /openpkg-dev/sbin/apache
> GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "sparc-sun-solaris2.10"...
> (gdb) run -X
> Starting program: /d1/openpkg-dev/sbin/apache -X
> warning: Lowest section in /lib/libpthread.so.1 is .dynamic at 00000074
> Processing config directory: /openpkg-dev/etc/apache/apache.d/*.conf
> ^A
> Program received signal SIGSEGV, Segmentation fault.
> ssl_scache_shmcb_store (s=0xff3a2000, id=0x0, idlen=Variable "idlen" is
> not available.
> ) at ssl_scache_shmcb.c:295
> 295 memset(ptr, 0, size);
> (gdb) bt
> #0 ssl_scache_shmcb_store (s=0xff3a2000, id=0x0, idlen=Variable "idlen"
> is not available.
> ) at ssl_scache_shmcb.c:295
> #1 0x00016f20 in ssl_callback_NewSessionCacheEntry (ssl=0xffbff270,
> pNew=0x210420) at ssl_engine_kernel.c:1804

The above functions are callbacks from mod_ssl.

> #2 0x00088054 in ssl_update_cache ()
> #3 0x00097c24 in ssl3_accept ()
> #4 0x00089e04 in SSL_accept ()
> #5 0x00083b4c in ssl23_get_client_hello ()
> #6 0x000842fc in ssl23_accept ()
> #7 0x00089e40 in SSL_accept ()

The above functions are from OpenSSL.

> #8 0x00018c38 in ssl_hook_NewConnection (conn=0x244350) at
> ssl_engine_kernel.c:236

The above function is from mod_ssl.

> #9 0x000605b0 in new_connection (p=0x22a448, server=0x13ac00,
> inout=0x1a6800, remaddr=0x13ac00, saddr=0x22c478, child_num=1289216) at
> http_main.c:3714
> #10 0x00061190 in child_main (child_num_arg=Variable "child_num_arg" is
> not available.
> ) at http_main.c:4876
> #11 0x000617e0 in make_child (s=0x1a1400, slot=1718272, now=1718272) at
> http_main.c:5021
> #12 0x000618e0 in startup_children (number_to_start=1739848) at
> http_main.c:5103
> #13 0x00062658 in standalone_main (argc=4, argv=Variable "argv" is not
> available.
> ) at http_main.c:5435
> #14 0x00063470 in main (argc=2, argv=0x1a3800) at http_main.c:5792
> (gdb)

And the above functions are from Apache.

So, Apache processes the request and calls into mod_ssl. mod_ssl in turn calls into OpenSSL, which then calls back into mod_ssl to update the SHMCB session cache, and that is where it segfaults. Hmmm... interesting. Unfortunately (as expected with "-O" combined with "-g") the backtrace is not 100% correct and reliable. But it at least shows in which _area_ the problem lies.

My experience tells me that there are two possible root causes here. The SHMCB session cache is a horrible piece of rather complex C code which fiddles around a lot with C _casts_ in particular. Perhaps an invalid cast is performed there (from one type to another with a different storage size, where information is lost) and the above segfault is just the visible side effect.
The other possible root cause is a different problem elsewhere (like a compiler bug) which smashes some variables or even the stack, and mod_ssl's session cache is just the victim.

As you cannot reliably inspect variables in the debugger under "-O2 -g", my next piece of advice is to retry with a different mod_ssl session cache type. Try the following directives, in this order:

1. SSLSessionCache shmht:[...filename...](512000)
2. SSLSessionCache dbm:[...filename...]
3. SSLSessionCache none

If the problem already goes away with (1), the SHMCB code is the problem. If it goes away only with (2), I would be surprised. If it goes away only with (3), the session cache framework itself is the problem, and I would be surprised even more. I guess the problem either already goes away with (1) or persists even with (3).

Ralf S. Engelschall
[EMAIL PROTECTED]
www.engelschall.com

______________________________________________________________________
The OpenPKG Project www.openpkg.org
Developer Communication List openpkg-dev@openpkg.org