On Tue, May 30, 2006, Steffen Weinreich wrote:

> [...]
> > What you can try is the combination of debug symbols _PLUS_ optimization
> > flags? Yes, you hear correctly: GCC is smart enough to allow things like
> > "-O2 -g". You have to hack this into the specfile manually. And the
> > result is often a still rather obscure debugging experience, but better
> > than nothing. It at least hopefully allows to find the _function_ inside
> > Apache/mod_ssl/OpenSSL which causes the problem plus a stack backtrace.
> > With this information one then can splice a few printf()'s into the
> > sources to find the real problem even without the debugger...
>
> OK, I have tried it on rm9 and the problem is still reproducible. Now I
> got a SIGSEGV instead of a SIGBUS. Here is the backtrace I got using the
> gdb:
>
> bash-3.00# /usr/opkg/bin/gdb /openpkg-dev/sbin/apache
> GNU gdb 6.3
> Copyright 2004 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB.  Type "show warranty" for details.
> This GDB was configured as "sparc-sun-solaris2.10"...
> (gdb) run -X
> Starting program: /d1/openpkg-dev/sbin/apache -X
> warning: Lowest section in /lib/libpthread.so.1 is .dynamic at 00000074
> Processing config directory: /openpkg-dev/etc/apache/apache.d/*.conf
> ^A
> Program received signal SIGSEGV, Segmentation fault.
> ssl_scache_shmcb_store (s=0xff3a2000, id=0x0, idlen=Variable "idlen" is
> not available.
> ) at ssl_scache_shmcb.c:295
> 295             memset(ptr, 0, size);
> (gdb) bt
> #0  ssl_scache_shmcb_store (s=0xff3a2000, id=0x0, idlen=Variable "idlen"
> is not available.
> ) at ssl_scache_shmcb.c:295
> #1  0x00016f20 in ssl_callback_NewSessionCacheEntry (ssl=0xffbff270,
> pNew=0x210420) at ssl_engine_kernel.c:1804

The above functions are callbacks from mod_ssl.

> #2  0x00088054 in ssl_update_cache ()
> #3  0x00097c24 in ssl3_accept ()
> #4  0x00089e04 in SSL_accept ()
> #5  0x00083b4c in ssl23_get_client_hello ()
> #6  0x000842fc in ssl23_accept ()
> #7  0x00089e40 in SSL_accept ()

The above functions are from OpenSSL.

> #8  0x00018c38 in ssl_hook_NewConnection (conn=0x244350) at
> ssl_engine_kernel.c:236

The above function is from mod_ssl.

> #9  0x000605b0 in new_connection (p=0x22a448, server=0x13ac00,
> inout=0x1a6800, remaddr=0x13ac00, saddr=0x22c478, child_num=1289216) at
> http_main.c:3714
> #10 0x00061190 in child_main (child_num_arg=Variable "child_num_arg" is
> not available.
> ) at http_main.c:4876
> #11 0x000617e0 in make_child (s=0x1a1400, slot=1718272, now=1718272) at
> http_main.c:5021
> #12 0x000618e0 in startup_children (number_to_start=1739848) at
> http_main.c:5103
> #13 0x00062658 in standalone_main (argc=4, argv=Variable "argv" is not
> available.
> ) at http_main.c:5435
> #14 0x00063470 in main (argc=2, argv=0x1a3800) at http_main.c:5792
> (gdb)

And the above functions are from Apache.

So, Apache processes the request and calls into mod_ssl, which in turn
calls into OpenSSL, which in turn calls back into mod_ssl to update the
SHMCB type session cache, and there it segfaults. Hmmm... interesting.
Unfortunately (as expected because of "-O" in combination with "-g") the
backtrace is not 100% correct and reliable. But it at least shows in
which _area_ the problem exists.

My experience tells me that there are two possible root causes here.
First, the SHMCB type session cache is a horrible piece of rather
complex C code which especially fiddles around a lot with C _casts_.
Perhaps an invalid cast is performed there (from one type to another
with a different storage size, so that information is lost) and the
above segfault is just the visible side-effect. Second, the cause could
be something else entirely (like a compiler bug) which smashes some
variables or even the stack somewhere else, and mod_ssl's session cache
is just the victim.
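
To make the first suspicion concrete, here is a minimal sketch of the
kind of cast problem meant above. This is _not_ the actual SHMCB code;
the helper names narrow_offset() and read_u32() are hypothetical and
exist only for illustration. The first helper shows how a narrowing cast
silently drops the upper bits of a value. The second shows a way to read
a 32-bit value from an arbitrary address without casting the pointer,
which matters on SPARC because a direct dereference of a misaligned
pointer typically raises SIGBUS there.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: narrows a 32-bit offset to 16 bits, the way a
 * careless cast would. The upper 16 bits are silently discarded. */
static uint16_t narrow_offset(uint32_t offset)
{
    return (uint16_t)offset;
}

/* Hypothetical helper: reads a 32-bit value from a possibly unaligned
 * address. memcpy() avoids the *(uint32_t *)p pointer cast, which is
 * only valid if p is suitably aligned for uint32_t. */
static uint32_t read_u32(const unsigned char *p)
{
    uint32_t value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

For example, narrow_offset(0x00012345) yields 0x2345, so an offset into
a shared memory segment computed this way would point at the wrong
place. Whether anything like this actually happens in the session cache
code is only a guess at this point.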

As you certainly cannot reliably inspect variables in the debugger under
"-O2 -g", my next advice is to retry with a different mod_ssl session
cache type. For this, try the following directives in order:

    1. SSLSessionCache  shmht:[...filename...](512000)
    2. SSLSessionCache  dbm:[...filename...]
    3. SSLSessionCache  none

If the problem already goes away with (1), the SHMCB code is the
problem. If the problem goes away only with (2), I would be surprised.
If the problem goes away only with (3), the session cache framework is
the problem and I would be surprised even more. I guess the problem
either already goes away with (1) or persists even with (3).

                                       Ralf S. Engelschall
                                       [EMAIL PROTECTED]
                                       www.engelschall.com

______________________________________________________________________
The OpenPKG Project                                    www.openpkg.org
Developer Communication List                   openpkg-dev@openpkg.org
