BUG: Problem with latest subversion libapreq2 on linux 64 bit amd (segfault accessing handle variable)

Zero Altitude Wed, 07 May 2008 08:21:06 -0700

**********************
**********************

My question: how can I fix this bug, avoid this bug, or get a new
release of libapreq2 that solves this problem?  Thank you in advance.
Details follow.  We're willing to add new debugging to our code to
help out in debugging this.  Let me know if I can provide more info.


**********************
**********************
The problem:

With apache 2.2.8 with worker_mpm on linux 64 bit AMD (more OS details
dumped below):

libapreq2, checked out of subversion on May 6, 2008, and dating back
to at least a version from early 2007, has a consistent problem.  The
problem is that occasionally, the following code will cause apache to
segfault:

_________
static int do_post(request_rec *r)
{
   apreq_handle_t *handle;
   const apreq_module_t *module;
   apreq_param_t *filebody, *testfilebody;
   char *loopfilename, *myip, *disk, *tmpfile, *newfilename,
*extensionfilename, *lastdot;
   EVP_MD_CTX mdctx;
   const EVP_MD *md;
   unsigned char md_value[EVP_MAX_MD_SIZE];
   char *localorigfilename_raw, *localorigfilename;
   char null = '\0';
   apr_file_t *f = NULL;
   unsigned int md_len, copies, filesize = 0, any_processed = 0;
   apr_size_t bytes;
   int i, lresult = 0, isactivex = 0, isflash = 0, tmplen = 0;

   modbittybe_config *s_cfg =
ap_get_module_config(r->server->module_config, &bittybe_module);

   myip = s_cfg->myip;
   copies = s_cfg->copies;
   connect_to_fsdb_t(&m, s_cfg->dbip, s_cfg->dbport);

   handle = (apreq_handle_t *)apreq_handle_apache2(r);

   fprintf(stderr, "++++++++apache handle (3: %#lx)++++++++++\n", handle);
   fflush(stderr);

   module = handle->module;
_____________

At the last line, apache segfaults approximately .05% of the time (3
times a day or so on a thousand plus POST uploads).  You can download
the core dump here: http://abra.ms/apreq/core.5174.bz2.  Here is what
the core dump shows:

-------------- GDB SESSION ------------

[EMAIL PROTECTED]:~$ gdb /usr/local/apache20/bin/httpd
/tmp/apache2-gdb-dump/core.5174
.. symbol loading removed ...

(gdb) thread apply all bt full

.. many threads ignored -- failing thread shows ..

Thread 7 (process 5201):
#0  0x00002ada07239409 in kill () from /lib/libc.so.6
No symbol table info available.
#1  <signal handler called>
No symbol table info available.
#2  0x00002ada09aab7ea in do_post_put (r=0x2aaaaac089a8) at mod_bittybe.c:838
        retval = <value optimized out>
        ua = 0x2aaaaac0a1ac "Shockwave Flash"
        lenp = <value optimized out>
        clen = 3679200
#3  0x00000000004348c9 in ap_run_handler (r=0x2aaaaac089a8) at config.c:157
        n = 9
        rv = 121909408
#4  0x00000000004379ac in ap_invoke_handler (r=0x2aaaaac089a8) at config.c:372
        handler = 0x5ce950 "?VE"
        result = 0
        old_handler = 0x62f298 "bittyfs-be"
        ignore = <value optimized out>
#5  0x0000000000458318 in ap_process_request (r=0x2aaaaac089a8) at
http_request.c:258
        access_status = -1430251192
#6  0x00000000004557bc in ap_process_http_connection (c=0x6cc080) at
http_core.c:190
        r = (request_rec *) 0x2aaaaac089a8
        csd = (apr_socket_t *) 0x0
#7  0x000000000043b561 in ap_run_process_connection (c=0x6cc080) at
connection.c:43
        n = 0
        rv = 121909408
#8  0x000000000046e5c6 in worker_thread (thd=0x637800, dummy=<value
optimized out>) at worker.c:544
        process_slot = 3
        thread_slot = 25
        csd = (apr_socket_t *) 0x6cbe68
        bucket_alloc = (apr_bucket_alloc_t *) 0x2aaaaac0a948
        last_ptrans = <value optimized out>
        ptrans = (apr_pool_t *) 0x6cbde8
        rv = <value optimized out>
        is_idle = <value optimized out>
#9  0x00002ada06ff6f1a in start_thread () from /lib/libpthread.so.0
No symbol table info available.
#10 0x00002ada072d35d2 in clone () from /lib/libc.so.6
No symbol table info available.
#11 0x0000000000000000 in ?? ()
No symbol table info available.

## line 838 is the last line from above: module = handle->module;

.. more threads we don't care about ..

(gdb) thread 7
[Switching to thread 7 (process 5201)]#0  0x00002ada07239409 in kill
() from /lib/libc.so.6
(gdb) where
#0  0x00002ada07239409 in kill () from /lib/libc.so.6
#1  <signal handler called>
#2  0x00002ada09aab7ea in do_post_put (r=0x2aaaaac089a8) at mod_bittybe.c:838
#3  0x00000000004348c9 in ap_run_handler (r=0x2aaaaac089a8) at config.c:157
#4  0x00000000004379ac in ap_invoke_handler (r=0x2aaaaac089a8) at config.c:372
#5  0x0000000000458318 in ap_process_request (r=0x2aaaaac089a8) at
http_request.c:258
#6  0x00000000004557bc in ap_process_http_connection (c=0x6cc080) at
http_core.c:190
#7  0x000000000043b561 in ap_run_process_connection (c=0x6cc080) at
connection.c:43
#8  0x000000000046e5c6 in worker_thread (thd=0x637800, dummy=<value
optimized out>) at worker.c:544
#9  0x00002ada06ff6f1a in start_thread () from /lib/libpthread.so.0
#10 0x00002ada072d35d2 in clone () from /lib/libc.so.6
#11 0x0000000000000000 in ?? ()

## line 838 is the last line from above: module = handle->module;

(gdb) up
#1  <signal handler called>
(gdb) up
#2  0x00002ada09aab7ea in do_post_put (r=0x2aaaaac089a8) at mod_bittybe.c:838
838         module = handle->module;
(gdb) info registers
rax            0xffffffffaac01948       -1430251192
rbx            0x3823e0 3679200
rcx            0x0      0
rdx            0x2ada074430a0   47115913146528
rsi            0x0      0
rdi            0x0      0
rbp            0x619170 0x619170
rsp            0x4d819eb0       0x4d819eb0
r8             0xffffffff       4294967295
r9             0xffffffff       4294967295
r10            0x2aaaaac01988   46912497523080
r11            0x202    514
r12            0x62f298 6484632
r13            0x2aaaaac09fc8   46912497557448
r14            0x2aaaaac089a8   46912497551784
r15            0x2aaaaac0a1ac   46912497557932
rip            0x2ada09aab7ea   0x2ada09aab7ea <do_post_put+762>
eflags         0x10246  [ PF ZF IF RF ]
cs             0x33     51
ss             0x2b     43
ds             0x0      0
es             0x0      0
fs             0x0      0
gs             0x0      0
(gdb)

-------------- GDB SESSION END ------------

Note that RAX has a weird negative number.  That turns out to be the
memory location listed for handle.  Note the fprintf and fflush
debugging lines just before the problem?  Here are the relevant log
lines from error.log:

___ error.log ___

{UPLOAD-POST-(5174, 1300343136) (POST)
(/save/DEFAULT)(appserver=www.tabblo.com&djsession=upload.665586)}
++++++++connect to mysql in do_post (1)++++++++++
++++++++connected! (2)++++++++++
++++++++apache handle (3: 0xffffffffaac01948)++++++++++
[Tue May 06 23:42:26 2008] [notice] child pid 5174 exit signal
Segmentation fault (11), possible coredump in /tmp/apache2-gdb-dump

______________

Normally, handle has a fairly low memory address, e.g.

__error.log__
{UPLOAD-POST-(5207, 1208023392) (POST) (/save/DEFAULT)}
++++++++connect to mysql in do_post (1)++++++++++
++++++++connected! (2)++++++++++
++++++++apache handle (3: 0x6bcb88)++++++++++
__________

The segfault appears to be due to handle having an illegal-to-read
memory address by the time its module member is referenced.  I do not
appear to be doing anything untoward with respect to initializing
apreq, and so my current, defeasible, assumption is that the bug is
somewhere in the apreq library.

**********************
**********************

Some data:

$ uname -a
Linux g1t0241 2.6.18-6-amd64 #1 SMP Sun Feb 10 17:50:19 UTC 2008
x86_64 GNU/Linux

$ free
             total       used       free     shared    buffers     cached
Mem:       8179412    8119592      59820          0    1741940    1647724
-/+ buffers/cache:    4729928    3449484
Swap:     15976600        212   15976388

$ df
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/mapper/localhost-root
                      30512072  13434752  15527396  47% /
tmpfs                  4089704         0   4089704   0% /lib/init/rw
udev                     10240       192     10048   2% /dev
tmpfs                  4089704        16   4089688   1% /dev/shm
/dev/cciss/c0d0p1        93307     13783     74707  16% /boot
/dev/mapper/localhost-home
                       8063408   4442740   3211068  59% /home
/dev/mapper/localhost-opt
                       8063408    380668   7273140   5% /opt
/dev/mapper/localhost-tmp
                       4031680    440880   3386000  12% /tmp
/dev/mapper/localhost-var
                       4031680    533076   3293804  14% /var
/dev/mapper/disk1    2112755696 740531332 1264902612  37% /mnt/disks/disk1
/dev/mapper/disk2    2112755696 779444608 1333311088  37% /mnt/disks/disk2
/dev/mapper/disk3    2112755696 778142276 1334613420  37% /mnt/disks/disk3
/dev/mapper/disk4    2112755696 782172644 1330583052  38% /mnt/disks/disk4

BUG: Problem with latest subversion libapreq2 on linux 64 bit amd (segfault accessing handle variable)

Reply via email to