Hi there people,
I have inherited a web server running mod_perl and I am experiencing a
somewhat critical problem: http processes sometimes get into an infinite
loop, using 100% cpu time, and given enough time bring the machine to a
halt.
I've done a lot of testing, and there isn't a specific http request that
triggers the behaviour, eventhough it always happens after a request. It
seems to happen every few hours: the httpd process simply starts hogging
up the CPU, and won't let go of it. After a while, I have a few of these
processes running, and the machine's load average skyrockets. Sometimes
it's bad enough I'm not even able to log in via console.
I'll upgrade all the software to new versions, but apparently this
problem has been ocurring for a while, and survived a couple of
hardware/software upgrades. I'll also be rewriting the perl code running
there to see if it stops the problem (the code isn't too clean - lots of
global variables, not written under strict, etc, but "it works").
However, it would be cool if someone could enlighten me on what's going
on, and possibly suggest a fix :).
Thanks a lot!
signed,
gustavo
<begin debugging info>
=> our OS is:
[root@blueland /root]# uname -a
Linux blueland 2.2.14-5.0 #4 Wed Apr 12 20:28:28 MDT 2000 i586 unknown
=> Apache:
Server Version: Apache/1.3.6 (Unix) mod_perl/1.19 mod_ssl/2.2.8
OpenSSL/0.9.2b
=> let's look into one of the monster processes:
497 ? R 288:06 /usr/local/apache_1.3.6/bin/httpd
=> (nice cpu time there...)
=> now for gdb...
[root@blueland /root]# gdb /usr/local/apache/bin/httpd 497
GNU gdb 4.18
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for
details.
Attaching to program: /usr/local/apache/bin/httpd, Pid 497
Reading symbols from /lib/libNoVersion.so.1...done.
Reading symbols from /lib/libm.so.6...done.
Reading symbols from /lib/libcrypt.so.1...done.
Reading symbols from
/usr/lib/perl5/5.00502/i586-linux/CORE/libperl.so...done.
Reading symbols from /lib/libnsl.so.1...done.
Reading symbols from /lib/libdl.so.2...done.
Reading symbols from /lib/libc.so.6...done.
Reading symbols from /lib/ld-linux.so.2...done.
Reading symbols from /lib/libnss_files.so.2...done.
Reading symbols from
/usr/lib/perl5/site_perl/5.005/i586-linux/auto/Sybase/DBlib/DBlib.none...done.
Reading symbols from /opt/sybase/lib/libsybdb.so...done.
Reading symbols from /opt/sybase/lib/libinsck.so...done.
Reading symbols from /lib/libnss_nisplus.so.2...done.
Reading symbols from /lib/libnss_nis.so.2...done.
0x80a7407 in free_blocks ()
=> let's see what the stack tells us...
(gdb) i s
#0 0x80a7407 in free_blocks ()
#1 0x80a7660 in ap_clear_pool ()
#2 0x80a76a1 in ap_destroy_pool ()
#3 0x80be71a in ap_destroy_sub_req ()
#4 0x8070d8a in XS_Apache__SubRequest_DESTROY ()
#5 0x400d0f45 in Perl_pp_entersub () at pp_hot.c:1965
#6 0x4007aa8d in perl_call_sv () at perl.c:1008
#7 0x400d9d4c in Perl_sv_clear () at sv.c:2418
#8 0x400da451 in Perl_sv_free () at sv.c:2418
#9 0x400d24ab in do_clean_objs (sv=0x8385744) at sv.c:338
#10 0x400d237c in visit (f=0x400d2410 <do_clean_objs>) at sv.c:306
#11 0x400d263f in Perl_sv_clean_objs () at sv.c:359
#12 0x40077fa5 in perl_destruct () at perl.c:1008
#13 0x80635a3 in perl_shutdown ()
#14 0x80645c5 in perl_child_exit ()
#15 0x8064430 in perl_child_exit_cleanup ()
#16 0x80a8dd6 in run_cleanups ()
#17 0x80a762c in ap_clear_pool ()
#18 0x80a76a1 in ap_destroy_pool ()
#19 0x80b3b03 in clean_child_exit ()
#20 0x80b6773 in child_main ()
#21 0x80b6e14 in make_child ()
#22 0x80b716e in perform_idle_server_maintenance ()
#23 0x80b7665 in standalone_main ()
#24 0x80b7cd3 in main ()
=> registers have:
(gdb) i r
eax 0x8a3e408 144958472
ecx 0x8608414 140542996
edx 0x8a3e408 144958472
ebx 0x8239208 136548872
esp 0xbffff90c 0xbffff90c
ebp 0xbffff910 0xbffff910
esi 0x1 1
edi 0xffffffff -1
eip 0x80a7407 0x80a7407
eflags 0x202 514
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x0 0
=> this is a dump of the running function
(gdb) disas
Dump of assembler code for function free_blocks:
0x80a73e0 <free_blocks>: push %ebp
0x80a73e1 <free_blocks+1>: mov %esp,%ebp
0x80a73e3 <free_blocks+3>: sub $0x4,%esp
0x80a73e6 <free_blocks+6>: cmpl $0x0,0x8(%ebp)
0x80a73ea <free_blocks+10>: jne 0x80a73f0 <free_blocks+16>
0x80a73ec <free_blocks+12>: jmp 0x80a7439 <free_blocks+89>
0x80a73ee <free_blocks+14>: mov %esi,%esi
0x80a73f0 <free_blocks+16>: mov 0x8146b88,%eax
0x80a73f5 <free_blocks+21>: mov %eax,0xfffffffc(%ebp)
0x80a73f8 <free_blocks+24>: mov 0x8(%ebp),%eax
0x80a73fb <free_blocks+27>: mov %eax,0x8146b88
0x80a7400 <free_blocks+32>: mov 0x8(%ebp),%eax
0x80a7403 <free_blocks+35>: cmpl $0x0,0x4(%eax)
0x80a7407 <free_blocks+39>: jne 0x80a740c <free_blocks+44>
0x80a7409 <free_blocks+41>: jmp 0x80a7424 <free_blocks+68>
0x80a740b <free_blocks+43>: nop
0x80a740c <free_blocks+44>: mov 0x8(%ebp),%eax
0x80a740f <free_blocks+47>: mov 0x8(%ebp),%ecx
0x80a7412 <free_blocks+50>: add $0xc,%ecx
0x80a7415 <free_blocks+53>: mov %ecx,0x8(%eax)
0x80a7418 <free_blocks+56>: mov 0x8(%ebp),%eax
0x80a741b <free_blocks+59>: mov 0x4(%eax),%edx
0x80a741e <free_blocks+62>: mov %edx,0x8(%ebp)
0x80a7421 <free_blocks+65>: jmp 0x80a7400 <free_blocks+32>
0x80a7423 <free_blocks+67>: nop
0x80a7424 <free_blocks+68>: mov 0x8(%ebp),%eax
0x80a7427 <free_blocks+71>: mov 0x8(%ebp),%ecx
0x80a742a <free_blocks+74>: add $0xc,%ecx
0x80a742d <free_blocks+77>: mov %ecx,0x8(%eax)
0x80a7430 <free_blocks+80>: mov 0x8(%ebp),%eax
0x80a7433 <free_blocks+83>: mov 0xfffffffc(%ebp),%edx
0x80a7436 <free_blocks+86>: mov %edx,0x4(%eax)
0x80a7439 <free_blocks+89>: leave
0x80a743a <free_blocks+90>: ret
=> strace doesn't show any system calls going on, which makes sense,
since we don't ever leave free_blocks()
.EOF.