Hello all,
Just to follow up on myself... I did some more experiments and determined
that putting the alarm routines in a library wasn't really working right.
It would work correctly on the first request per child, but after that it
failed. I ended up with:
# failsafe to prevent broken children
$SIG{ALRM} = sub { die "$0: $$ Process Timed Out" };
$SIG{PIPE} = sub { die "$0: $$ Cancelled via SIGPIPE" };
$SIG{TERM} = sub { die "$0: $$ Cancelled via SIGTERM" };
alarm 10;
This is at the beginning of all my scripts using mod_perl. I'm tweaking
the timeout. Without using the alarm signal things still get nasty,
though.
Load on the box is back down to 1... However, the timeout issue is a little
nasty. Users on slow connections are getting timed out before the
script finishes. I'm trying to determine where the hangs are happening
so I can reset the alarm timer in the script.
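One way to do that, once the slow spots are found, is to re-arm the alarm around them rather than relying on a single fixed timeout. A minimal sketch (the section boundaries and the commented-out query call are illustrative, not from my actual scripts):

```perl
# Re-arm the alarm around the slow parts of a request instead of
# using one fixed timeout for the whole thing.
$SIG{ALRM} = sub { die "$0: $$ Process Timed Out\n" };

alarm 10;                 # short failsafe for ordinary work
# ... fast setup work ...

alarm 30;                 # give a slow section (e.g. a big query) more time
# my $rows = run_big_query();   # hypothetical slow call

alarm 10;                 # back to the short timeout for sending output;
                          # slow clients spend their time here
# ... print results ...

alarm 0;                  # request finished cleanly -- cancel the alarm
```

Each alarm call replaces the previous timer, so the timeout always reflects the section currently running.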
I still can't quite figure out why they are hanging in the first place.
I know it has to do with the clients disconnecting from the script,
but I assumed that SIGPIPE would catch them. It isn't doing that in
all cases; I guess it depends where in the script the disconnect
occurs.
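As far as I can tell, SIGPIPE is only delivered when the script actually writes to the closed socket, so a process that is busy between writes never sees it. Under mod_perl 1.x the request object exposes $r->connection->aborted, so checking it between chunks of output may catch disconnects that SIGPIPE misses. A sketch (this only runs inside Apache; @chunks is a placeholder for however the script builds its output):

```perl
# Inside a mod_perl handler/script -- check for client abort between
# chunks of output instead of relying on SIGPIPE alone.
my $r = Apache->request;

foreach my $chunk (@chunks) {      # @chunks is a placeholder
    $r->print($chunk);
    if ($r->connection->aborted) {
        # The client went away; stop working instead of hanging.
        last;
    }
}
```

This doesn't replace the alarm failsafe; it just gives the script a chance to bail out at a known point rather than depending on where the signal lands.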
I hope this is helpful to others, but I still am looking for more
answers to this!
Thanks,
Jeremy
-Original Message-
From: Jeremy Rusnak [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 20, 2001 6:11 PM
To: [EMAIL PROTECTED]
Subject: Hanging processes (all of a sudden!)
Hi all,
I've run into a strange problem with mod_perl over the weekend. All of
a sudden, mod_perl started losing track of its connections to MySQL.
I'd eventually run out of connections to the MySQL server (currently
limited to 200). This didn't make sense to me, since previously
the number of database connections was never more than
MaxClients. With 40 httpd processes running, there were 100+ active
connections to the SQL server. Killing httpd removed the connections,
so there were definitely multiple connections per child.
The only thing that had changed was me rebooting the SQL server, which
hadn't been rebooted in over six months. Since the machine in question
uses both SQL and NFS from that box, I figured the reboot might have
had some strange impact. Since then, the load has skyrocketed from
an average of 1-2 to 20-30. Of course, at that load it eventually
dies.
Here's a graph of the load average on this box:
http://www3.igl.net/load/load.html (SMP P2 450MHz, 512MB RAM, 2.0.36
kernel). [I know, I know - upgrade the kernel - it's remotely hosted, so
a major update is a pain.]
I was running mod_perl v1.22, so I went and upgraded. Getting that up
and running with the latest Apache, using Apache::DBI, has solved the
problem of the ridiculously large number of SQL connections, but now
I'm having problems with hanging processes.
Apparently when a child process doesn't finish properly, it keeps
running in memory. Apache reports its state as W, but
top shows it running and gobbling up resources. I pored over the
mailing list archives and implemented some of the SIG handling
recommendations. I can reproduce the effect by hitting stop in my
browser, so I thought I had found a good solution.
Here's what I am doing now (just the relevant info) in httpd.conf:
MaxClients 40
PerlFixupHandler Apache::SIG
PerlModule Apache::DBI
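For completeness, Apache::DBI can also pre-open the database handle when each child starts, via connect_on_init in a PerlRequire'd startup file. A sketch (the DSN, credentials, and filename are placeholders, not my actual config):

```perl
# startup.pl (loaded via PerlRequire) -- placeholders throughout
use strict;
use Apache::DBI ();
use DBI ();

# Open the handle at child startup so the first request doesn't pay
# the connect cost; Apache::DBI caches and reuses it per child.
Apache::DBI->connect_on_init(
    "DBI:mysql:database=mydb;host=localhost",   # placeholder DSN
    "user", "password",                         # placeholder credentials
    { RaiseError => 1, AutoCommit => 1 },
);
```

With Apache::DBI loaded before DBI, plain DBI->connect calls in the scripts are transparently cached, which is what keeps the connection count at or below MaxClients.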
In each mod_perl script I am doing the following:
# failsafe to prevent broken children
require "alarm.pl";
alarm.pl consists of:
$SIG{ALRM} = sub { die "$0: $$ Process Timed Out" };
$SIG{PIPE} = sub { die "$0: $$ Cancelled via SIGPIPE" };
$SIG{TERM} = sub { die "$0: $$ Cancelled via SIGTERM" };
alarm 10;
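In hindsight, this is probably why the library approach only worked on the first request per child, as described in the follow-up above: require compiles and runs a file only once per process, so in a persistent mod_perl child the handlers and the alarm in alarm.pl are installed on the first request and never again. Wrapping the setup in a sub that each script calls per request avoids that (the package and sub names here are mine):

```perl
# alarm.pl -- wrap the setup in a sub so it can run on every
# request, not just the first time the file is require'd.
package MyAlarm;    # hypothetical package name

sub arm {
    my $timeout = shift || 10;
    $SIG{ALRM} = sub { die "$0: $$ Process Timed Out" };
    $SIG{PIPE} = sub { die "$0: $$ Cancelled via SIGPIPE" };
    $SIG{TERM} = sub { die "$0: $$ Cancelled via SIGTERM" };
    alarm $timeout;
}

1;
```

Each script would then do `require "alarm.pl"; MyAlarm::arm(10);` so the handlers and the timer are re-armed on every request.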
I've written a test script that I can reproduce the hanging process
problem with, and the above method seems to kill it. However, in
real world usage I'm still seeing scripts hang! I can't for the
life of me figure out why.
I've been using mod_perl for well over a year and a half now;
it's running on its own port and only serves mod_perl requests.
The weirdness of this is the sudden onset of the problem, when
no scripts or configuration had been changed on the box hosting
it.
I thought perhaps it could be a slowdown on the SQL side, but
running test queries delivers FASTER benchmarks than a few months
ago. I even upgraded MySQL, DBI, and DBD::mysql to the latest
stable releases to be safe. Traffic overall on the site is actually
DOWN from several months ago. The load is definitely being caused
by hanging processes.
Any advice or pointers? I've been searching for the last three
days for any scraps of information I can find, but I'm still
running into a wall here.
Thanks,
Jeremy