Re: UPDATE: Hanging processes (all of a sudden!)

2001-02-20 Thread Stas Bekman

On Tue, 20 Feb 2001, Jeremy Rusnak wrote:

> Hello all,
>
> Just to follow up on myself... I did some more experiments and determined
> that putting the alarm routines in a library wasn't really working right.
> It would work correctly on the first request per child but after that
> failed.  I ended up with:
>
> # failsafe to prevent broken children
> $SIG{ALRM} = sub { die "$0: $$ Process Timed Out" };
> $SIG{PIPE} = sub { die "$0: $$ Cancelled via SIGPIPE" };
> $SIG{TERM} = sub { die "$0: $$ Cancelled via SIGTERM" };
> alarm 10;

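That's how SIG overriding works. On some platforms, once a signal has been
caught the handler is reset to DEFAULT. What you need to do is re-assign
the handler inside the handler itself, before the die (nothing placed after
the die is ever reached). Note the backslash: \&handler takes a reference
to the sub, while a bare &handler would call it on the spot.

$SIG{TERM} = \&handler;
sub handler { $SIG{TERM} = \&handler; die "$0: $$ Cancelled via SIGTERM" }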
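
Another thing that may explain the "works on the first request per child
only" behaviour: require loads and runs a file once per process, so the
top-level code in a library like alarm.pl (including the alarm call) is
not re-executed on later requests served by the same child. Below is a
minimal sketch of that effect and of one possible workaround. The file
name alarm_demo.pl, the counters, and the arm_failsafe sub are made up
for illustration, and the loop only stands in for successive requests.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical stand-in for alarm.pl: top-level code that counts its own runs.
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/alarm_demo.pl";
open my $fh, '>', $file or die "cannot write $file: $!";
print {$fh} "\$main::top_level_runs++;\n1;\n";
close $fh or die "cannot close $file: $!";

our $top_level_runs = 0;
require $file for 1 .. 3;    # three "requests" in the same process
print "top-level code ran $top_level_runs time(s)\n";

# Possible workaround: keep the setup in a sub and call it on every request.
our $arm_calls = 0;
sub arm_failsafe {
    my ($seconds) = @_;
    $SIG{ALRM} = sub { die "$0: $$ Process Timed Out" };
    $arm_calls++;
    alarm $seconds;
}

for my $request (1 .. 3) {
    arm_failsafe(10);
    # ... handle the request here ...
    alarm 0;    # cancel the pending alarm once the request is done
}
print "arm_failsafe ran $arm_calls time(s)\n";
```

The first counter ends at 1 and the second at 3, which matches what you
saw: the inline version runs on every request, the required one does not.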

> This is at the beginning of all my scripts using mod_perl.   I'm tweaking
> the timeout.  Without using the alarm signal things still get nasty,
> though.
>
> Load on the box is back down to 1...However the timeout issue is a little
> nasty.  Users on slow connects are getting timed out rather than the
> script finishing.  I'm trying to determine where the hangs are happening
> so I can reset the alarm timer in the script.

Also check out the Apache::Watchdog::RunAway app.
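
If you want the timer to cover only the parts that can hang, rather than
the whole request, the usual eval/alarm idiom can be wrapped in a small
helper. This is only a sketch: with_timeout is a hypothetical name, and
the sleep stands in for whatever call is actually hanging. Keep in mind
that a process has a single alarm timer, so calling this replaces a
script-wide "alarm 10" instead of nesting inside it; re-arm the outer
alarm afterwards if you still want it.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code with its own $seconds-long alarm.
# Returns (1, undef) on success and (0, $error) on timeout or failure.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        $code->();
        alarm 0;    # finished in time, cancel the timer
        1;
    };
    my $err = $@;
    alarm 0;        # make sure no alarm is left pending if $code died
    return $ok ? (1, undef) : (0, $err);
}

my ($fast_ok, $fast_err) = with_timeout(2, sub { return 42 });
my ($slow_ok, $slow_err) = with_timeout(1, sub { sleep 5 });
print "fast: ok=$fast_ok\n";
print "slow: ok=$slow_ok err=$slow_err";
```

Slow clients then only trip the timer if one guarded section takes too
long, not because the whole response took more than ten seconds.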

> I still can't quite figure out why they are hanging in the first place,
> I know it has to do with the clients disconnecting from the script
> but I assumed that SIGPIPE would catch them.  It isn't doing that in
> all cases, I guess it depends where the disconnect occurs in the
> script.

The guide has at least two sections about catching these:
http://perl.apache.org/guide/debug.html#Handling_the_User_pressed_Stop_
http://perl.apache.org/guide/debug.html#Hanging_Processes_Detection_and
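
As for why SIGPIPE doesn't catch every case: the signal is only raised
when the process actually writes to a connection that has already been
closed. A script that is busy in a loop or waiting on a query when the
client disconnects gets no signal until its next write. The sketch
below shows this with a plain pipe standing in for the client
connection; the variable names are made up for the example, and it does
not involve Apache at all.

```perl
use strict;
use warnings;

my $sigpipe_count = 0;
$SIG{PIPE} = sub { $sigpipe_count++ };

pipe(my $reader, my $writer) or die "pipe failed: $!";
close $reader;    # simulate the client going away

# No write has happened yet, so no signal has been raised.
my $before = $sigpipe_count;

# The first write to the closed pipe fails and raises SIGPIPE.
my $written = syswrite($writer, "hello\n");
my $after = $sigpipe_count;

print "before write: $before, after write: $after\n";
print "syswrite failed: $!\n" unless defined $written;
```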


> I hope this is helpful to others, but I still am looking for more
> answers to this!



Stas Bekman  JAm_pH --   Just Another mod_perl Hacker
http://stason.org/   mod_perl Guide  http://perl.apache.org/guide
mailto:[EMAIL PROTECTED]   http://apachetoday.com http://logilune.com/
http://singlesheaven.com http://perl.apache.org http://perlmonth.com/





UPDATE: Hanging processes (all of a sudden!)

2001-02-20 Thread Jeremy Rusnak

Hello all,

Just to follow up on myself... I did some more experiments and determined
that putting the alarm routines in a library wasn't really working right.
It would work correctly on the first request per child but after that
failed.  I ended up with:

# failsafe to prevent broken children
$SIG{ALRM} = sub { die "$0: $$ Process Timed Out" }; 
$SIG{PIPE} = sub { die "$0: $$ Cancelled via SIGPIPE" }; 
$SIG{TERM} = sub { die "$0: $$ Cancelled via SIGTERM" }; 
alarm 10;

This is at the beginning of all my scripts using mod_perl.   I'm tweaking
the timeout.  Without using the alarm signal things still get nasty,
though.  

Load on the box is back down to 1...However the timeout issue is a little
nasty.  Users on slow connects are getting timed out rather than the
script finishing.  I'm trying to determine where the hangs are happening
so I can reset the alarm timer in the script.

I still can't quite figure out why they are hanging in the first place,
I know it has to do with the clients disconnecting from the script
but I assumed that SIGPIPE would catch them.  It isn't doing that in
all cases, I guess it depends where the disconnect occurs in the
script.

I hope this is helpful to others, but I still am looking for more 
answers to this!

Thanks,
Jeremy

-----Original Message-----
From: Jeremy Rusnak [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, February 20, 2001 6:11 PM
To: [EMAIL PROTECTED]
Subject: Hanging processes (all of a sudden!)


Hi all,

I've run into a strange problem with mod_perl over the weekend.  All of
a sudden, mod_perl started losing scope on its connections to MySQL.

I'd eventually run out of connections to the MySQL server (currently
set up to limit to 200).  This didn't make sense to me, since previously
the number of database connections was never more than the number of
MaxClients.  With 40 httpd processes running, there were 100+ active
connections to the SQL server.  Killing httpd removed the connections,
so there were definitely multiple connections per child.

The only thing that had changed was me rebooting the SQL server, which
hadn't been rebooted in over six months.  Since we're using SQL and NFS
from this box to the machine in question, I figured this might have
had some strange impact.  Since then the load has skyrocketed from
an average of 1-2 to over 20-30.  Of course at that load it eventually
dies.

Here's a graph of the load average on this box:
http://www3.igl.net/load/load.html  (SMP P2 450MHz, 512MB RAM, 2.0.36
kernel).  [I know, I know - upgrade kernel - remotely hosted so it's
a pain to do a major update].

I was running mod_perl v1.22, so I went and upgraded.  Getting that up
and running with the latest Apache, using Apache::DBI has solved the
problem of the ridiculously large SQL connections, but now I'm having
problems with hanging processes.

Apparently when a child process doesn't finish properly it is still
running on and on in memory.  Apache reports its state as W, but
top shows it running and gobbling up resources.  I pored over the
mailing list archives and implemented some of the SIG handling
recommendations.  I can reproduce the effect by hitting stop in my
browser, so I thought I had found a good solution.

Here's what I am doing now (just relevant info) in httpd.conf:

MaxClients 40
PerlFixupHandler Apache::SIG
PerlModule Apache::DBI

In each mod_perl script I am doing:

# failsafe to prevent broken children
require "alarm.pl";

alarm.pl consists of:

$SIG{ALRM} = sub { die "$0: $$ Process Timed Out" }; 
$SIG{PIPE} = sub { die "$0: $$ Cancelled via SIGPIPE" }; 
$SIG{TERM} = sub { die "$0: $$ Cancelled via SIGTERM" }; 
alarm 10; 

I've written a test script that I can reproduce the hanging process
problem with, and the above method seems to kill it.  However, in
real world usage I'm still seeing scripts hang!  I can't for the
life of me figure out why.

I've been using mod_perl for well over a year and a half now,
it's running on its own port and only serves mod_perl requests.
The weirdness of this is the sudden onset of the problem, when
no scripts or configuration had been changed on the box hosting
it.

I thought perhaps it could be a slowdown on the SQL side, but
running test queries delivers FASTER benchmarks than a few months
ago.  I even upgraded MySQL, DBI, and DBD::mysql to the latest
stable releases to be safe.  Traffic overall on the site is actually
DOWN from several months ago.  The load is definitely being caused
by hanging processes.

Any advice or pointers?  I've been searching for the last three
days for any scraps of information I can find, but I'm still
running into a wall here.

Thanks,
Jeremy