Re: Hanging process: detection and determination (was Re: Runaway processes)
> My watch dog code is a bit of a mess, but it uses LWP::*,
> or lwp-request to query the server, test for expected
> output, and if it takes too long restart the server. If
> too many subsequent restarts fail to work, and apache is
> still not responding in a timely manner, send an
> email to the administrator.
>
> Some code is posted below that you can adapt for your use.

Wow! That's quite a script :)

I'm using a simpler one (only for httpd); see the second watchdog at:
http://perl.apache.org/guide/control.html#Monitoring_the_Server_A_watchdo

[code snipped]

___
Stas Bekman  mailto:[EMAIL PROTECTED]  www.singlesheaven.com/stas
Perl,CGI,Apache,Linux,Web,Java,PC at www.singlesheaven.com/stas/TULARC
www.apache.org & www.perl.com == www.modperl.com || perl.apache.org
single o-> + single o-+ = singlesheaven  http://www.singlesheaven.com
Re: Hanging process: detection and determination (was Re: Runaway processes)
Remi Fasol wrote:
>
> hi Joshua,
>
> is this your recommended setup when using Apache::ASP
> or is this for mod_perl in general?
>

mod_perl in general.

> if it's for Apache::ASP, do you have a sample CPU
> limit script and/or watchdog?
>

Check out Apache::Resource; the CPU limiting feature is well
documented there. You also need to install BSD::Resource on
your system.

My watch dog code is a bit of a mess, but it uses LWP::* or
lwp-request to query the server and test for the expected output,
and if that takes too long it restarts the server. If too many
consecutive restarts fail to work and apache is still not
responding in a timely manner, it sends an email to the
administrator.

Some code is posted below that you can adapt for your use.

--Joshua
_
Joshua Chamas                           Chamas Enterprises Inc.
NODEWORKS >> free web link monitoring   Huntington Beach, CA USA
http://www.nodeworks.com                1-714-625-4051

    ## main watchdog, you can add other testing modules, I test
    ## web server, database, dns, smtp with this.

    use Util;
    use My::UserAgent;
    # use LWP::Debug qw(+);

    my $ua = new My::UserAgent;
    my $self = new Util::Monitor;
    $self->add_run(bless {
        'name' => 'proxy homepages test',
        # make sure both the secure and non-secure pages are working
        'test' => sub {
            my $http_doc  = $ua->Request('http://www.nodeworks.com');
            my $https_doc = $ua->Request('https://www.nodeworks.com');
            ($http_doc->{success} && $https_doc->{success});
        },
        # restart www server gracefully, works when server isn't started
        # all too
        'fix' => sub {
            $self->log("attempting restart of www server");
            `/usr/local/apache/sbin/apachectl graceful`;
        },
        # allow for downtime of one - two minutes before sending page
        # it may take a minute to reboot server when the machine is busy
        'period'    => 30,
        'max_tests' => 3,
        'timeout'   => 30,
    }, Util::Monitor::Run );
    $self->monitor;

    ## from Util.pm
    package Util::Monitor;
    @ISA = qw(Util);

    use Class::Struct;
    use File::Basename;
    use Carp qw(confess cluck carp);
    use HTTP::Date;
    use Net::SMTP;
    use Net::Config;
    use Data::Dumper;
    use Tie::CPHash;
    use File::Basename;
    use Time::localtime;

    @Mandatory = ('name', 'test', 'period');
    $MaxSleep = 60;
    $DefaultMaxTests = 3;

    unless(keys %Util::Monitor::Run::) {
        struct(Util::Monitor::Run => {
            'name'      => "\$", # string describing test
            'test'      => "\$", # CODE
            'fix'       => "\$", # CODE
            'period'    => "\$", # time in seconds to iterate
            'max_tests' => "\$", # number of times before erroring
            'num_tests' => "\$", # number of times before erroring
            'last'      => "\$", # time last ran
            'timeout'   => "\$", # timeout for test
        });
    }

    sub new {
        chdir(File::Basename::dirname($0))
            || die("can't change to dir for $0");
        my $self = bless { runs => [] };
        $self->write_pidfile;
        $self;
    }

    # add runs before monitoring
    sub add_run {
        my($self, $run) = @_;
        die("no run") unless $run;
        die("run is not well defined")
            unless (@Mandatory == grep(defined $run->{$_}, @Mandatory));

        $run->max_tests || $run->max_tests($DefaultMaxTests);
        $run->last(0);
        $run->num_tests(0);
        push(@{$self->{runs}}, $run);
    }

    # main code to loop over
    sub monitor {
        my $self = shift;
        while($self->alive) {
            $self->do_runs;
            $self->sleep;
        }
    }

    # sleep until the next run is due, but never longer than $MaxSleep
    sub sleep {
        my $self = shift;
        my $next_time = time() + $MaxSleep;
        for(@{$self->{runs}}) {
            my $run_time = $_->period + $_->last;
            if($run_time < $next_time) {
                $next_time = $run_time;
            }
        }
        my $sleep_time = $next_time - time;
        if($sleep_time > 0) {
            $self->log("sleeping $sleep_time");
            CORE::sleep($sleep_time); # the builtin, not this method
        } else {
            $sleep_time = 0;
        }
        $sleep_time;
    }

    # execute every run whose period has elapsed
    sub do_runs {
        my $self = shift;
        @{$self->{runs}} > 0 or die("no runs to do");
        my $run;
        for $run (@{$self->{runs}}) {
            my $run_time = $run->period + $run->last;
            next unless ($run_time <= time);
            $self->log("doing run name ".$run->name);
            $self->do_run($run);
            $run->last(time);
        }
    }

    sub do_run {
        my($self, $run) = @_;
        my $name = $run->name;
        my $start = time();
        my $result = $self->try($run->test, $run->timeout);
        my $total = time - $start;
        $self->log("time for $name: $total");
        if($result) {
            # test succeeded
            $self->log("test success for $name, result $result");
            if($run->num_tests) {
                $self->sendmail({
                    Subject => $run->name . " fixed",
                    Body => "failed t
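
The listing breaks off before the `try` method called from do_run is shown, so its real implementation is unknown. As a rough idea of what such a helper might look like, here is a minimal sketch built on alarm and eval. The name try_with_timeout and its return convention (undef on timeout or exception) are assumptions made for illustration, not Joshua's code.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original Util.pm): run $code and
# give up after $timeout seconds. Returns the code's scalar result, or
# undef if it timed out or died.
sub try_with_timeout {
    my ($code, $timeout) = @_;
    my $result = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm($timeout);
        my $r = $code->();
        alarm(0);
        $r;
    };
    alarm(0);    # make sure no alarm stays pending if $code died
    return $result;
}

# A test that answers quickly, and one that blocks for too long.
# select() is used for the delay to avoid mixing sleep() with alarm().
my $fast = try_with_timeout(sub { "ok" }, 2);
my $slow = try_with_timeout(sub { select(undef, undef, undef, 5); "never" }, 1);

print "fast: ", (defined $fast ? $fast : "timed out"), "\n";
print "slow: ", (defined $slow ? $slow : "timed out"), "\n";
```

A watchdog test would pass its 'test' code ref and 'timeout' value to a helper like this and treat undef as a failure.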
Re: Hanging process: detection and determination (was Re: Runaway processes)
> I use Apache::Resource to set a CPU limit, that only a
> runaway process would hit so the random killer process
> doesn't accumulate and take down my system. I have
> MaxRequestsPerChild set to a few hundred and have found
> empirically that they don't tend to take more than 10
> seconds of CPU time for normal use, so I give a CPU
> limit of 20-30 seconds for all my httpds.

So you use the formula:

  total_proc_cpu_time_limit =
      MaxRequestsPerChild * single_request_cpu_time_limit

Hmm, you describe a workable solution... But it can be very
problematic to determine the numbers for the above formula if the
environment tends to change, i.e. when you add/remove scripts or add
features...

$detection_solutions++ :) Anyone else?

> I also run a monitor program that watchdogs the
> server every 20-30 seconds and restarts it if
> response time is ever too slow, just in case other
> odd things go wrong. It just does a graceful
> restart, I haven't needed to fix a problem with a
> full stop / start yet.

Yup, I do the same. My watchdog also emails me a report when this
happens, so I can monitor the whole thing and spot problems (see the
guide for the watchdog).

But unfortunately this cannot spot the case where just a few processes
hang. It would only work when hanging_procs = MaxClients: then the
parent process could not spawn any more procs, the watchdog would see
that the server doesn't respond and restart it, killing all the hanging
procs...

Thank you, Joshua

___
Stas Bekman
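
To make the formula above concrete, here is a small worked example. Every number in it is an illustrative assumption (300 requests per child, 0.03 CPU seconds per request, a 2.5x safety margin), chosen only so that the result lands in the 20-30 second range mentioned in the thread. Real values have to be measured on your own server.

```perl
use strict;
use warnings;

# All numbers below are assumptions for illustration, not measurements.
my $max_requests_per_child = 300;    # "a few hundred"
my $cpu_per_request        = 0.03;   # CPU seconds per request (assumed)
my $safety_factor          = 2.5;    # headroom so healthy children never hit the limit

# CPU a healthy child is expected to burn over its whole lifetime
my $expected_cpu = $max_requests_per_child * $cpu_per_request;

# Limit that only a runaway child should ever reach
my $cpu_limit = $expected_cpu * $safety_factor;

printf "expected CPU per child lifetime: %.1f s\n", $expected_cpu;
printf "CPU limit with headroom:         %.1f s\n", $cpu_limit;
```

The weak spot is the one already pointed out: $cpu_per_request shifts whenever scripts are added or changed, so the limit has to be re-measured from time to time.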
Re: Hanging process: detection and determination (was Re: Runaway processes)
hi Joshua,

is this your recommended setup when using Apache::ASP
or is this for mod_perl in general?

if it's for Apache::ASP, do you have a sample CPU
limit script and/or watchdog?

thanks!
remi

--- Joshua Chamas <[EMAIL PROTECTED]> wrote:
> Stas,
>
> I use Apache::Resource to set a CPU limit, that only a
> runaway process would hit so the random killer process
> doesn't accumulate and take down my system. I have
> MaxRequestsPerChild set to a few hundred and have found
> empirically that they don't tend to take more than 10
> seconds of CPU time for normal use, so I give a CPU
> limit of 20-30 seconds for all my httpds.
>
> I also run a monitor program that watchdogs the
> server every 20-30 seconds and restarts it if
> response time is ever too slow, just in case other
> odd things go wrong. It just does a graceful
> restart, I haven't needed to fix a problem with a
> full stop / start yet.
>
Re: Hanging process: detection and determination (was Re: Runaway processes)
> The reason why the stop button does not stop the script lies in the fact
> that your script does not produce any output while it is running. SIGPIPE
> is only raised when your script tries to write to a closed (STOPed)
> connection. No output from your script = no SIGPIPE!

That's right, Tobias. I've checked Apache::SIG and
$r->connection->aborted, but is there a way to "write" without actually
writing anything visible? Probably some control char will do. Something
like:

  while(1){
    $r->print("\0");
    last if $r->connection->aborted;
    $i++;
    sleep (1);
  }

I guess you must flush it as well, otherwise the output would be
buffered... so either $|++ or $r->rflush. This one seems to work:

  while(1){
    $r->print("\0");
    $r->rflush;
    last if $r->connection->aborted;
    $i++;
    sleep (1);
  }

but this one doesn't work (I removed "last if $r->connection->aborted"),
which seems to mean that mod_perl is broken:

  while(1){
    $r->print("$$\n");
    $r->rflush;
    $i++;
    sleep (1);
  }

See the output of strace when I press Stop - it detects the SIGPIPE but
doesn't quit!

  [snip]
  nanosleep(0xb308, 0xb308, 0x401a61b4, 0xb308, 0xb41c) = 0
  time([940621341]) = 940621341
  write(4, "22572\n", 6) = -1 EPIPE (Broken pipe)
  --- SIGPIPE (Broken pipe) ---
  time([940621341]) = 940621341
  SYS_175(0, 0xb41c, 0xb39c, 0x8, 0) = 0
  SYS_174(0x11, 0, 0xb1a0, 0x8, 0x11) = 0
  SYS_175(0x2, 0xb39c, 0, 0x8, 0x2) = 0
  nanosleep(0xb308, 0xb308, 0x401a61b4, 0xb308, 0xb41c) = 0
  [snip] continues non-stop here

So it seems that mod_perl's default SIGPIPE behaviour is not set up
correctly, since when I add:

  use Apache::SIG ();
  Apache::SIG->set;

it stops right away after I press the Stop button.

I run Apache/1.3.10-dev mod_perl/1.22-dev (CVS snapshot a few days old)

> Tobias
>
> At 07:29 PM 10/22/99 +0200, Stas Bekman wrote:
> >Hi,
> >
> >Let's take a little script that obviously "hangs" the server:
> >
> >  my $r = shift;
> >  $r->send_http_header('text/plain');
> >  $|=1; # so we would see the $$ printed
> >  print "OK $$\n";
> >  sleep 1, $i++ while 1;
> >
> >The second question is how comes that the above little script never quits
> >after the stop button was pressed? Apache was supposed to detect SIGPIPE
> >and abort the run... but it doesn't - it's very easy to reproduce - just
> >run it... I've used $|=1 to print the $$ and check that it really hangs...
> >
> >Thanks!

___
Stas Bekman
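
The point that SIGPIPE only shows up when something is actually written can be reproduced outside Apache. The sketch below is plain Perl with no mod_perl involved. A pipe whose read end has been closed stands in for the browser connection after Stop is pressed, which is an assumption made purely for illustration. The strace output above shows the same EPIPE plus SIGPIPE pair.

```perl
use strict;
use warnings;
use Errno qw(EPIPE);

# Stand-in for the client connection: a pipe whose read end we close,
# playing the role of the browser after Stop was pressed.
pipe(my $reader, my $writer) or die "pipe failed: $!";
close($reader);

my $got_sigpipe = 0;
local $SIG{PIPE} = sub { $got_sigpipe = 1 };   # record the signal instead of dying

# As long as nothing is written, nothing happens: no signal at all.
my $before_write = $got_sigpipe;

# The first write after the peer went away fails with EPIPE and the
# kernel delivers SIGPIPE to the writing process.
my $written = syswrite($writer, "$$\n");
my $errno   = $! + 0;
my $epipe   = (!defined($written) && $errno == EPIPE) ? 1 : 0;

print "SIGPIPE seen before any write: ", ($before_write ? "yes" : "no"), "\n";
print "write failed with EPIPE:       ", ($epipe        ? "yes" : "no"), "\n";
print "SIGPIPE handler ran:           ", ($got_sigpipe  ? "yes" : "no"), "\n";
```

Whether the process then stops depends entirely on what the SIGPIPE handler does. Here it only sets a flag, so the script carries on, much like the third loop above.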
Re: Hanging process: detection and determination (was Re: Runaway processes)
Stas,

I use Apache::Resource to set a CPU limit that only a
runaway process would hit, so that runaway processes
don't accumulate and take down my system. I have
MaxRequestsPerChild set to a few hundred and have found
empirically that my httpds don't tend to take more than 10
seconds of CPU time for normal use, so I give a CPU
limit of 20-30 seconds for all my httpds.

I also run a monitor program that watchdogs the
server every 20-30 seconds and restarts it if
response time is ever too slow, just in case other
odd things go wrong. It just does a graceful
restart; I haven't needed to fix a problem with a
full stop / start yet.

-- Joshua
_
Joshua Chamas                           Chamas Enterprises Inc.
NODEWORKS >> free web link monitoring   Huntington Beach, CA USA
http://www.nodeworks.com                1-714-625-4051

Stas Bekman wrote:
>
> Hi,
>
> Recently there were a few questions regarding hanging processes, I've
> tried to reproduce this case and have found two problems.
>
> Let's take a little script that obviously "hangs" the server:
>
>   my $r = shift;
>   $r->send_http_header('text/plain');
>   $|=1; # so we would see the $$ printed
>   print "OK $$\n";
>   sleep 1, $i++ while 1;
>
> First question is: how do I detect that some server hangs? I've tried top,
> ps, /server-status -- none of them helped me to find that some process
> hangs. Of course if the process uses lot of resources you can bust it, by
> watching the top(), another approach is to use /server-status and watch it
> for about 5-10 minutes spotting which process number has the same number
> of requests while its status is 'W' (Which means that it hangs), but when
> you have about 50 procs, it's quite hard to spot such a process.
>
> Another easy spotting is when some process trashes the error_log and
> writes millions of error messages there... But you still don't know the
> PID of this process, so you just restart all of them.
>
> So my question is, is there any way to tell that some process hangs?
> Those who reported their processes hang, how did you spot it?
>
> If I knew a programmatical way to spot the hanging process, I'd implement
> it in Apache::VMonitor to warn the admin and it would be possible to run
> watchdogs to kill off the process and report to admin... I think it would
> be a useful addon for us...
>
> The second question is how comes that the above little script never quits
> after the stop button was pressed? Apache was supposed to detect SIGPIPE
> and abort the run... but it doesn't - it's very easy to reproduce - just
> run it... I've used $|=1 to print the $$ and check that it really hangs...
>
> Thanks!
Re: Hanging process: detection and determination (was Re: Runaway processes)
The reason why the stop button does not stop the script lies in the fact
that your script does not produce any output while it is running. SIGPIPE
is only raised when your script tries to write to a closed (STOPed)
connection. No output from your script = no SIGPIPE!

Tobias

At 07:29 PM 10/22/99 +0200, Stas Bekman wrote:
>Hi,
>
>Let's take a little script that obviously "hangs" the server:
>
>  my $r = shift;
>  $r->send_http_header('text/plain');
>  $|=1; # so we would see the $$ printed
>  print "OK $$\n";
>  sleep 1, $i++ while 1;
>
>The second question is how comes that the above little script never quits
>after the stop button was pressed? Apache was supposed to detect SIGPIPE
>and abort the run... but it doesn't - it's very easy to reproduce - just
>run it... I've used $|=1 to print the $$ and check that it really hangs...
>
>Thanks!
Hanging process: detection and determination (was Re: Runaway processes)
Hi,

Recently there were a few questions regarding hanging processes. I've
tried to reproduce this case and have found two problems.

Let's take a little script that obviously "hangs" the server:

  my $r = shift;
  $r->send_http_header('text/plain');
  $|=1; # so we would see the $$ printed
  print "OK $$\n";
  sleep 1, $i++ while 1;

The first question is: how do I detect that some server process hangs?
I've tried top, ps and /server-status -- none of them helped me to find
that a process hangs. Of course, if the process uses a lot of resources
you can catch it by watching top. Another approach is to watch
/server-status for about 5-10 minutes and spot which process keeps the
same number of requests while its status stays 'W' (which means that it
hangs), but when you have about 50 procs, it's quite hard to spot such
a process.

Another easy case is when some process floods the error_log and writes
millions of error messages there... But you still don't know the PID of
this process, so you just restart all of them.

So my question is: is there any way to tell that some process hangs?
Those who reported that their processes hang, how did you spot it?

If I knew a programmatic way to spot the hanging process, I'd implement
it in Apache::VMonitor to warn the admin, and it would be possible to
run watchdogs to kill off the process and report to the admin... I
think it would be a useful addon for us...

The second question is: how come the above little script never quits
after the stop button is pressed? Apache was supposed to detect SIGPIPE
and abort the run... but it doesn't - it's very easy to reproduce - just
run it... I've used $|=1 to print the $$ and check that it really hangs...

Thanks!

___
Stas Bekman
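
The manual /server-status check described above (same request count, status still 'W' after several minutes) could in principle be automated by comparing two snapshots. The sketch below only shows the comparison step. The snapshot layout (pid => { status, requests }), the function name find_suspects and the sample data are all hypothetical, and fetching and parsing the real /server-status page is left out.

```perl
use strict;
use warnings;

# Hypothetical snapshot format, one entry per child:
#   pid => { status => scoreboard letter, requests => requests served so far }
# A child is a suspect if it was in 'W' in both snapshots and its
# request count did not move in between.
sub find_suspects {
    my ($before, $after) = @_;
    my @suspects;
    for my $pid (sort { $a <=> $b } keys %$after) {
        my $old = $before->{$pid} or next;    # child was spawned in between
        my $new = $after->{$pid};
        push @suspects, $pid
            if $old->{status} eq 'W'
            && $new->{status} eq 'W'
            && $old->{requests} == $new->{requests};
    }
    return @suspects;
}

# Made-up snapshots taken a few minutes apart
my %before = (
    101 => { status => 'W', requests => 57 },
    102 => { status => '_', requests => 12 },
    103 => { status => 'W', requests => 8  },
);
my %after = (
    101 => { status => 'W', requests => 57 },   # no progress: suspect
    102 => { status => 'W', requests => 40 },   # busy now, but it made progress
    103 => { status => 'W', requests => 9  },   # still 'W', counter moved
    104 => { status => '_', requests => 0  },   # new child, nothing to compare
);

my @suspects = find_suspects(\%before, \%after);
print "possibly hanging: @suspects\n";
```

A legitimately long-running request looks exactly the same in this comparison, so the interval between snapshots has to be longer than the slowest request you expect, and the result is a hint for the admin rather than proof that a process hangs.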