Steve Reppucci wrote:
These are *exactly* the symptoms we see, and we're just about always up to
date with Apache/Perl/modperl releases.
We've spent a fair amount of time trying to isolate the cause of these,
but haven't been able to point the finger at any one cause. Some of the
things we've determined:
- The same behavior is displayed under Solaris (5.6 and 5.7) and Linux
(2.2.14).
- We've seen this through a bunch of releases of
Apache/Perl/modperl over the past 6 months.
- When a child process goes astray, it is in a tight loop, quickly growing
to consume 95 to 100% of the cpu cycles.
- Under Linux, running strace on the runaway results in nothing --
no system calls are shown whatsoever, so it's apparently spinning in
a tight CPU loop (though see the next bullet -- it's possible I've
just never caught it at an early enough stage.)
- Under Solaris, I've managed to catch a few of these at an early stage
and observed (via truss) an endless series of 'sbrk' calls; eventually
this gets bound up tight with no system calls displayed, like the
Linux case.
- This seems to happen more often under heavy load, but we've also seen it
fairly regularly during low traffic periods.
- We did have some luck in doing a thorough read of our handlers that use
DBI, making sure that all database connections are explicitly closed
at the end of a request (we *don't* use Apache::DBI). This cut down on
the number of runaways, but we still see them.
We've kept our runaways under control by running a watchdog script that
looks for modperl processes with the correct load numbers (cpu% 10% and
run time something), but we've all along thought that this would be a
temporary solution until we determined what we're doing wrong.
Yup, I've done that before, but sometimes the runaways are still there and
quickly take down the system before you can kill them.
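For anyone who wants to try the watchdog approach, here is a rough sketch of what such a script might look like. It is not Steve's actual script. The process name (httpd), the 60-second minimum run time, and the ps column list are all assumptions of mine; the 10% CPU figure is the only number taken from the message above. Adjust everything for your own site.

```python
import os
import signal
import subprocess

# Assumed values; only the 10% CPU figure comes from the post above.
PROCESS_NAME = "httpd"     # assumption: mod_perl children show up as "httpd"
CPU_THRESHOLD = 10.0       # percent CPU
MIN_RUNTIME_SECS = 60      # assumption: the post only says "something"


def etime_to_seconds(etime):
    """Convert a ps elapsed time ([[dd-]hh:]mm:ss) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:          # pad missing hours with zero
        parts.insert(0, 0)
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_runaways(ps_output, name=PROCESS_NAME,
                  cpu=CPU_THRESHOLD, min_secs=MIN_RUNTIME_SECS):
    """Return PIDs whose command matches `name` and exceed both limits."""
    runaways = []
    for line in ps_output.splitlines()[1:]:    # skip the header line
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue
        pid, pcpu, etime, comm = fields
        if os.path.basename(comm.strip()) != name:
            continue
        if float(pcpu) > cpu and etime_to_seconds(etime) > min_secs:
            runaways.append(int(pid))
    return runaways


def kill_runaways():
    """Run ps once and SIGKILL every process that looks like a runaway."""
    result = subprocess.run(["ps", "-eo", "pid,pcpu,etime,comm"],
                            capture_output=True, text=True, check=True)
    for pid in find_runaways(result.stdout):
        print("killing runaway pid", pid)
        os.kill(pid, signal.SIGKILL)
```

Calling kill_runaways() from cron every minute or so would approximate the watchdog. As noted above, a runaway can still hurt the box between two runs, so this only limits the damage.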
Now that I've seen this report from a couple of others on the list, I'm
wondering if it's not something we're doing, but rather something within
Apache or modperl.
If there's anything anyone on the list can recommend that I do to try to
collect more clues on the cause, I'll be happy to try it.
Or maybe if there are others who've seen the same behavior, pipe in so
that we can get a feeling for how many sites are experiencing this?
Steve Reppucci
I just wonder about IMDb's Apache/mod_perl version:
Server: Apache/1.3.11-dev (Unix) mod_perl/1.21_01-dev
Maybe this version is the most stable for them; they must also have a load
balancer for failover.