on 11/09/2013 04:26 Pqf 潘庆峰 said the following:
> I think your problem is finding out why the zombies stay around. I can't tell
> from the information you gave, but I think you can find out like this:
> 1. Find the pid of the PM.
> 2. Use strace -p $PM_pid (Linux) or truss -p $PM_pid (Solaris). It will show
> you what the PM is doing: is waitpid() called? Does waitpid() return an error?
> Or does the PM just die for some reason? It also gives other useful information.

Sorry that I was not clear about this in my original post.
The PM is doing well: it's running and it's calling waitpid on other processes.
It does not call waitpid on the zombie processes in question because they are
still on the busy list.  And it seems that the PM never checks processes on the
busy list.

I've been thinking about this problem and the only theory I have so far is that
the owning httpd process could terminate ungracefully (e.g. crash).  In that
case the pool cleanup would never run.  That's OK for process-local resources
like memory or file descriptors, which the OS frees because the process dies
anyway.  But that's not OK for external resources like other processes.
In other words, if an httpd process marks an fcgid process as busy and then
suddenly dies, then there is nobody to move the fcgid process back to the idle
list.
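
To make the theory concrete, here is a toy model of it in Python.  It is only a
sketch of the idea, not mod_fcgid's actual code: the ProcessManager class, its
method names and the pids are all made up for illustration, and the assumption
that the PM reaps only processes on the idle list is my reading of the
behaviour, not something I have confirmed in the source.

```python
class ProcessManager:
    """Toy model (hypothetical): the PM only reaps processes on the idle list."""

    def __init__(self):
        self.idle = set()    # fcgid pids available for reuse
        self.busy = {}       # fcgid pid -> pid of the httpd process that owns it
        self.exited = set()  # fcgid pids that have exited but are not yet reaped

    def spawn(self, pid):
        self.idle.add(pid)

    def acquire(self, pid, owner):
        # An httpd process takes the fcgid process and marks it busy.
        self.idle.remove(pid)
        self.busy[pid] = owner

    def release(self, pid):
        # Stands in for the pool cleanup that runs on a graceful httpd exit.
        del self.busy[pid]
        self.idle.add(pid)

    def mark_exited(self, pid):
        self.exited.add(pid)

    def reap_idle(self):
        # Assumption: only the idle list is scanned, never the busy list.
        reaped = self.idle & self.exited
        self.idle -= reaped
        self.exited -= reaped
        return reaped

    def zombies(self):
        return set(self.exited)


def simulate(owner_crashes):
    """Return the set of fcgid pids left as zombies after one request."""
    pm = ProcessManager()
    pm.spawn(100)
    pm.acquire(100, owner=42)
    if not owner_crashes:
        pm.release(100)  # cleanup runs only if httpd exits gracefully
    pm.mark_exited(100)
    pm.reap_idle()
    return pm.zombies()
```

With a graceful owner, simulate(False) leaves no zombies.  With a crashing
owner, simulate(True) leaves pid 100 stuck: it is still on the busy list, so
the reaping pass never looks at it.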

-- 
Andriy Gapon
