Hello,

I'm trying to solve a long-running problem whereby my Apache mod_perl
processes get stuck in a "FUTEX_WAIT" state instead of exiting.

I believe this is the same issue as reported here:
http://www.gossamer-threads.com/lists/modperl/modperl/99879

The problem occurs fairly frequently after a burst of traffic, when
Apache spawns new processes and then attempts to cull them afterward.
Before I disabled MaxRequestsPerChild, it also occurred when Apache
tried to cull a process that had reached that limit.

Usually, from the child's point of view, the exit looks like this:

$ strace -p 21764
Process 21764 attached - interrupt to quit
read(5, "!", 1)                         = 1
tgkill(21764, 21791, SIGHUP)            = 0
tgkill(21764, 21791, SIG_0)             = 0
select(0, NULL, NULL, NULL, {0, 500000}) = ? ERESTARTNOHAND (To be restarted)
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = -1 EINTR (Interrupted system call)
munmap(0x7f9905750000, 8392704)         = 0
munmap(0x7f98f8736000, 8392704)         = 0
...
madvise(0x7f98e4021000, 73728, MADV_DONTNEED) = 0
exit_group(0)                           = ?
Process 21764 detached

However, every five or so attempts, it instead goes like this:

$ strace -p 24133
Process 24133 attached - interrupt to quit
read(5, "!", 1)                         = 1
tgkill(24133, 24164, SIGHUP)            = 0
tgkill(24133, 24164, SIG_0)             = 0
--- SIGTERM (Terminated) @ 0 (0) ---
rt_sigreturn(0xf)                       = 0
select(0, NULL, NULL, NULL, {0, 500000}) = 0 (Timeout)
tgkill(24133, 24140, SIGUSR1)           = 0
futex(0x7f9904f4e9d0, FUTEX_WAIT, 24140, NULL

... and goes no further.

Sometimes, after a few minutes of doing nothing, the process will
suddenly free itself, spit out a bunch of "munmap" calls, and exit. But
more often it hangs indefinitely.

Given time, these hung children accumulate until they occupy all
available RAM, which sends the box into swap and eventually crashes it.

This problem has occurred on various versions of Apache and Ubuntu over
the last two years. I'm currently seeing it regularly on the two boxes I
manage, which are:

- Apache/2.2.17 (Ubuntu) mod_perl/2.0.4 Perl/v5.10.1 on Ubuntu 11.04
(2.6.38-11-generic #50-Ubuntu SMP x86_64).

- Apache/2.2.14 (Ubuntu) mod_perl/2.0.4 Perl/v5.10.1 on Ubuntu 10.04
(2.6.32-30-server #59-Ubuntu SMP x86_64).

The problem does not occur on Apache running without mod_perl.

I have been trying to debug this problem for a long time, but I don't
know how to get any further.

Thanks in advance for any advice!

Max.
