Whoops, I meant 'ps auxjwww'. The 'j' adds the parent process information, so the relationships between processes are visible.
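As a rough illustration of what the parent relationship reveals, here is a minimal Python 3 sketch that flags orphaned daemon processes (parent pid 1). It assumes the listing comes from `ps -eo pid,ppid,user,stat,args`, which is a choice made for this example rather than the command suggested above. The function names are hypothetical, and the PPID values in the sample are invented; only the pids, users and states come from the thread.

```python
def parse_ps(text):
    """Parse output of `ps -eo pid,ppid,user,stat,args` into a list of dicts."""
    rows = []
    for line in text.strip().splitlines()[1:]:  # first line is the header
        parts = line.split(None, 4)  # args may contain spaces, so split at most 4 times
        if len(parts) < 5:
            continue
        pid, ppid, user, stat, args = parts
        rows.append({"pid": int(pid), "ppid": int(ppid), "user": user,
                     "stat": stat, "args": args})
    return rows


def find_orphans(rows, name):
    """Return processes whose command contains `name` and whose parent is pid 1."""
    return [r for r in rows if r["ppid"] == 1 and name in r["args"]]


# Illustrative input: pids, users and states are taken from the thread,
# the PPID values are invented for this example.
SAMPLE = """\
  PID  PPID USER     STAT COMMAND
 2673     1 root     Ss   /usr/sbin/httpd.worker
 1201  2673 apache   Sl   /usr/sbin/httpd.worker
12840  2673 svcuser  Sl   daemon-display-name
23339     1 svcuser  Sl   daemon-display-name
"""

orphans = find_orphans(parse_ps(SAMPLE), "daemon-display-name")
print([r["pid"] for r in orphans])  # [23339]
```

The name filter matters because the root httpd process legitimately has parent pid 1 as well. On a live system the text could come from `subprocess.run(["ps", "-eo", "pid,ppid,user,stat,args"], capture_output=True, text=True).stdout` (Python 3.7 or later).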
Also, what LogLevel are you running in Apache? Are you using 'info' so you can see messages from mod_wsgi about process shutdown?

Graham

> On 18 Nov 2016, at 11:13 PM, [email protected] wrote:
>
> Thanks Graham.
>
> They look pretty normal:
>
> USER       PID %CPU %MEM    VSZ   RSS TTY STAT START TIME COMMAND
> root      2673  0.0  0.0  61272  3276 ?   Ss   04:09 0:01 /usr/sbin/httpd.worker
> apache    1201  0.1  0.0 739448  8668 ?   Sl   06:32 0:01 /usr/sbin/httpd.worker
> svcuser  12840  0.0  0.0 455436 22876 ?   Sl   05:19 0:03 daemon-display-name
> svcuser  23339  0.0  0.0 237320  5392 ?   Sl   Nov17 0:00 daemon-display-name  <-- orphan
>
> Note that we do *not* see the pids of our daemon workers in the apache log
> when it shuts down. We only see the pids of non-modwsgi workers, for
> handling server-status et al. So in above output we would see only pid 1201
> shutdown problems in httpd log.
>
> This issue has been around for a while, we have observed it here and there in
> the past, but recently it has amplified and is causing resource exhaustion
> and we're trying to answer 'why now' in addition to 'why'?
>
> Appreciate the help.
>
> On Thursday, November 17, 2016 at 11:20:12 PM UTC-5, Graham Dumpleton wrote:
>
> > On 18 Nov 2016, at 2:39 PM, robert...@dealertrack.com wrote:
> >
> > Hello,
> >
> > We are having an issue using Apache/2.2.15 (Unix) mod_wsgi/3.3 Python/2.7.3
> > worker MPM/daemon mode, where apache restarts cause daemon processes to
> > become orphaned (adopt ppid 1 and continue to run app code but not take
> > http requests).
> >
> > Each time the error occurs, we will see something like:
> > [Thu Nov 17 22:15:00 2016] [warn] child process 23371 still did not exit, sending a SIGTERM
> > [Thu Nov 17 22:15:02 2016] [warn] child process 23371 still did not exit, sending a SIGTERM
> > [Thu Nov 17 22:15:04 2016] [warn] child process 23371 still did not exit, sending a SIGTERM
> > [Thu Nov 17 22:15:06 2016] [error] child process 23371 still did not exit, sending a SIGKILL
> >
> > .. where pid 23371 was an httpd worker.
> >
> > This causes me to assume that the root worker (initial process spawned by
> > httpd and owned by root) sends (TERM, TERM, TERM, KILL) to the worker(s),
> > which then attempts to kill the daemon processes but can't for some reason
> > and that causes it to not respond to its parent's requests to die.
> > However, this does not make sense to me because that worker is run by the
> > low-privilege apache user which does not have the ability to kill our daemon
> > processes (which have a different uid/gid). We have tried permutations of
> > different users and privileges and nothing helps.
> >
> > We can easily send a TERM to any of the daemon processes manually (orphaned
> > or not), and they die cleanly in well under the 3 second window that apache
> > uses. They die, and mod_wsgi emits something to the httpd log saying they
> > were aborted. It just doesn't happen when httpd tries to do it.
> >
> > We are using C modules, and we have enabled WSGIApplicationGroup ${GLOBAL}
> > and as far as we can tell our permissions and vhost configuration are right.
> > The application works well at runtime.
> >
> > In order to continue to debug this, we were hoping to find out exactly how
> > the daemons are signaled that they should exit. Tracing the daemon
> > processes with sysdig shows nothing about them getting any signals from
> > httpd to terminate.
> >
> > Any ideas or tips on how to put the pieces together?
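To line up the pids in the error log with the pids in a process listing, the "still did not exit" messages quoted above can be summarised per process. The following is a minimal Python 3 sketch, assuming the Apache 2.2 log line format exactly as it appears in this thread; the function name `escalations` is hypothetical.

```python
import re
from collections import defaultdict

# Matches lines such as:
# [Thu Nov 17 22:15:00 2016] [warn] child process 23371 still did not exit, sending a SIGTERM
PATTERN = re.compile(
    r"\[(?:warn|error)\] child process (?P<pid>\d+) "
    r"still did not exit, sending a (?P<signal>SIG[A-Z]+)"
)


def escalations(log_text):
    """Map each pid to the ordered list of signals the log says were sent to it."""
    sent = defaultdict(list)
    for match in PATTERN.finditer(log_text):
        sent[int(match.group("pid"))].append(match.group("signal"))
    return dict(sent)


# The four log lines quoted in the thread.
LOG = """\
[Thu Nov 17 22:15:00 2016] [warn] child process 23371 still did not exit, sending a SIGTERM
[Thu Nov 17 22:15:02 2016] [warn] child process 23371 still did not exit, sending a SIGTERM
[Thu Nov 17 22:15:04 2016] [warn] child process 23371 still did not exit, sending a SIGTERM
[Thu Nov 17 22:15:06 2016] [error] child process 23371 still did not exit, sending a SIGKILL
"""

print(escalations(LOG))  # {23371: ['SIGTERM', 'SIGTERM', 'SIGTERM', 'SIGKILL']}
```

Any pid that ends with SIGKILL in this summary can then be looked up in the process listing to confirm whether it was an httpd worker or a daemon process.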
>
> The signals to shutdown should be sent by the Apache root process, which runs
> as root. There is no way the daemon processes should be able to ignore the
> SIGKILL. The only way the processes should be able to hang around is if they
> became zombie processes because they were hung on some resource such as an
> NFS mount. They will not actually be running in this case, only occupying a
> slot in the process table and nothing more.
>
> Really need to see the output of 'ps auxwww' so I can see the pids, their
> relationship to other httpd processes, the process state and whether it is
> a zombie (Z).
>
> Overall there is not much I can do to help as you are on an ancient
> Apache/mod_wsgi version. From memory, I have seen some complaints of something
> similar before, but they all revolved around the use of Apache 2.2.12-2.2.16.
> Never seen anything similar since. So I have always suspected some strange
> issue with Apache around that version.
>
> Graham
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "modwsgi" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/modwsgi.
> For more options, visit https://groups.google.com/d/optout.
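Since the earlier reply asks whether the lingering process is a zombie (Z), here is a minimal Python 3 sketch for reading the first letter of the STAT column that `ps` prints. The state letters follow the Linux `ps` manual page; the function names are hypothetical. Applied to the listing posted in this thread, both daemon processes show `Sl` (sleeping, multi-threaded) rather than `Z`.

```python
# Primary process state codes as documented in the Linux ps manual page.
STATES = {
    "R": "running or runnable",
    "S": "interruptible sleep",
    "D": "uninterruptible sleep (usually I/O)",
    "T": "stopped",
    "Z": "zombie (defunct)",
}


def describe(stat):
    """Describe the primary state; trailing letters such as 's' or 'l' are modifiers."""
    return STATES.get(stat[:1], "unknown state")


def is_zombie(stat):
    """True if the STAT value marks a defunct process that only holds a process table slot."""
    return stat.startswith("Z")


# STAT values for the two daemon processes in the listing from the thread.
for pid, stat in [(12840, "Sl"), (23339, "Sl")]:
    label = "zombie" if is_zombie(stat) else "not a zombie"
    print(pid, describe(stat), label)
```

A process stuck in `D` state (for example on an unresponsive NFS mount) cannot act on signals until the blocking operation returns, which is worth distinguishing from a true zombie when reading the listing.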
