Hello!

On Wed, Jul 03, 2013 at 04:48:29PM +0200, Florian S. wrote:
> Hi together!
>
> I'm having occasionally trouble with worker processes left <defunct>
> and nginx stopping handling signals (HUP and even TERM) in general.
>
> Upon reconfigure signal, the log shows four new processes being
> spawned, while the old four processes are shutting down:
>
> > [notice] 5159#0: using the "epoll" event method
> > [notice] 5159#0: nginx/1.4.1
> > [notice] 5159#0: built by gcc 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1)
> > [notice] 5159#0: OS: Linux 3.9.7-147-x86
> > [notice] 5159#0: getrlimit(RLIMIT_NOFILE): 100000:100000
> > [notice] 5159#0: start worker processes
> > [notice] 5159#0: start worker process 5330
> > [notice] 5159#0: start worker process 5331
> > [notice] 5159#0: start worker process 5332
> > [notice] 5159#0: start worker process 5333
> > [notice] 5159#0: signal 1 (SIGHUP) received, reconfiguring
> > [notice] 5159#0: reconfiguring
> > [notice] 5159#0: using the "epoll" event method
> > [notice] 5159#0: start worker processes
> > [notice] 5159#0: start worker process 12457
> > [notice] 5159#0: start worker process 12458
> > [notice] 5159#0: start worker process 12459
> > [notice] 5159#0: start worker process 12460
> > [notice] 5159#0: start cache manager process 12461
> > [notice] 5159#0: start cache loader process 12462
> > [notice] 5331#0: gracefully shutting down
> > [notice] 5330#0: gracefully shutting down
> > [notice] 5331#0: exiting
> > [notice] 5330#0: exiting
> > [notice] 5331#0: exit
> > [notice] 5330#0: exit
> > [notice] 5332#0: gracefully shutting down
> > [notice] 5159#0: signal 17 (SIGCHLD) received
> > [notice] 5159#0: worker process 5331 exited with code 0
> > [notice] 5332#0: exiting
> > [notice] 5332#0: exit
> > [notice] 5333#0: gracefully shutting down
> > [notice] 5333#0: exiting
> > [notice] 5333#0: exit
>
> After that, nginx is fully operational and serving requests --
> however, ps yields:
>
> > root 5159 0.0 0.0 6248 1696 ? Ss 10:43 0:00 nginx: master
> process /chroots/nginx/nginx -c /chroots/nginx/conf/nginx.conf
> > nobody 5330 0.0 0.0 0 0 ? Z 10:43 0:00 [nginx] <defunct>
> > nobody 5332 0.0 0.0 0 0 ? Z 10:43 0:00 [nginx] <defunct>
> > nobody 5333 0.0 0.0 0 0 ? Z 10:43 0:00 [nginx] <defunct>
> > nobody 12457 0.0 0.0 8332 2940 ? S 10:44 0:00 nginx: worker process
> > nobody 12458 0.0 0.0 8332 2940 ? S 10:44 0:00 nginx: worker process
> > nobody 12459 0.0 0.0 8332 3544 ? S 10:44 0:00 nginx: worker process
> > nobody 12460 0.0 0.0 8332 2940 ? S 10:44 0:00 nginx: worker process
> > nobody 12461 0.0 0.0 6296 1068 ? S 10:44 0:00 nginx: cache
> manager process
> > nobody 12462 0.0 0.0 0 0 ? Z 10:44 0:00 [nginx] <defunct>
>
> In the log one can see that SIGCHLD is only received once for 5331,
> which does not show up as zombie -- in contrast to the workers 5330,
> 5332, 5333, and the cache loader 12462.
> Much more serious is that neither
>
> > /chroots/nginx/nginx -c /chroots/nginx/conf/nginx.conf -s(stop|reload)
>
> nor
>
> > kill 5159
>
> seem to get handled by nginx anymore (nothing in the log and no
> effect). Maybe the master process is stuck waiting for some mutex?:
>
> >strace -p 5159
> > Process 5159 attached - interrupt to quit
> > futex(0xb7658e6c, FUTEX_WAIT_PRIVATE, 2, NULL
>
> Unfortunately, I missed to get a core dump of the master process
> while it was running. Additionally, there is no debug log available,
> sorry. As I was not able to reliably reproduce this issue, I'll most
> probably have to wait...

It indeed looks like the master process is blocked somewhere. It
would be interesting to see a stack trace of the master process when
this happens. (It's also a good idea to make sure there are no 3rd
party modules/patches, just in case.)

-- 
Maxim Dounin
http://nginx.org/en/donation.html

_______________________________________________
nginx-devel mailing list
nginx-devel@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx-devel
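A side note on the single SIGCHLD in the quoted log: seeing one SIGCHLD for several exited children is not unusual on its own, because standard signals are not queued and several pending SIGCHLDs can collapse into one. A parent therefore normally reaps in a loop until nothing is left. The sketch below illustrates that general pattern in Python; it is not nginx's code, the names `reap_exited_children` and `demo` are made up for illustration, and it says nothing about why the master here sits in a futex wait.

```python
import os
import time


def reap_exited_children():
    """Collect every child that has already exited, without blocking.

    Returns a dict mapping pid -> exit code (negative signal number if the
    child was killed by a signal). The loop matters: one SIGCHLD may stand
    for several exited children.
    """
    reaped = {}
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children exist, but none has exited yet
        if os.WIFEXITED(status):
            reaped[pid] = os.WEXITSTATUS(status)
        elif os.WIFSIGNALED(status):
            reaped[pid] = -os.WTERMSIG(status)
    return reaped


def demo(n=3):
    """Fork n children that exit at once, then reap them all (hypothetical demo)."""
    pids = set()
    for _ in range(n):
        pid = os.fork()
        if pid == 0:
            os._exit(0)  # child: exit immediately without running cleanup
        pids.add(pid)

    collected = {}
    while len(collected) < n:
        collected.update(reap_exited_children())
        time.sleep(0.01)  # stand-in for waiting on the next SIGCHLD
    return pids, collected


if __name__ == "__main__":
    started, collected = demo()
    print("started:", sorted(started))
    print("reaped: ", sorted(collected))
```

If the parent never gets back to a loop like this (for example because it is blocked elsewhere), exited children stay in state Z, which would be consistent with the ps output quoted above.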