Hi all,

I'm occasionally having trouble with worker processes being left <defunct> and with nginx no longer handling signals (HUP and even TERM) at all.

After a reconfigure signal (SIGHUP), the log shows four new worker processes being spawned while the four old ones shut down:

> [notice] 5159#0: using the "epoll" event method
> [notice] 5159#0: nginx/1.4.1
> [notice] 5159#0: built by gcc 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1)
> [notice] 5159#0: OS: Linux 3.9.7-147-x86
> [notice] 5159#0: getrlimit(RLIMIT_NOFILE): 100000:100000
> [notice] 5159#0: start worker processes
> [notice] 5159#0: start worker process 5330
> [notice] 5159#0: start worker process 5331
> [notice] 5159#0: start worker process 5332
> [notice] 5159#0: start worker process 5333
> [notice] 5159#0: signal 1 (SIGHUP) received, reconfiguring
> [notice] 5159#0: reconfiguring
> [notice] 5159#0: using the "epoll" event method
> [notice] 5159#0: start worker processes
> [notice] 5159#0: start worker process 12457
> [notice] 5159#0: start worker process 12458
> [notice] 5159#0: start worker process 12459
> [notice] 5159#0: start worker process 12460
> [notice] 5159#0: start cache manager process 12461
> [notice] 5159#0: start cache loader process 12462
> [notice] 5331#0: gracefully shutting down
> [notice] 5330#0: gracefully shutting down
> [notice] 5331#0: exiting
> [notice] 5330#0: exiting
> [notice] 5331#0: exit
> [notice] 5330#0: exit
> [notice] 5332#0: gracefully shutting down
> [notice] 5159#0: signal 17 (SIGCHLD) received
> [notice] 5159#0: worker process 5331 exited with code 0
> [notice] 5332#0: exiting
> [notice] 5332#0: exit
> [notice] 5333#0: gracefully shutting down
> [notice] 5333#0: exiting
> [notice] 5333#0: exit
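
For what it's worth, the excerpt above can be checked mechanically for workers that logged their own exit but were never reported as reaped by the master. The following is only a rough sketch: the two regular expressions are assumptions derived from the lines shown here, and the function name is made up for illustration.

```python
import re

# Assumed line shapes, taken from the excerpt above:
#   "[notice] <pid>#0: exit"                                 (written by the exiting process)
#   "[notice] <master>#0: ... process <pid> exited with code N"  (written by the master)
SELF_EXIT = re.compile(r"\[notice\] (\d+)#\d+: exit$")
REAPED = re.compile(r"\[notice\] \d+#\d+: .*process (\d+) exited with code \d+")


def unreaped_pids(log_lines):
    """Return PIDs that logged 'exit' but have no matching 'exited with code' line."""
    exited, reaped = set(), set()
    for line in log_lines:
        line = line.strip()
        m = SELF_EXIT.search(line)
        if m:
            exited.add(int(m.group(1)))
        m = REAPED.search(line)
        if m:
            reaped.add(int(m.group(1)))
    return sorted(exited - reaped)
```

Fed the lines above, this reports 5330, 5332 and 5333. The cache loader does not show up this way because it logs no "exit" line of its own in the excerpt.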

After that, nginx is fully operational and serving requests -- however, ps yields:

> root    5159 0.0 0.0 6248 1696 ?     Ss 10:43 0:00 nginx: master process /chroots/nginx/nginx -c /chroots/nginx/conf/nginx.conf
> nobody  5330 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
> nobody  5332 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
> nobody  5333 0.0 0.0    0    0 ?     Z  10:43 0:00 [nginx] <defunct>
> nobody 12457 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
> nobody 12458 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
> nobody 12459 0.0 0.0 8332 3544 ?     S  10:44 0:00 nginx: worker process
> nobody 12460 0.0 0.0 8332 2940 ?     S  10:44 0:00 nginx: worker process
> nobody 12461 0.0 0.0 6296 1068 ?     S  10:44 0:00 nginx: cache manager process
> nobody 12462 0.0 0.0    0    0 ?     Z  10:44 0:00 [nginx] <defunct>

The log shows that SIGCHLD was received only once, and only the exit of 5331 was recorded. 5331 is also the only old worker that does not show up as a zombie, in contrast to workers 5330, 5332 and 5333 and the cache loader 12462.
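
As background: standard signals are not queued, so a single SIGCHLD can stand for several exited children, and a parent usually has to call waitpid() in a loop until nothing is left to collect. The sketch below shows that general pattern only. It makes no claim about what nginx does internally or about what went wrong here, and all names in it are made up for illustration.

```python
import os
import signal
import time

sigchld_seen = 0  # how many times the handler ran; may be fewer than the number of children


def on_sigchld(signum, frame):
    # Only count the notification; the actual reaping happens outside the handler.
    global sigchld_seen
    sigchld_seen += 1


def reap_all():
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children exist any more
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def spawn_and_reap(n_children=4, timeout=5.0):
    """Fork n children that exit at once, then reap them all."""
    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        children = set()
        for _ in range(n_children):
            pid = os.fork()
            if pid == 0:
                os._exit(0)  # child: exit immediately
            children.add(pid)
        reaped = set()
        deadline = time.monotonic() + timeout
        # Keep reaping until every child we forked has been collected.
        while children - reaped and time.monotonic() < deadline:
            reaped.update(reap_all())
            time.sleep(0.01)
        return children, reaped
    finally:
        # Restore whatever handler was installed before the demo.
        signal.signal(signal.SIGCHLD, previous if previous is not None else signal.SIG_DFL)
```

Calling spawn_and_reap() returns the set of forked PIDs and the set of reaped PIDs. A parent that reaps only one child per notification, or that stops processing notifications altogether, would leave the remaining children in state Z as in the ps output above.
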
Much more serious is that neither

> /chroots/nginx/nginx -c /chroots/nginx/conf/nginx.conf -s (stop|reload)

nor

> kill 5159

seems to be handled by nginx any more (nothing appears in the log and there is no effect). Maybe the master process is stuck waiting on some mutex? strace on the master shows:

> strace -p 5159
> Process 5159 attached - interrupt to quit
> futex(0xb7658e6c, FUTEX_WAIT_PRIVATE, 2, NULL
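
Another thing that could be checked the next time this happens is whether HUP, TERM or CHLD are sitting in the master's pending or blocked signal masks. On Linux, /proc/<pid>/status exposes these as hex bitmaps in the fields SigPnd, ShdPnd, SigBlk, SigIgn and SigCgt, where bit n-1 stands for signal n. A small decoder might look like the sketch below; the function names are made up for illustration, and I have no such data from the incident itself.

```python
import signal

MASK_FIELDS = ("SigPnd", "ShdPnd", "SigBlk", "SigIgn", "SigCgt")


def decode_mask(hex_mask):
    """Turn a hex signal bitmap into a sorted list of signal numbers."""
    value = int(hex_mask, 16)
    # Bit 0 corresponds to signal 1, bit 1 to signal 2, and so on.
    return [n for n in range(1, 65) if value & (1 << (n - 1))]


def signal_masks(status_text):
    """Extract the signal mask fields from the text of /proc/<pid>/status."""
    masks = {}
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name in MASK_FIELDS:
            masks[name] = decode_mask(rest.strip())
    return masks


def describe(numbers):
    """Map signal numbers to names where the platform knows them."""
    names = []
    for n in numbers:
        try:
            names.append(signal.Signals(n).name)
        except ValueError:
            names.append(str(n))  # e.g. real-time signals without a symbolic name
    return names
```

For example, signal_masks(open("/proc/5159/status").read()) would give the decoded masks for the master process from this report. Signals listed under ShdPnd or SigPnd have been sent but not yet delivered to a handler.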

Unfortunately, I failed to capture a core dump of the master process while it was still running, and there is no debug log available either, sorry. As I have not been able to reproduce this issue reliably, I'll most probably have to wait...

Many thanks in advance and kind regards,
Florian
