We run a fairly high-volume site on 4 front-end nginx servers, each with:
* AMD EPYC 7402P 24-Core Processor
* INTEL SSDPELKX020T8 (2TB NVMe)
* Dual Broadcom BCM57416 NetXtreme-E 10GBase-T
* 512GB of RAM
We have a fairly complex nginx config with sharded caches as explained in 
https://www.nginx.com/blog/shared-caches-nginx-plus-cache-clusters-part-1/

We see this problem on:

nginx version: nginx/1.19.6
built by gcc 8.3.0 (Debian 8.3.0-6)
built with OpenSSL 1.1.1d  10 Sep 2019
TLS SNI support enabled
configure arguments: --add-module=/root/incubator-pagespeed-ngx-latest-stable 
--with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module 
--with-http_mp4_module --with-http_ssl_module --with-http_stub_status_module 
--with-pcre-jit --with-http_secure_link_module --with-http_v2_module 
--with-http_realip_module --with-stream_geoip_module --http-scgi-temp-path=/tmp 
--http-uwsgi-temp-path=/tmp --http-fastcgi-temp-path=/tmp 
--http-proxy-temp-path=/tmp --http-log-path=/var/log/nginx/access 
--error-log-path=/var/log/nginx/error --pid-path=/var/run/nginx.pid 
--conf-path=/etc/nginx/nginx.conf --sbin-path=/usr/sbin --prefix=/usr 
--with-threads

Pagespeed is our only third-party module, and it is version 1.13.35.2-0.

Some nginx processes start to spin in a tight loop; strace shows:

write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
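
For anyone wanting to reproduce the symptom in isolation: the sketch below assumes fd 168 is a non-blocking pipe whose read end is not being drained, which is only my guess from the trace. It fills such a pipe with 24-byte records until the kernel refuses the write. The function name and the record contents are made up for illustration and have nothing to do with nginx or pagespeed internals.

```python
import errno
import os


def fill_nonblocking_pipe(chunk: bytes = b"x" * 24):
    """Write fixed-size records into a non-blocking pipe until the kernel refuses.

    Returns (records_written, errno_of_the_failed_write).
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # same effect as setting O_NONBLOCK
    written = 0
    try:
        while True:
            try:
                os.write(write_fd, chunk)
                written += 1
            except BlockingIOError as exc:
                # Nobody drained the read end, so the pipe buffer is full.
                return written, exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    count, err = fill_nonblocking_pipe()
    print(f"wrote {count} records, then failed with {errno.errorcode[err]}")
```

A caller that retries the failed write immediately, without returning to its event loop to read from the other end, would produce the same strace pattern as above.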

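Looking in /proc, fd 168 is the write end of a pipe, and the only other holder listed is fd 167, the read end, in the same process: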

root@ao3-front08:/proc/799697/fd# ls -l 168
l-wx------ 1 nginx nginx 64 Jan 18 22:05 168 -> 'pipe:[2914414548]'

root@ao3-front08:/proc# grep 2914414548 /tmp/fds
lr-x------ 1 nginx nginx 64 Jan 18 22:05 799697/fd/167 -> pipe:[2914414548]
l-wx------ 1 nginx nginx 64 Jan 18 22:05 799697/fd/168 -> pipe:[2914414548]
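
The /tmp/fds file above is just a saved listing of the fd symlinks. A rough equivalent that walks /proc directly could look like the sketch below; `find_pipe_holders` and its `proc_root` parameter are names I made up, and it needs to run as root to see other users' processes.

```python
import os


def find_pipe_holders(inode: int, proc_root: str = "/proc"):
    """Return sorted (pid, fd) pairs whose fd symlink points at pipe:[inode]."""
    target = f"pipe:[{inode}]"
    holders = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as "meminfo"
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    holders.append((int(pid), int(fd)))
            except OSError:
                continue  # fd was closed between listdir and readlink
    return sorted(holders)


if __name__ == "__main__":
    # Inode taken from the listing in this post.
    for pid, fd in find_pipe_holders(2914414548):
        print(pid, fd)
```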

The issue happens more often when load is higher. Does anyone have any advice? 
My current hack of killing processes that have used more than 1800 seconds of 
CPU time is clearly not the right fix.
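
For reference, the detection half of that hack can be sketched as follows. This is a simplified illustration rather than the script we actually run: the function names are made up, it only reports candidates and does not kill anything, and it does nothing about the root cause. It reads utime and stime (fields 14 and 15) from /proc/<pid>/stat.

```python
import os


def cpu_seconds_from_stat(stat_line: str, ticks_per_second: int) -> float:
    """Return user + system CPU seconds from the contents of /proc/<pid>/stat."""
    # The command name (field 2) is parenthesised and may itself contain
    # spaces or ')', so split on the last ')' before counting fields.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so fields 14 and 15 are rest[11] and rest[12].
    utime, stime = int(rest[11]), int(rest[12])
    return (utime + stime) / ticks_per_second


def busy_processes(name="nginx", limit_seconds=1800.0, proc_root="/proc"):
    """List (pid, cpu_seconds) for processes called `name` above the limit."""
    ticks = os.sysconf("SC_CLK_TCK")
    found = []
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        try:
            with open(os.path.join(proc_root, pid, "stat")) as fh:
                line = fh.read()
        except OSError:
            continue  # process exited while we were scanning
        comm = line[line.index("(") + 1 : line.rindex(")")]
        seconds = cpu_seconds_from_stat(line, ticks)
        if comm == name and seconds > limit_seconds:
            found.append((int(pid), seconds))
    return found


if __name__ == "__main__":
    for pid, seconds in busy_processes():
        print(f"{pid} has used {seconds:.0f}s of CPU")
```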



_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx
