We have quite a high-volume site with 4 front-end nginx servers, each with:

* AMD EPYC 7402P 24-Core Processor
* INTEL SSDPELKX020T8 (2TB NVMe)
* Dual Broadcom BCM57416 NetXtreme-E 10GBase-T
* 512GB of RAM

We have a fairly complex nginx config with sharded caches, as explained in
https://www.nginx.com/blog/shared-caches-nginx-plus-cache-clusters-part-1/

We see this problem on:

    nginx version: nginx/1.19.6
    built by gcc 8.3.0 (Debian 8.3.0-6)
    built with OpenSSL 1.1.1d  10 Sep 2019
    TLS SNI support enabled
    configure arguments: --add-module=/root/incubator-pagespeed-ngx-latest-stable --with-http_flv_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_mp4_module --with-http_ssl_module --with-http_stub_status_module --with-pcre-jit --with-http_secure_link_module --with-http_v2_module --with-http_realip_module --with-stream_geoip_module --http-scgi-temp-path=/tmp --http-uwsgi-temp-path=/tmp --http-fastcgi-temp-path=/tmp --http-proxy-temp-path=/tmp --http-log-path=/var/log/nginx/access --error-log-path=/var/log/nginx/error --pid-path=/var/run/nginx.pid --conf-path=/etc/nginx/nginx.conf --sbin-path=/usr/sbin --prefix=/usr --with-threads

Pagespeed is our only third-party module, and it is version 1.13.35.2-0.

Some nginx processes start to spin in a tight loop. strace shows the same write failing over and over:

    write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
    write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
    write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
    write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
    write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)
    write(168, "H\0\0\0\0\0\0\0 W|\244\230U\0\0@y\20\244\230U\0\0", 24) = -1 EAGAIN (Resource temporarily unavailable)

Looking in /proc, fd 168 is a pipe, and both ends of it are held by that same process:

    root@ao3-front08:/proc/799697/fd# ls -l 168
    l-wx------ 1 nginx nginx 64 Jan 18 22:05 168 -> 'pipe:[2914414548]'

    root@ao3-front08:/proc# grep 2914414548 /tmp/fds
    lr-x------ 1 nginx nginx 64 Jan 18 22:05 799697/fd/167 -> pipe:[2914414548]
    l-wx------ 1 nginx nginx 64 Jan 18 22:05 799697/fd/168 -> pipe:[2914414548]

The issue happens more often when load is higher.

Does anyone have any advice? My current hack of killing processes that have used more than 1800 seconds of CPU is wrong.

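
For anyone who wants to run the same pipe check on their own hosts, here is a minimal sketch in Python that walks /proc and lists every fd pointing at a given pipe inode. It is only an illustration and not the exact commands used above; the function name find_pipe_holders and the proc_root parameter are hypothetical names of my own. It relies on the Linux behaviour that the permission bits of the /proc/PID/fd symlinks reflect how the fd was opened (lr-x for the read end, l-wx for the write end).

```python
import os


def find_pipe_holders(inode, proc_root="/proc"):
    """Return sorted (pid, fd, mode) tuples for every fd that points at pipe:[inode].

    proc_root is a parameter only so the function can be tried against a
    fake directory tree; on a real host leave it at /proc.
    """
    target = f"pipe:[{inode}]"
    hits = []
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as meminfo
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            path = os.path.join(fd_dir, fd)
            try:
                if os.readlink(path) != target:
                    continue
                perms = os.lstat(path).st_mode
            except OSError:
                continue  # fd was closed while we were looking
            # Owner read/write bits of the symlink mirror the open mode.
            mode = ("r" if perms & 0o400 else "") + ("w" if perms & 0o200 else "")
            hits.append((int(pid), int(fd), mode))
    return sorted(hits)
```

Run as root, find_pipe_holders(2914414548) on the host above would be expected to report fds 167 and 168 of pid 799697, matching the grep output. If both ends of the pipe only ever show up in the spinning worker, nothing outside that worker can drain it.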
_______________________________________________
nginx mailing list
nginx@nginx.org
http://mailman.nginx.org/mailman/listinfo/nginx