Hi Willy et al.,

> Thank you for this report, it helps. How often does it happen, and/or after
> how long on average after you start it ? What's your workload ? Do you use
> SSL, compression, TCP and/or HTTP mode, peers synchronization, etc ?
Yesterday, we upgraded from 1.5.14 to 1.5.18 and observed exactly this issue in production. After rolling back to 1.5.14, it didn't occur anymore.

We have mostly HTTP traffic and a little TCP, with about 100-200 req/s and about 2000 concurrent connections overall. Almost all traffic is SSL-terminated. We use no peer synchronization and no compression.

An strace on the process reveals this (with most of the calls being epoll_wait):

[...]
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {{EPOLLIN, {u32=796, u64=796}}}, 200, 0) = 1
read(796, " \357\275Y\231\275'b\5\216#\33\220\337'\370\312\215sG4\316\275\277y-%\v\v\211\331\342"..., 5872) = 1452
read(796, 0x9fa26ec, 4420) = -1 EAGAIN (Resource temporarily unavailable)
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
[...]

The strace was taken after reloading with -sf; however, the process was already at 100% CPU before the reload. Since we kept the process running after the reload (it still holds some connections), I was able to run a second strace about half an hour later, which now shows a different behavior:

[...]
epoll_wait(0, {}, 200, 4) = 0
epoll_wait(0, {}, 200, 7) = 0
epoll_wait(0, {}, 200, 3) = 0
epoll_wait(0, {}, 200, 6) = 0
epoll_wait(0, {}, 200, 3) = 0
epoll_wait(0, {}, 200, 3) = 0
epoll_wait(0, {}, 200, 10) = 0
epoll_wait(0, {}, 200, 3) = 0
epoll_wait(0, {}, 200, 27) = 0
epoll_wait(0, {}, 200, 6) = 0
epoll_wait(0, {}, 200, 4) = 0
[...]

The CPU load taken by the process is now back to more or less idle, without further intervention on the process.
`haproxy -vv` of the process running into the busy loop shows:

HA-Proxy version 1.5.18 2016/05/10
Copyright 2000-2016 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing
  OPTIONS = USE_GETADDRINFO=1 USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.8
Compression algorithms supported : identity, deflate, gzip
Built with OpenSSL version : OpenSSL 1.0.1t  3 May 2016
Running on OpenSSL version : OpenSSL 1.0.1t  3 May 2016
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 8.35 2014-04-04
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Unfortunately, since we have rolled back production to 1.5.14, we now have little chance of reproducing this. The process showing the behavior is still running for the time being, though.

Regards,
Holger