Re: Health check logging differences between 1.9 and 2.0
On 2020-10-24 10:32, Willy Tarreau wrote:
> That sounds strange, I don't like this. This sounds like an uninitialized variable. Did you observe that the facility used is stable inside a backend for example, or does it seem to be affected by other activity?

After investigation, it appears that in case of the master-worker model (-Ws) and systemd Type=notify, the master process duplicates some worker messages, prepends a severity string, something that appears to be some counter (in the form of 302/104225) and the worker pid, and emits those with syslog facility 'daemon'. 'log /dev/log local0' is configured in the global section; there is no other 'log' statement in the haproxy config.

haproxy-daemon.log:
2020-10-29T10:36:41.503040+00:00 hostname.tld haproxy[17274]: [WARNING] 302/103641 (19612) : Health check for server proxy_upstream_ssl/ovh_sbg succeeded, reason: Layer7 check passed, code: 200, info: "OK", check duration: 85ms, status: 3/3 UP.
2020-10-29T10:42:25.488975+00:00 hostname.tld haproxy[17274]: [WARNING] 302/104225 (19612) : Stopping proxy proxy_upstream_ssl in 0 ms.
2020-10-29T10:42:25.490817+00:00 hostname.tld haproxy[17274]: [WARNING] 302/104225 (19612) : Proxy proxy_upstream_ssl stopped (FE: 14 conns, BE: 14 conns).

haproxy-local0.log:
2020-10-29T10:36:41.502693+00:00 hostname.tld haproxy[19612]: Health check for server proxy_upstream_ssl/ovh_sbg succeeded, reason: Layer7 check passed, code: 200, info: "OK", check duration: 85ms, status: 3/3 UP.
2020-10-29T10:42:25.485245+00:00 hostname.tld haproxy[17274]: Proxy proxy_upstream_ssl started.
2020-10-29T10:42:25.492269+00:00 hostname.tld haproxy[19612]: Stopping proxy proxy_upstream_ssl in 0 ms.
2020-10-29T10:42:25.494013+00:00 hostname.tld haproxy[19612]: Proxy proxy_upstream_ssl stopped (FE: 14 conns, BE: 14 conns).

I'm pretty sure that this duplication of log messages should not happen. Or is it indeed intended?

Best regards,
Veiko
Re: Health check logging differences between 1.9 and 2.0
On 2020-10-28 13:11, Veiko Kukk wrote:
> Another difference between 1.9 and 2.0 here is that 2.0 is compiled with systemd support and executed using -Ws and Type=notify instead of 1.9's -W and Type=forking.

With the exact same HAProxy 2.0 (compiled with systemd support), I've changed haproxy.service to have a configuration identical to the 1.9 one:

ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE
Type=forking

Log messages are not duplicated anymore, and there are no more priority 30 messages!

Veiko
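For anyone reproducing this, a minimal sketch of a systemd drop-in that switches a packaged 2.0 unit back to the 1.9-style forking mode. The drop-in path is a conventional example, and $CONFIG/$PIDFILE are assumed to be defined as in the stock unit:

# /etc/systemd/system/haproxy.service.d/forking.conf (example path)
[Service]
# An empty ExecStart= clears the packaged value before replacing it.
ExecStart=
ExecStart=/usr/sbin/haproxy -W -f $CONFIG -p $PIDFILE
Type=forking

Then apply it with: systemctl daemon-reload && systemctl restart haproxy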
Re: Health check logging differences between 1.9 and 2.0
On 2020-10-26 13:57, Christopher Faulet wrote:
> Health-check log messages are emitted in the same way in 1.9 and 2.0. And at first glance, the code responsible for setting the syslog priority is the same too. So there is probably something we missed. Could you confirm you still have the same issue with the above configuration and a netcat as syslog server? If it works as expected, please share your configuration, not only the global and defaults sections.

I cannot reproduce the issue with netcat, neither with the simple configuration provided by you nor with our more complex test server config. I've reconfigured rsyslog to log raw messages into different files suffixed with the syslog facility name:

$ grep "proxy_upstream_ssl" haproxy-*
haproxy-daemon.log:<30>Oct 28 12:57:19 haproxy[13420]: [WARNING] 301/125719 (13424) : Health check for server proxy_upstream_ssl/ovh_sbg succeeded, reason: Layer7 check passed, code: 200, info: "OK", check duration: 30ms, status: 3/3 UP.
haproxy-daemon.log:<30>Oct 28 12:57:20 haproxy[13420]: [WARNING] 301/125720 (13424) : Health check for server proxy_upstream_ssl/ovh_bhs succeeded, reason: Layer7 check passed, code: 200, info: "OK", check duration: 423ms, status: 3/3 UP.
haproxy-local0.log:<133>Oct 28 12:57:19 haproxy[13420]: Proxy proxy_upstream_ssl started.
haproxy-local0.log:<133>Oct 28 12:57:19 haproxy[13424]: Health check for server proxy_upstream_ssl/ovh_sbg succeeded, reason: Layer7 check passed, code: 200, info: "OK", check duration: 30ms, status: 3/3 UP.
haproxy-local0.log:<133>Oct 28 12:57:20 haproxy[13424]: Health check for server proxy_upstream_ssl/ovh_bhs succeeded, reason: Layer7 check passed, code: 200, info: "OK", check duration: 423ms, status: 3/3 UP.

This means my initial assumption that all health check logs are emitted differently was wrong. Strange that the log line formats are also different. haproxy-daemon.log is emitted by the master process running as user root (13420), and it duplicates health check log messages from its subprocess running as user haproxy (13424), prepending the severity name and the subprocess pid. Between those is '301/125719' - I don't know what it is.

Another difference between 1.9 and 2.0 here is that 2.0 is compiled with systemd support and executed using -Ws and Type=notify instead of 1.9's -W and Type=forking.

global
    log /dev/log local0
    daemon
    nbproc 1
    nbthread 2

I wonder if systemd-journald is duplicating messages here.

Best regards,
Veiko
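For reference, a rough sketch of the kind of rsyslog rules used to produce the per-facility raw files above (the file names and the raw template are assumptions; legacy template syntax mixed with RainerScript filters is one way to do it):

# keep the raw message, including the <PRI> value, and split by facility
$template raw,"%rawmsg%\n"
if $programname == 'haproxy' and $syslogfacility-text == 'daemon' then -/var/log/haproxy-daemon.log;raw
if $programname == 'haproxy' and $syslogfacility-text == 'local0' then -/var/log/haproxy-local0.log;raw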
Re: Health check logging differences between 1.9 and 2.0
On 2020-10-22 10:38, Veiko Kukk wrote:
> Indeed, in HAProxy 2.0, 'option log-health-checks' messages are emitted using syslog facility 'daemon' and not the facility configured with the global configuration keyword 'log'. In 1.9, health check logs were emitted as defined by the 'log' facility value.

I was too early to conclude that health check logging is emitted as 'daemon'. Sometimes it is also emitted as 'user'.

Veiko
Re: Health check logging differences between 1.9 and 2.0
On 2020-10-20 11:56, Veiko Kukk wrote:
> I've upgraded some servers from 1.9.15 to 2.0.18. Log config is very simple. ... Without any changes to rsyslog configuration/filters, health checks are now filtered to /var/log/messages and not into the specified haproxy log files as before.

Answering my own question. Indeed, in HAProxy 2.0, 'option log-health-checks' messages are emitted using syslog facility 'daemon' and not the facility configured with the global configuration keyword 'log'. In 1.9, health check logs were emitted as defined by the 'log' facility value.

If this is intended, I suggest adding this information to the documentation for 'option log-health-checks'.
http://cbonte.github.io/haproxy-dconv/2.0/configuration.html#option%20log-health-checks

Veiko
Health check logging differences between 1.9 and 2.0
Hi

I've upgraded some servers from 1.9.15 to 2.0.18. Log config is very simple.

global
    log /dev/log local0
defaults
    log global
    option httplog
    option log-health-checks

Without any changes to rsyslog configuration/filters, health checks are now filtered to /var/log/messages and not into the specified haproxy log files as before. Why? Did something change between 1.9 and 2.0 regarding 'option log-health-checks'?

Best regards,
Veiko
Re: 2.0.14 PCRE2 JIT compilation failed
On 2020-04-24 12:47, Veiko Kukk wrote:
> HAProxy 2.0.14 on CentOS 7.7.1908 with PCRE2 JIT enabled (USE_PCRE2=1 USE_PCRE2_JIT=1). When starting it with a configuration that has the following ACL regex line, it fails:
> acl path_is_foo path_reg ^\/video\/[a-zA-Z0-9_-]{43}\/[a-z0-9]{8}\/videos\/
> Error message:
> error detected while parsing ACL 'path_is_foo' : regex '^\/video\/[a-zA-Z0-9_-]{43}\/[a-z0-9]{8}\/videos\/' jit compilation failed.

Hi again,

It has happened to many of us that after asking for help, a good idea to test/debug comes. It turned out to be an selinux issue.

#============= haproxy_t ==============
# This avc can be allowed using the boolean 'cluster_use_execmem'
allow haproxy_t self:process execmem;

I wonder whether it is mentioned anywhere in the HAProxy documentation about PCRE JIT that, in case of selinux, the selinux rules must be changed for JIT to work. If not, it would be nice to add it.

--
Best regards,
Veiko
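For reference, a sketch of the two usual ways to allow this on CentOS 7 (the module name 'haproxy_execmem' is an arbitrary choice, not from the thread):

# Option 1: flip the boolean named in the AVC message (persistent with -P)
setsebool -P cluster_use_execmem 1

# Option 2: generate and load a local policy module from the audit log
grep haproxy_t /var/log/audit/audit.log | audit2allow -M haproxy_execmem
semodule -i haproxy_execmem.pp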
2.0.14 PCRE2 JIT compilation failed
Hi

Since 1.9 support ends soon, I'm trying to start using the 2.0 series. HAProxy 2.0.14 on CentOS 7.7.1908 with PCRE2 JIT enabled (USE_PCRE2=1 USE_PCRE2_JIT=1). When starting it with a configuration that has the following ACL regex line, it fails:

acl path_is_foo path_reg ^\/video\/[a-zA-Z0-9_-]{43}\/[a-z0-9]{8}\/videos\/

Error message:

error detected while parsing ACL 'path_is_foo' : regex '^\/video\/[a-zA-Z0-9_-]{43}\/[a-z0-9]{8}\/videos\/' jit compilation failed.

Apparently this regex has been working with PCRE (not PCRE2) and without JIT for quite a long time using 1.9 releases of HAProxy (I have not personally created nor tested this regex). When compiling HAProxy with PCRE2 but without JIT support, haproxy does not complain about this regular expression, no errors at all.

I did not find much information on HAProxy path_reg regular expression syntax. Is it necessary to escape forward slashes? How can I debug this issue, and what is wrong with this expression?

$ haproxy -vv
HA-Proxy version 2.0.14 2020/04/02 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_THREAD=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
Running on OpenSSL version : OpenSSL 1.0.2k-fips 26 Jan 2017
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.5
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.7
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.23 2017-02-14
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services : none

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace

$ ldd /sbin/haproxy
        linux-vdso.so.1 => (0x7ffebcde1000)
        libcrypt.so.1 => /lib64/libcrypt.so.1 (0x7f7ac1989000)
        libz.so.1 => /lib64/libz.so.1 (0x7f7ac1773000)
        libdl.so.2 => /lib64/libdl.so.2 (0x7f7ac156f000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x7f7ac1353000)
        librt.so.1 => /lib64/librt.so.1 (0x7f7ac114b000)
        libssl.so.10 => /lib64/libssl.so.10 (0x7f7ac0ed9000)
        libcrypto.so.10 => /lib64/libcrypto.so.10 (0x7f7ac0a76000)
        libm.so.6 => /lib64/libm.so.6 (0x7f7ac0774000)
        libsystemd.so.0 => /lib64/libsystemd.so.0 (0x7f7ac0543000)
        libpcre2-8.so.0 => /lib64/libpcre2-8.so.0 (0x7f7ac02cc000)
        libpcre2-posix.so.1 => /lib64/libpcre2-posix.so.1 (0x7f7ac00c9000)
        libc.so.6 => /lib64/libc.so.6 (0x7f7abfcfb000)
        libfreebl3.so => /lib64/libfreebl3.so (0x7f7abfaf8000)
        /lib64/ld-linux-x86-64.so.2 (0x7f7ac1bc)
        libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x7f7abf8ab000)
        libkrb5.so.3 => /lib64/libkrb5.so.3 (0x7f7abf5c2000)
        libcom_err.so.2 => /lib64/libcom_err.so.2 (0x7f7abf3be000)
        libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x7f7abf18b000)
        libcap.so.2 => /lib64/libcap.so.2 (0x7f7abef86000)
        libselinux.so.1 => /lib64/libselinux.so.1 (0x7f7abed5f000)
        liblzma.so.5 => /lib64/liblzma.so.5 (0x7f7abeb39000)
        liblz4.so.1 => /lib64/liblz4.so.1
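On the escaping question: the \/ escapes are harmless but unnecessary, since haproxy regexes have no delimiter character. A quick way to check whether the pattern itself is the problem (rather than the environment, as it turned out to be with selinux above) is to feed it to pcre2test with the jit modifier; a rough sketch, assuming the pcre2 tools are installed:

# if the pattern compiles here, the regex is valid and the failure is environmental
printf '%s\n\n' '/^\/video\/[a-zA-Z0-9_-]{43}\/[a-z0-9]{8}\/videos\//jit' | pcre2test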
Understanding resolvers usage
Hi

I'd like to have a better understanding of how server-template and resolvers work together. HAProxy 1.9.14. Relevant sections from the config:

resolvers dns
    accepted_payload_size 1232
    parse-resolv-conf
    hold valid 90s
    resolve_retries 3
    timeout resolve 1s
    timeout retry 1s

server-template srv 4 _foo._tcp.server.name.tld ssl check resolvers dns resolve-prefer ipv4 resolve-opts prevent-dup-ip

After some time, when I check statistics from the socket:

echo "show resolvers" | /usr/bin/socat /var/run/haproxy.sock.stats1 stdio

Resolvers section dns
 nameserver 127.0.0.1: sent: 33508, snd_error: 0, valid: 33502, update: 2, cname: 0, cname_error: 0, any_err: 0, nx: 0, timeout: 0, refused: 0, other: 0, invalid: 0, too_big: 0, truncated: 0, outdated: 6
 nameserver 8.8.8.8: sent: 33508, snd_error: 0, valid: 0, update: 0, cname: 0, cname_error: 0, any_err: 0, nx: 0, timeout: 0, refused: 0, other: 0, invalid: 0, too_big: 0, truncated: 0, outdated: 33508
 nameserver 8.8.4.4: sent: 33508, snd_error: 0, valid: 0, update: 0, cname: 0, cname_error: 0, any_err: 0, nx: 0, timeout: 0, refused: 0, other: 0, invalid: 0, too_big: 0, truncated: 0, outdated: 33508
 nameserver 64.6.64.6: sent: 33508, snd_error: 0, valid: 6, update: 0, cname: 0, cname_error: 0, any_err: 0, nx: 0, timeout: 0, refused: 0, other: 0, invalid: 0, too_big: 0, truncated: 0, outdated: 33502

What I wonder about here is why all nameservers are used instead of only the first one, when there are no issues/errors with the local caching server 127.0.0.1:53. From the statistics, the 'sent:' value leaves me with the impression that all DNS servers get all requests. Is that true?

/etc/resolv.conf itself:

nameserver 127.0.0.1
nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 64.6.64.6
options timeout:1 attempts:2

I'd like to achieve a situation where the other nameservers would be used only when the local caching server fails. I don't want to manually configure only the local one in the resolvers section (no failover), and would very much prefer not to duplicate the name server config in resolv.conf and the HAProxy config.

--
Veiko
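For context: the 'sent' counters above are consistent with haproxy's runtime resolver of that era sending each query to every nameserver in the section in parallel and using the first valid answer (a sketch of the behavior, not a statement from the docs). A minimal sketch of the local-only variant the author wants to avoid, for comparison:

resolvers dns_local
    # only the local cache: no fan-out, but also no failover
    nameserver local 127.0.0.1:53
    hold valid 90s
    resolve_retries 3
    timeout resolve 1s
    timeout retry 1s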
Re: 1.9 external health checks fail suddenly
On 2019-08-28 11:13, Veiko Kukk wrote:
> Applied it to 1.9.10; after ~12h it ran into a spinlock using 400% CPU (4 threads configured). Not sure if this is related to the patch or is some new bug in 1.9.10. I've now replaced the running instance with 1.9.10 without the external check patch to see if this happens again.

Now, after almost one month with 1.9.10 (no patches), it happened again. All external checks failed again, and a large number of zombie external check processes had accumulated. Unfortunately, since I was not the one doing the reload, I can't tell the timeframe or the exact number of those processes.

regards,
Veiko
Re: 1.9 external health checks fail suddenly
On 2019-07-11 08:35, Willy Tarreau wrote:
> against your version. Normally it should work for 1.9 to 2.1.

Applied it to 1.9.10; after ~12h it ran into a spinlock using 400% CPU (4 threads configured). Not sure if this is related to the patch or is some new bug in 1.9.10. I've now replaced the running instance with 1.9.10 without the external check patch to see if this happens again.

best regards,
Veiko
Re: 1.9 external health checks fail suddenly
On 2019-07-09 13:59, Lukas Tribus wrote:
> How are you currently working around this issue? Did you disable external checks? I'd assume failing checks have a negative impact on production systems also.

Since this has so far happened only 3 times during 2 months, we've just reloaded HAProxy when it happens.

Regards,
Veiko
Re: 1.9 external health checks fail suddenly
On 2019-07-09 14:29, Willy Tarreau wrote:
> I didn't have a patch but just did it. It was only compile-tested, please verify that it works as expected on a non-sensitive machine first!

Hi, Willy

Against what version should I apply this patch?

Veiko
Re: 1.9 external health checks fail suddenly
On 2019-07-08 16:06, Lukas Tribus wrote:
> The bug you may be affected by is: https://github.com/haproxy/haproxy/issues/141
> Can you check what happens with: nbthread 1

I'm afraid I can't, because those are production systems that won't be able to handle the service with a single thread; they have a relatively high SSL termination load.

Veiko
Re: 1.9 external health checks fail suddenly
On 2019-07-01 10:11, Veiko Kukk wrote:
> Hi
> Sometimes (infrequently) all external checks hang and time out:
> * Has happened with versions 1.9.4 and 1.9.8 on multiple servers with nbproc 1 and nbthread set to 4-12 depending on the server.
> * Happens infrequently; the last one happened after 10 days of uptime.
> * External checks are written in Python and write errors into their own log file directly. When the hanging happens, nothing is logged by the external check.
> * Only external checks fail; the common 'option httpchk' does not fail at the same time.

It might be useful to add that a reload helps to get over it; external health checks start working again.
1.9 external health checks fail suddenly
Hi

Sometimes (infrequently) all external checks hang and time out:

* Has happened with versions 1.9.4 and 1.9.8 on multiple servers with nbproc 1 and nbthread set to 4-12 depending on the server.
* Happens infrequently; the last one happened after 10 days of uptime.
* External checks are written in Python and write errors into their own log file directly. When the hanging happens, nothing is logged by the external check.
* Only external checks fail; the common 'option httpchk' does not fail at the same time.

HA-Proxy version 1.9.8 2019/05/13 - https://haproxy.org/
Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wtype-limits
  OPTIONS = USE_ZLIB=1 USE_THREAD=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : SSLv3 TLSv1.0 TLSv1.1 TLSv1.2
Built with Lua version : Lua 5.3.5
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.3
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Encrypted password support via crypt(3): yes
Built with multi-threading support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE
              h2 : mode=HTTP       side=FE
       <default> : mode=HTX        side=FE|BE
       <default> : mode=TCP|HTTP   side=FE|BE

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace

Veiko
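For readers unfamiliar with the feature, a minimal sketch of an external-check setup like the one described (paths, backend and script names are invented, not the author's; the four positional arguments are the ones haproxy documents for 'external-check command'):

# haproxy.cfg (sketch)
global
    external-check

backend app
    option external-check
    external-check command /usr/local/bin/check_app.py
    server app1 10.0.0.1:8080 check

#!/usr/bin/env python3
# /usr/local/bin/check_app.py (hypothetical) -- haproxy invokes it as:
#   check_app.py <proxy_address> <proxy_port> <server_address> <server_port>
import socket
import sys

def main() -> int:
    server_addr, server_port = sys.argv[3], int(sys.argv[4])
    try:
        # Exit 0 => check passed; any non-zero exit => check failed.
        with socket.create_connection((server_addr, server_port), timeout=5):
            return 0
    except OSError as exc:
        print(f"check failed: {exc}", file=sys.stderr)
        return 1

if __name__ == "__main__":
    sys.exit(main())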
Re: [PATCH v2 1/2] MINOR: systemd: Use the variables from /etc/default/haproxy
On 2019-05-06 11:00, Tim Duesterhus wrote:
> From: Apollon Oikonomopoulos
> This will allow seamless upgrades from the sysvinit system while respecting any changes the users may have made. It will also make local configuration easier than overriding the systemd unit file.
> Note by Tim: This GPL-2 licensed patch was taken from the Debian project at [1]. It was slightly modified to cleanly apply, because HAProxy's default unit file does not include rsyslog.service as an 'After' dependency. Also the subject line was modified to include the proper subsystem and severity.

I think that instead of After=rsyslog.service it should be After=syslog.service; then any logger daemon that has Alias=syslog.service could be used.
https://www.freedesktop.org/wiki/Software/systemd/syslog/

Regards,
Veiko
Re: Early connection close, incomplete transfers
On 2019-02-19 06:47, Willy Tarreau wrote:
> This is interesting. As you observed in the trace you sent me, the lighttpd server closes just after sending the response headers. This indeed matches the "SD" log that haproxy emits. If it doesn't happen in TCP mode nor with Nginx, it means that something haproxy modifies in the request causes this effect on the server.

Hi

I'm forwarding the answer from a colleague who investigated this more thoroughly, especially on the lighttpd side:

We've been debugging this a bit further, and it does not look like the issue with the seemingly random incomplete HTTP responses is due to any particular request headers at the HTTP layer. It rather looks like something at the TCP level (so specific to HTTP mode).

A first observation we made is that the frequency of these incomplete transfers increases when we add a delay at the backend server after sending the response headers and before sending the body data. We added a 100 ms delay there and then got a lot of interrupted transfers that had only received the response headers (= no delay) but 0 bytes of the body (= which was sent just after the delay). So the frequency with which this happens appears to be proportional to latencies/stalls in the backend server sending the response data (some read timeout logic in haproxy??).

We debugged further and noticed that in all cases where transfers were incomplete, our lighttpd backend server was receiving an EPOLLRDHUP event on the socket where it communicates with haproxy. So it appears as if haproxy is *sometimes* (apparently depending on some read latency/stall - see above) shutting down its socket with the backend for writing *before* the full response and body data has been received. And this is also basically ok, because the socket remains writeable for lighttpd, so it could still send down the rest of the response data.

However, it looks like lighttpd is not expecting this kind of behavior from the client and is not correctly handling such a half-closed TCP session. There is code in lighttpd to handle such an EPOLLRDHUP event and a half-closed TCP connection, but lighttpd then also checks the state of the TCP session with getsockopt and keeps the connection open *only* when the state is TCP_CLOSE_WAIT. In all other cases, upon receiving the EPOLLRDHUP it actively changes the state of the connection to "ERROR" and then closes the connection:

https://github.com/lighttpd/lighttpd1.4/blob/master/src/connections.c#L908
https://github.com/lighttpd/lighttpd1.4/blob/master/src/fdevent.c#L995

We checked, and every time we have an incomplete response, lighttpd receives the EPOLLRDHUP event on the socket, but the TCP state queried via getsockopt is always TCP_CLOSE (and not TCP_CLOSE_WAIT as lighttpd seems to expect). Because of this, lighttpd then actively closes the half-closed connection from its end as well (which is likely the cause of the TCP FIN sent by lighttpd, as seen in the tcpdump).

When we remove this condition from lighttpd, which marks the connection as erroneous in case of EPOLLRDHUP and TCP state != TCP_CLOSE_WAIT, the problem with the incomplete transfers disappears:

https://github.com/lighttpd/lighttpd1.4/blob/master/src/connections.c#L922

We do not understand why this is or what the correct reaction to the EPOLLRDHUP event should be.
In particular, we do not understand why lighttpd performs this check for TCP_CLOSE_WAIT, or why we always get a state of TCP_CLOSE when we receive this event while the socket still continues to be writeable (so does the TCP_CLOSE just indicate that one direction of the connection is closed??).

Still, because this half-closing of the connection to the backend server appears to happen pretty randomly, depending on latency/stalls of the backend server sending down the response data, we assume that this is not the intended behavior of haproxy (and so possibly indicates a bug in haproxy too).

We assume that the reason why direct requests to the backend server or requests proxied via Nginx never failed is that in those cases the EPOLLRDHUP event never occurs and there never are half-closed connections. However, we have not tested this (yet); i.e. we did not re-test with Nginx to verify that lighttpd then indeed never sees an EPOLLRDHUP.

Any ideas or suggestions, based on these findings, on what the proper solution to the problem should be? Thank you.
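To make the lighttpd check concrete, here is a hedged Python sketch of the same logic, not lighttpd's actual C code: on EPOLLRDHUP, read the TCP state via getsockopt(TCP_INFO) and treat anything other than TCP_CLOSE_WAIT as an error (state constants from Linux's tcp_states.h; Linux-only is an assumption):

import select
import socket
import struct

TCP_CLOSE = 7        # from Linux tcp_states.h
TCP_CLOSE_WAIT = 8

def tcp_state(sock: socket.socket) -> int:
    # tcpi_state is the first byte of struct tcp_info
    info = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 104)
    return struct.unpack("B", info[:1])[0]

def on_event(sock: socket.socket, events: int) -> None:
    if events & select.EPOLLRDHUP:
        if tcp_state(sock) == TCP_CLOSE_WAIT:
            # Peer shut down its write side; we may still write our response.
            pass
        else:
            # Per the thread, lighttpd treats this as an error and closes,
            # which truncated the transfers when the state was TCP_CLOSE.
            sock.close()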
Re: Early connection close, incomplete transfers
On 2019-02-14 18:29, Aleksandar Lazic wrote:
>> Replaced HAProxy with Nginx for testing, and with Nginx not a single connection was interrupted; did millions of requests.
> A lot of fixes were added in 1.9.4. Please, can you try your tests with 1.9.4? Thanks.

Already did, before writing my previous letter. No differences.

Veiko
Re: Early connection close, incomplete transfers
On 2019-02-01 13:30, Veiko Kukk wrote:
> On 2019-02-01 12:34, Aleksandar Lazic wrote:
>> Do you have any errors in lighttpd's log?
> Yes, it has error messages about not being able to write to the socket.
> Unrecoverable error writing to socket! errno 32, retries 12, ppoll return 1, send return -1
> ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1
> I've tested with several hundred thousand requests, but it never happens when using "mode tcp".

Replaced HAProxy with Nginx for testing, and with Nginx not a single connection was interrupted; did millions of requests.

Veiko
Re: Early connection close, incomplete transfers
On 2019-02-01 17:02, Willy Tarreau wrote:
> Hi Veiko,
> Are you certain that 1.9 and 1.7 have the same issue? I mean, you could be observing two different cases looking similar. If you're sure it's the same issue, it could rule out a number of parts that differ quite a lot between the two (idle conns etc).

I'm sure it happens with all versions we have tried: 1.6, 1.7, 1.9 (did not try 1.8, because we have never used it in production and decided to switch directly to 1.9). But how could we make sure it's caused by something different between versions, if we observe very similar results? Since it's happening at random, it's hard to judge whether there is a slight change in one or another direction. Logs look the same for all versions. Only 'mode tcp' helps to get rid of those errors.

> Do you know if the response headers are properly delivered to the client when this happens? And do they match what you expected? Maybe the contents are sometimes invalid and the response rejected by haproxy, in which case a 502 would be returned to the client. When this happens, emitting "show errors" on the CLI will report it.

I don't know about the headers; I don't have a good tool to capture headers for failed connections only. Any suggestions?

echo "show errors" | /usr/bin/socat /var/run/haproxy.sock.stats1 stdio
Total events captured on [04/Feb/2019:13:46:33.167] : 0

> Could you also check if this happens only/more with keep-alive, close or server-close?

I have seen no difference, unfortunately.

> If you can run more tests in your test environment, I'd be interested in seeing how latest 2.0-dev works with these variants:

Tested with http://www.haproxy.org/download/2.0/src/snapshot/haproxy-ss-20190204.tar.gz

> - http-reuse never
No difference, lots of incomplete transfers.
> - http-reuse always
No difference, lots of incomplete transfers.
> - option httpclose
No difference, lots of incomplete transfers.
> - option http-server-close
No difference, lots of incomplete transfers.
> - option keep-alive
I assume you meant 'option http-keep-alive', because there is no 'option keep-alive'. No difference, lots of incomplete transfers.

> I'm asking for 2.0-dev because it's where all known bugs are fixed. If none of these settings has any effect, we'll have to look at network traces I'm afraid.

Would you like to have a network traffic dump?

Regards,
Veiko
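On capturing headers only for failed connections: one low-tech approach is to capture everything on the backend side and correlate afterwards by the source port that haproxy logs; a rough sketch (interface and port are assumptions taken from the config later in the thread):

# capture full payloads on the loopback backend connection (port 9000 per
# the thread's config); correlate failures via the client port in the
# SD-- log lines (e.g. 127.0.0.1:33054)
tcpdump -i lo -s 0 -w /tmp/backend.pcap 'tcp port 9000'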
Re: Early connection close, incomplete transfers
On 2019-02-01 12:34, Aleksandar Lazic wrote:
> Do you have any errors in lighttpd's log?

Yes, it has error messages about not being able to write to the socket.

Unrecoverable error writing to socket! errno 32, retries 12, ppoll return 1, send return -1
ERROR: Couldn't write header data to socket! desired: 4565 / actual: -1

I've tested with several hundred thousand requests, but it never happens when using "mode tcp".

Regards,
Veiko
Re: Early connection close, incomplete transfers
On 2019-01-31 12:57, Aleksandar Lazic wrote:
> Willy has found some issues, the fixes for which were added to the code of the 2.0 tree. Do you have a chance to test this branch, or do you want to wait for the next 1.9 release?

I tested stable 1.9.3 and the 1.9 preview version Willy gave a link to here:
https://www.mail-archive.com/haproxy@formilux.org/msg32678.html
There is no difference in my tests.

> I'm not sure if it affects you, as we haven't seen the config yet. Maybe you can also share your config so that we can see if your setup could be affected.

Commented timeouts are the original timeouts; I had increased those to make sure I'm not hitting any timeouts when creating higher load with tests. The maxconn values serve the same purpose.

global
    log /dev/log local0
    daemon
    nbproc 1
    nbthread 16
    maxconn
    user haproxy
    spread-checks 5
    tune.ssl.default-dh-param 2048
    ssl-default-bind-options no-sslv3 no-tls-tickets
    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS
    ssl-default-server-options no-sslv3 no-tls-tickets
    ssl-default-server-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES128-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:AES128-GCM-SHA256:AES128-SHA256:AES128-SHA:!DSS
    tune.ssl.cachesize 10
    tune.ssl.lifetime 1800
    stats socket /var/run/haproxy.sock.stats1 mode 640 group vault process 1 level admin

defaults
    log global
    mode http
    option httplog
    option contstats
    option log-health-checks
    retries 5
    #timeout http-request 5s
    timeout http-request 99s
    #timeout http-keep-alive 20s
    timeout http-keep-alive 99s
    #timeout connect 10s
    timeout connect 99s
    #timeout client 30s
    timeout client 99s
    timeout server 120s
    #timeout client-fin 10s
    timeout client-fin 99s
    #timeout server-fin 10s
    timeout server-fin 99s

listen main_frontend
    bind *:443 ssl crt /etc/vault/cert.pem crt /etc/letsencrypt/certs/ maxconn
    bind *:80 maxconn
    option forwardfor
    acl local_lighty_down nbsrv(lighty_load_balancer) lt 1
    monitor-uri /load_balance_health
    monitor fail if local_lighty_down
    default_backend lighty_load_balancer

backend lighty_load_balancer
    stats enable
    stats realm statistics
    http-response set-header Access-Control-Allow-Origin *
    option httpchk HEAD /dl/index.html
    server lighty0 127.0.0.1:9000 check maxconn fall 2 inter 15s rise 5 id 1

Test results, httpress test output summary:

1 requests launched
thread 3: 1000 connect, 1000 requests, 983 success, 17 fail, 6212668130 bytes, 449231 overhead
thread 9: 996 connect, 996 requests, 979 success, 17 fail, 6187387690 bytes, 447403 overhead
thread 4: 998 connect, 998 requests, 980 success, 18 fail, 6193707800 bytes, 447860 overhead
thread 1: 1007 connect, 1007 requests, 988 success, 19 fail, 6244268680 bytes, 451516 overhead
thread 8: 998 connect, 998 requests, 977 success, 21 fail, 6174747470 bytes, 446489 overhead
thread 7: 1001 connect, 1001 requests, 970 success, 31 fail, 6130506700 bytes, 443290 overhead
thread 10: 997 connect, 997 requests, 983 success, 14 fail, 6212668130 bytes, 449231 overhead
thread 6: 1004 connect, 1004 requests, 986 success, 18 fail, 6231628460 bytes, 450602 overhead
thread 5: 999 connect, 999 requests, 982 success, 17 fail, 6206348020 bytes, 448774 overhead
thread 2: 1000 connect, 1000 requests, 981 success, 19 fail, 6200027910 bytes, 448317 overhead
TOTALS: 1 connect, 1 requests, 9809 success, 191 fail, 100 (100) real concurrency
TRAFFIC: 6320110 avg bytes, 457 avg overhead, 61993958990 bytes, 4482713 overhead
TIMING: 81.014 seconds, 121 rps, 747335 kbps, 825.9 ms avg req time

HAProxy log sections of incomplete transfers (6320535 bytes should be transferred with this test data set):

127.0.0.1:33054 [01/Feb/2019:11:22:48.178] main_frontend lighty_load_balancer/lighty0 0/0/0/0/298 200 425 - - SD-- 100/100/99/99/0 0/0 "
127.0.0.1:32820 [01/Feb/2019:11:22:48.068] main_frontend lighty_load_balancer/lighty0 0/0/0/0/409 200 4990 - - SD-- 99/99/98/98/0 0/0 "
127.0.0.1:34330 [01/Feb/2019:11:22:49.199] main_frontend lighty_load_balancer/lighty0 0/0/0/0/90 200 425 - - SD-- 100/100/99/99/0 0/0 "
127.0.0.1:34344 [01/Feb/2019:11:22:49.201] main_frontend lighty_load_balancer/lighty0 0/0/0/0/88 200 425 - - SD-- 99/99/98/98/0 0/0 "
127.0.0.1:34658 [01/Feb/2019:11:22:49.447] main_frontend lighty_load_balancer/lighty0 0/0/0/0/254 200 425 - - SD-- 100/100/98/98/0 0/0 "
127.0.0.1:34386 [01/Feb/2019:11:22:49.290] main_frontend lighty_load_balancer/lighty0 0/0/0/0/412 200 425 - - SD-- 100/100/98/98/0 0/0 "
127.0.0.1:34388 [01/Feb/2019:11:22:49.290] main_frontend
Early connection close, incomplete transfers
HAProxy 1.9.3, but it also happens with 1.7.10 and 1.7.11. Connections are getting closed during the data transfer phase at random sizes on the backend. Sometimes as little as 420 bytes get transferred, but usually more is transferred before the sudden end of the connection. HAProxy logs have connection closing status SD-- when this happens.

The basic components of the system look like this:

Client --> HAproxy --> HTTP server --> Caching Proxy --> Remote origin

Our HTTP server part compiles data from chunks it gets from the local cache. When it receives a request from a client via HAProxy, it sends the response header, then fetches chunks, compiles those and sends the data to the client. SD-- happens more frequently when the connection between the benchmarking tool and HAProxy is fast, e.g. when doing tests where the client side is not loaded much. It happens much more for http than for https. For example:

httpress -t1 -c10 -n1000 URL (rarely or not at all)
250 requests launched
500 requests launched
750 requests launched
1000 requests launched
TOTALS: 1000 connect, 1000 requests, 1000 success, 0 fail, 10 (10) real concurrency
TRAFFIC: 667959622 avg bytes, 452 avg overhead, 667959622000 bytes, 452000 overhead
TIMING: 241.023 seconds, 4 rps, 2706393 kbps, 2410.2 ms avg req time

httpress -t10 -c10 -n1000 URL (happens frequently)
2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:15 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:16 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:17 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection closed
2019-01-31 08:44:18 [26361:0x7fdc91a23700]: body [0] read connection closed
1000 requests launched
2019-01-31 08:44:19 [26361:0x7fdc82ffd700]: body [0] read connection closed
thread 6: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 bytes, 32544 overhead
thread 10: 72 connect, 72 requests, 72 success, 0 fail, 48093092784 bytes, 32544 overhead
thread 7: 73 connect, 73 requests, 72 success, 1 fail, 48093092784 bytes, 32544 overhead
thread 4: 88 connect, 88 requests, 67 success, 21 fail, 44753294674 bytes, 30284 overhead
thread 9: 111 connect, 111 requests, 56 success, 55 fail, 37405738832 bytes, 25312 overhead
thread 5: 82 connect, 82 requests, 68 success, 14 fail, 45421254296 bytes, 30736 overhead
thread 1: 86 connect, 86 requests, 68 success, 18 fail, 45421254296 bytes, 30736 overhead
thread 8: 184 connect, 184 requests, 29 success, 155 fail, 19370829038 bytes, 13108 overhead
thread 3: 73 connect, 73 requests, 73 success, 0 fail, 48761052406 bytes, 32996 overhead
thread 2: 158 connect, 158 requests, 39 success, 119 fail, 26050425258 bytes, 17628 overhead
TOTALS: 1000 connect, 1000 requests, 616 success, 384 fail, 10 (10) real concurrency
TRAFFIC: 667959622 avg bytes, 452 avg overhead, 411463127152 bytes, 278432 overhead
TIMING: 170.990 seconds, 3 rps, 2349959 kbps, 2775.8 ms avg req time

Because of the thread count differences, the -t1 (one thread) test loads the client side much more than the -t10 (ten threads) test does. Random samples from the HAProxy log (the proper size of the object in HAProxy logs is 667960042 bytes for that test file):

0/0/0/0/903 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/375 200 101926854 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/725 200 243340623 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/574 200 183069594 - - SD-- 11/11/9/9/0 0/0
0/0/0/0/648 200 208194175 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/1130 200 270807819 - - SD-- 10/10/9/9/0 0/0
0/0/0/0/349 200 90597175 - - SD-- 10/10/9/9/0 0/0

Our HTTP server logs contain hard unrecoverable errors about being unable to write to the socket when HAProxy closes the connection:

Return Code: 32. Transferred 79389313 out of 667959622 Bytes in 809 msec
Return Code: 32. Transferred 198965568 out of 667959622 Bytes in 986 msec
Return Code: 32. Transferred 126690257 out of 667959622 Bytes in 825 msec
Return Code: 32. Transferred 270807399 out of 667959622 Bytes in 1273 msec
Return Code: 32. Transferred 171663764 out of 667959622 Bytes in 1075 msec
Return Code: 32. Transferred 169362556 out of 667959622 Bytes in 1146 msec
Return Code: 32. Transferred 167789692 out of 667959622 Bytes in 937 msec
Return Code: 32. Transferred 199752000 out of 667959622 Bytes in 1110 msec
Return Code: 32. Transferred 158793496 out of 667959622 Bytes in 979 msec
Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1087 msec
Return Code: 32. Transferred 139962654 out of 667959622 Bytes in 918 msec
Return Code: 32. Transferred 155690998 out of 667959622 Bytes in 977 msec
Return Code: 32. Transferred 240394573 out of 667959622 Bytes in 1079 msec
Return Code: 32. Transferred 177068702 out of 667959622 Bytes in 1060 msec
Return Code: 32. Transferred 119149343 out of 667959622 Bytes in 881 msec
Return Code: 32.
Re: HA Proxy Load Balancer
On 2018-12-20 20:41, Lance Melancon wrote:
> Thanks for the info. Unfortunately I am not a programmer by a long shot and syntax is a big problem for me. I tried a few things but no luck, and I can't find any examples of a redirect. So do I need both the backend and acl statements? I'm simply trying to use mysite.net to direct to mysite.net/website. Any time I use a / the config fails.

Maybe this will help you: http://www.catb.org/esr/faqs/smart-questions.html

Veiko
Re: 1.7.11 with gzip compression serves incomplete files
Hi, Willy

On 2018-12-06 04:43, Willy Tarreau wrote:
> In the mean time it would be useful to see if adding "option http-pretend-keepalive" helps. This way we'll know if it's the server closing first or haproxy closing first which triggers this. And if it turns out that it fixes the issue for you, it could be a good temporary workaround.

Indeed, adding "option http-pretend-keepalive" helps.

Veiko
Re: 1.7.11 with gzip compression serves incomplete files
On 2018-11-30 09:40, Christopher Faulet wrote:
> Now, I'm still puzzled by this issue, because I can't reproduce it for now. And it is even stranger because when compression is enabled and used on a response, it cannot be switched to TUNNEL mode. So I don't really understand how the patch you mentioned could fix a compression bug, or how the commit 8066ccd39 (as stated on discourse) could be the origin of the bug.

I've found that 'option http-server-close' in the frontend is causing this. With it commented out, 1.7.11 works fine with gzip compression. I'm not gathering more debug data at the moment; maybe this already helps to reproduce the issue. I will provide more if necessary.

> Finally, I have a last question. You said the result is truncated. Does that mean the response is truncated because of a close, so that not all chunks are received? Or is the response correct from the HTTP point of view, but the file truncated once uncompressed?

The uncompressed file is truncated; part of it at the end is missing. I'm not sure what you mean by correctness from the HTTP point of view, but the headers look fine.

GET /assets/js/piwik.js?v=1534 HTTP/1.1
Host: foobar.tld
User-Agent: curl/7.61.1
Accept: */*
Accept-Encoding: deflate, gzip, br

{ [5 bytes data]
< HTTP/1.1 200 OK
< Date: Wed, 05 Dec 2018 15:45:26 GMT
< Content-Encoding: gzip
< Content-Language: en
< Content-Location: http://foobar.tld/assets/js/piwik.js?v=1534
< Content-Type: application/x-javascript; charset=UTF-8
< Expires: Sat, 02 Dec 2028 15:45:26 GMT
< Last-Modified: Fri, 23 Nov 2018 15:51:00 GMT
< Cache-Control: max-age=31536
< Date: Wed, 05 Dec 2018 15:45:26 GMT
< Accept-Ranges: bytes
< Server: Restlet-Framework/2.3.9
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Connection: close

65236 Dec 5 16:29 piwik.js  (without option http-server-close)
41440 Dec 5 17:45 piwik.js_ (with option http-server-close)

Regards,
Veiko
1.7.11 with gzip compression serves incomplete files
Hi!

There is not much to add, just that it was broken before in 1.7.9 and is broken again in 1.7.11. It works with 1.7.10. When applying the patch provided here
https://www.mail-archive.com/haproxy@formilux.org/msg27155.html
1.7.11 also works.

Testing is really simple: just configure haproxy gzip compression and download with curl --compressed or with a web browser. A sample .js file I have downloaded has a real size of 42202 bytes, but when downloaded with gzip compression, its size is 37648 bytes - part of the end is missing.

A very similar issue is discussed here too:
https://discourse.haproxy.org/t/1-7-11-compression-issue-parsing-errors-on-response/2542

Best regards,
Veiko
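A minimal reproduction sketch along these lines (the URL is a placeholder; sizes will differ per file):

# fetch through the compression path and without it, then compare sizes;
# curl --compressed also decompresses, so truncation shows up as a
# smaller (or corrupt) decompressed file
curl -s --compressed https://example.test/assets/js/piwik.js -o piwik.js.via-gzip
curl -s -H 'Accept-Encoding: identity' https://example.test/assets/js/piwik.js -o piwik.js.plain
ls -l piwik.js.via-gzip piwik.js.plain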
Understanding certain balance configuration
Hi,

I'm trying to understand how 'balance url_param' with 'hash-type consistent' should work. HAProxy 1.7.11. Let's say we have a config of two haproxy instances that balance content between local and remote (sibling).

server0 (10.0.0.1) would have a config section like this:

backend load_balancer
    balance url_param file_id
    hash-type consistent
    server local_backend /path/to/socket id 1
    server remote_backend 10.0.0.2:80 id 2

backend local_backend
    balance url_param file_id
    hash-type consistent
    server server0 127.0.0.1:100
    server server1 127.0.0.1:200

server1 (10.0.0.2) would have a config section like this:

backend load_balancer
    balance url_param file_id
    hash-type consistent
    server local_backend /path/to/socket id 2
    server remote_backend 10.0.0.1:80 id 1

backend local_backend
    balance url_param file_id
    hash-type consistent
    server server0 127.0.0.1:100
    server server1 127.0.0.1:200

Assuming that all requests indeed have the URL parameter "file_id": should requests on both servers only ever reach a single "local_backend" server, since they are already balanced in "load_balancer" and are not divided any further in "local_backend" because of the identical configuration of both "load_balancer" and "local_backend"?

Thanks in advance,
Veiko
Re: force-persist and use_server combined
On 07/25/2018 03:05 PM, Veiko Kukk wrote:
> The idea here is that the HAProxy statistics page, some other backend statistics and also some remote health checks running against a path under /dl/ would always reach only local_http_frontend and never go anywhere else, even when local is really down, not just marked as down.
> This config does not work; it forwards a /haproxy?stats request to remote_http_frontend when local_http_frontend is really down. Is that expected? Any ways to overcome this limitation?

I wonder if my question was too stupid or was just left unnoticed by someone who knows how force-persist is supposed to work. Meanwhile I've created a workaround by adding additional config sections and using a use_backend ACL instead of a use-server ACL to achieve what was needed.

regards,
Veiko
force-persist and use_server combined
Hi,

I'd like to understand whether I've made a mistake in the configuration or whether there might be a bug in HAProxy 1.7.11. The defaults section has "option redispatch".

backend load_balancer
    mode http
    option httplog
    option httpchk HEAD /load_balance_health HTTP/1.1\r\nHost:\ foo.bar
    balance url_param file_id
    hash-type consistent
    acl status0 path_beg -i /dl/
    acl status1 path_beg -i /haproxy
    use-server local_http_frontend if status0 or status1
    force-persist if status0 or status1
    server local_http_frontend /var/run/haproxy.sock.http-frontend check send-proxy
    server remote_http_frontend 192.168.1.52:8080 check send-proxy

The idea here is that the HAProxy statistics page, some other backend statistics and also some remote health checks running against a path under /dl/ would always reach only local_http_frontend and never go anywhere else, even when local is really down, not just marked as down.

This config does not work; it forwards a /haproxy?stats request to remote_http_frontend when local_http_frontend is really down. Is that expected? Any ways to overcome this limitation?

Thanks in advance,
Veiko
Re: Truly seamless reloads
On 31/05/18 23:15, William Lallemand wrote:
> Sorry, but unfortunately we are not backporting features to stable branches; those are only meant for maintenance. People who want to use the seamless reload should migrate to HAProxy 1.8; the stable team won't support this feature in previous branches.

I've been keeping an eye on this list regarding 1.8-related bugs, and it does not seem to me that 1.8 is stable enough yet for production use. Too many reports about high CPU usage and/or crashes. We are still using 1.6, which finally seems to have stabilized enough for production. When we started using 1.6 some years ago, we had many issues with it which caused service interruptions. I would not want to repeat that again. Even with 1.7, processes would hang forever after reload (days, sometimes weeks, or until reboot). Really hard to debug; it happens only under production load.

I will look at the patches provided by Dave. We are building HAProxy rpms for ourselves anyway; applying some patches in the spec file does not seem to be that much additional work if those would indeed provide truly seamless reloads.

Best regards,
Veiko
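For reference, the 1.8 feature under discussion boils down to passing the listening sockets from the old process to the new one over a stats socket; a hedged sketch of the usual setup (the socket and pidfile paths are examples):

global
    # expose listening FDs on the stats socket (1.8+)
    stats socket /var/run/haproxy.sock mode 600 level admin expose-fd listeners

# manual seamless reload: the new process fetches the old process's
# listening sockets via -x before taking over
haproxy -f /etc/haproxy/haproxy.cfg -x /var/run/haproxy.sock -sf $(cat /var/run/haproxy.pid)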
Re: Truly seamless reloads
On 26/04/18 17:11, Veiko Kukk wrote:
> Hi,
> According to https://www.haproxy.com/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/ : "The patchset has already been merged into the HAProxy 1.8 development branch and will soon be backported to HAProxy Enterprise Edition 1.7r1 and possibly 1.6r2."
> Has it been backported to 1.7 and/or 1.6? If yes, should seamless reload also work with multiprocess configurations (nbproc > 1)?

Can I assume the answer is no to both questions?

Veiko
Truly seamless reloads
Hi,

According to https://www.haproxy.com/blog/truly-seamless-reloads-with-haproxy-no-more-hacks/ : "The patchset has already been merged into the HAProxy 1.8 development branch and will soon be backported to HAProxy Enterprise Edition 1.7r1 and possibly 1.6r2."

Has it been backported to 1.7 and/or 1.6? If yes, should seamless reload also work with multiprocess configurations (nbproc > 1)?

Thanks in advance,
Veiko
Re: 1.7.10 and 1.6.14 always compress response
On 04/10/2018 03:51 PM, William Lallemand wrote:
> On Tue, Apr 10, 2018 at 03:43:12PM +0300, Veiko Kukk wrote:
>> Hi,
>> This happens even when neither compression algo nor compression type is specified in the haproxy configuration file.
> Hi,
> If you didn't specify any compression keyword in the haproxy configuration file, that's probably your backend server which is doing the compression.

Actually, you are right. What is surprising is that when requesting non-compressed content from haproxy, it still passes through compressed data. Maybe that's what the standard specifies, I don't know.

Thanks,
Veiko
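A quick way to confirm William's diagnosis is to bypass haproxy and ask the backend directly; a sketch with a placeholder backend address and port:

# a Content-Encoding: gzip here proves the backend itself ignores
# Accept-Encoding and compresses unconditionally
curl -s -D - -o /dev/null -H 'Accept-Encoding: identity' http://127.0.0.1:8080/ | grep -i '^content-encoding'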
1.7.10 and 1.6.14 always compress response
Hi,

Let's run a simple query against a host (real hostnames replaced).

curl https://testhost01.tld -o /dev/null -vvv

Request headers:
> GET / HTTP/1.1
> Host: testhost01.tld
> User-Agent: curl/7.58.0
> Accept: */*

Response headers:
< HTTP/1.1 200 OK
< Date: Tue, 10 Apr 2018 12:23:44 GMT
< Content-Encoding: gzip
< Content-Type: text/html;charset=utf-8
< Cache-Control: no-cache
< Date: Tue, 10 Apr 2018 12:23:44 GMT
< Accept-Ranges: bytes
< Server: Restlet-Framework/2.3.4
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Connection: close
< Access-Control-Allow-Origin: *
< Strict-Transport-Security: max-age=15768000

This happens even when neither compression algo nor compression type is specified in the haproxy configuration file. But let's say during the request that we don't want any compression:

curl https://testhost01.tld -H "Accept-Encoding: identity" -o /dev/null -vvv

Request headers:
> GET / HTTP/1.1
> Host: testhost01.tld
> User-Agent: curl/7.58.0
> Accept: */*
> Accept-Encoding: identity

Response headers:
< HTTP/1.1 200 OK
< Date: Tue, 10 Apr 2018 12:40:25 GMT
< Content-Encoding: gzip
< Content-Type: text/html;charset=utf-8
< Cache-Control: no-cache
< Date: Tue, 10 Apr 2018 12:40:25 GMT
< Accept-Ranges: bytes
< Server: Restlet-Framework/2.3.4
< Vary: Accept-Charset, Accept-Encoding, Accept-Language, Accept
< Connection: close
< Access-Control-Allow-Origin: *
< Strict-Transport-Security: max-age=15768000

Still, the response is gzipped.

HA-Proxy version 1.6.14-66af4a1 2018/01/02
Copyright 2000-2018 Willy Tarreau

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Running on zlib version : 1.2.3
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.
Re: Logging errors during reload of haproxy
Hi Lukas,

On 11/03/2017 02:53 PM, Lukas Tribus wrote:
>> # service haproxy reload
>> [ALERT] 306/110738 (29225) : sendmsg logger #1 failed: Resource temporarily unavailable (errno=11)
> Well, the destination logging socket is unavailable. I don't think there is a lot to do here on the haproxy side; this mostly depends on the destination socket and the kernel. I would suggest you use a UDP destination instead. That should be better suited to handle logging at this rate.

This is a test system without much load other than what my little 'ab -c 10 ...' is creating. We have unix socket logging everywhere locally; it works even under heavy load. First I suspected the config change where I added 'log /dev/log local0', but after commenting that out, those messages still appear. Once per process after reload, every time, when doing quick reloads, e.g.:

for i in {1..10}; do service haproxy reload; done

But sometimes even when not reloading quickly. I have a cronjob that runs every 3 minutes and reloads haproxy; then this error appears sometimes, not each time.

>> another bug about processes never closing after reload
> Unless you are hitting a bug already fixed (make sure you use a current stable release), it's likely that long running sessions keep haproxy running. Use the hard-stop-after directive to limit the time haproxy spends in this state:
> https://cbonte.github.io/haproxy-dconv/1.7/configuration.html#3.1-hard-stop-after

I will not comment further on the hanging process bug under this thread, because it's off topic. I will create a new thread for that. That was planned anyway, but I wanted to first create reproduction instructions. So far, it's quite random...

Regards,
Veiko
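Both suggestions from this exchange map to one-line config changes; a hedged sketch (the UDP address and the 30s limit are example values, not from the thread):

global
    # send logs over UDP syslog instead of the unix socket
    log 127.0.0.1:514 local0
    # hard-kill old processes that linger after a reload (1.7+)
    hard-stop-after 30s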
Re: Logging errors during reload of haproxy
On 11/03/2017 01:21 PM, Veiko Kukk wrote:
> Hi,
> I noticed, while trying to reproduce conditions for another bug about processes never closing after restart, that sometimes a reload causes logging errors to be displayed.

That should read "never closing after *reload*".

Veiko
Logging errors during reload of haproxy
Hi,

I noticed, while trying to reproduce conditions for another bug about processes never closing after restart, that sometimes a reload causes logging errors to be displayed. The following config section might be relevant:

global
    log /dev/log local0
    nbproc 3
defaults
    log /dev/log local0
frontend foo
    log /dev/log local1
...

# service haproxy reload
[ALERT] 306/110738 (29225) : sendmsg logger #1 failed: Resource temporarily unavailable (errno=11)
[ALERT] 306/110738 (29225) : sendmsg logger #1 failed: Resource temporarily unavailable (errno=11)
[ALERT] 306/110738 (29225) : sendmsg logger #1 failed: Resource temporarily unavailable (errno=11)

# haproxy -vv
HA-Proxy version 1.7.9 2017/08/18
Copyright 2000-2017 Willy Tarreau

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_STATIC_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Running on zlib version : 1.2.3
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
        [SPOE] spoe
        [TRACE] trace
        [COMP] compression

Veiko
Re: Possible regression in 1.6.12
Hi, Willy

On 16/06/17 12:15, Willy Tarreau wrote:
> So I have more info on this now. Veiko, first, I'm assuming that your config was using "resolvers dns_resolvers" on the "server" line, otherwise resolvers are not used.

My real-world configs use resolvers, but the timeouts happen even when the resolver was not used anywhere. That is why I did not include resolvers on the example config backend server provided with the initial report e-mail. When keeping only a single server under the resolvers section, I did not notice any timeouts. And it did not matter whether that single server was local or Google.

> What I've seen when running your config here is that google responds both over IPv4 and IPv6. And depending on your local network settings, if you can't reach them over IPv6 after the address was updated, your connection might get stuck waiting for the connect timeout to strike (10s in your conf, multiplied by the number of retries). The way to address this is to add "resolve-prefer ipv4" at the end of your server line; it will always pick IPv4 addresses only.

We have 'resolve-prefer ipv4' enabled in the real-world configuration where the resolver is actually used on a 'server' line. We have disabled IPv6 on all our servers. Anyway, since the timeouts happen even without using the resolver anywhere, this must not be the cause of the timeouts.

> BTW (probably it was just for illustration purposes), but please don't use well-known services like google, yahoo or whatever for health checks. If everyone does this, it will add a huge useless load to their servers.

It was just so that anybody could use a simple trimmed-down configuration for quick testing. The real configuration has no need for having google.com as a backend and is much more complex. This exact configuration can easily be used to test 1.6.12 - a simple reload causes the first two google.com checks to fail with timeouts. Also, any requests against ssl-frontend will fail for the first few checks after reload.

Regards,
Veiko
Re: Possible regression in 1.6.12
On 14/06/17 17:37, Willy Tarreau wrote:

Could you try to revert the attached patch which was backported to 1.6 to fix an issue where nbproc and resolvers were incompatible ? To do that, please use "patch -Rp1 < foo.patch".

I have applied the patch. Now HAproxy works as it did in 1.6.11: no requests time out.

Also, have you noticed if your haproxy continues to work or if it loops at 100% CPU for example ?

No, there is no excessive CPU load.

Best regards,
Veiko
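For anyone reproducing this, the revert plus rebuild looks roughly like this (a sketch; the build options are the ones from the '-vv' output in the initial report, and foo.patch stands for the patch attached to Willy's mail):

cd haproxy-1.6.12
patch -Rp1 < foo.patch   # revert the backported nbproc/resolvers fix
make clean               # make sure no stale object files survive
make TARGET=linux2628 USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1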
Possible regression in 1.6.12
I might have discovered a haproxy bug. It occurs when all of the following configuration conditions are satisfied:

* haproxy version 1.6.12
* multiple processes
* a resolvers section with more than one server configured (not even used anywhere)
* haproxy is either reloaded or restarted
* a request is made against the freshly reloaded/restarted haproxy, or a haproxy backend server health check is made.

In both cases, requests do not get a response. When accessing haproxy, requests time out. Backends fail their checks and are marked as down with a timeout error. This happens with browsers, curl and wget. When downgrading to 1.6.11, the timeouts don't happen.

How I tested:
1) reload haproxy with the minimal config below
2) then run:

for i in {1..100}; do date --utc; echo $i; curl https://tsthost.tld/haproxy?stats -o /dev/null -s -m 50; done

Wed 14 Jun 11:45:44 UTC 2017
1
Wed 14 Jun 11:46:34 UTC 2017
2
Wed 14 Jun 11:47:24 UTC 2017
3
Wed 14 Jun 11:48:14 UTC 2017
4
Wed 14 Jun 11:48:14 UTC 2017
5
Wed 14 Jun 11:49:04 UTC 2017
6
Wed 14 Jun 11:49:05 UTC 2017
7
Wed 14 Jun 11:49:55 UTC 2017
8
Wed 14 Jun 11:49:55 UTC 2017
9
Wed 14 Jun 11:50:45 UTC 2017
10
Wed 14 Jun 11:50:46 UTC 2017
11
Wed 14 Jun 11:50:46 UTC 2017
12

When removing either the multiprocess configuration or the resolvers section, no requests time out. The following is the trimmed-down minimal config:

global
    daemon
    nbproc 3
    maxconn 500
    user haproxy
    tune.ssl.default-dh-param 2048
    ssl-default-bind-options no-sslv3 no-tls-tickets
    ssl-default-bind-ciphers AES128+EECDH:AES128+EDH:!ADH:!AECDH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK
    ssl-default-server-options no-sslv3 no-tls-tickets
    ssl-default-server-ciphers AES128+EECDH:AES128+EDH:!ADH:!AECDH:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!3DES:!MD5:!PSK
    stats socket /var/run/haproxy1.sock mode 600 process 1
    stats socket /var/run/haproxy2.sock mode 600 process 2
    stats socket /var/run/haproxy3.sock mode 600 process 3

defaults
    bind-process 3
    log /dev/log local0
    option log-health-checks
    option contstats
    timeout connect 10s
    timeout client 60s
    timeout server 60s

resolvers dns_resolvers
    # local caching named
    nameserver dns0 127.0.0.1:53
    # remote servers
    nameserver dns1 8.8.8.8:53
    nameserver dns2 8.8.4.4:53

listen ssl-frontend
    bind-process 1-2
    bind *:443 ssl crt /path/to/certificate.pem
    server http-frontend 127.0.0.1:666 send-proxy check

frontend http-frontend
    mode http
    stats enable
    option forwardfor
    option httplog
    bind *:80
    bind 127.0.0.1:666 accept-proxy

backend ssl_backend
    mode http
    option httplog
    server ssl_server google.com:443 check ssl verify none fall 2 inter 5s fastinter 3s rise 3

HA-Proxy version 1.6.12 2017/04/04
Copyright 2000-2017 Willy Tarreau

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_LUA=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Running on zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
Running on PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with Lua version : Lua 5.3.3
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.
Re: Multiple url parameter based session limiting
On 04/10/16 18:21, Veiko Kukk wrote:

Let's say we have the URL http://domain.tld?foo=abc&bar=def. I'd like to limit current sessions with stick tables when both foo and bar values match, but I'm not sure how to achieve this (in the most optimal way).

I found a similar post from 2013. It is not exactly what I need, but similar in the sense that it also requires matching several query parameters: https://www.mail-archive.com/haproxy@formilux.org/msg11680.html

Are the per-request variables now available, and how to use them?

Regards,
Veiko
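Per-request variables did land in 1.6. A minimal sketch of the syntax, with illustrative variable and backend names (not from this thread):

# store each query parameter in a transaction-scoped variable
http-request set-var(txn.foo) urlp(foo)
http-request set-var(txn.bar) urlp(bar)
# use them later in ACL conditions
use_backend limited if { var(txn.foo) -m str abc } { var(txn.bar) -m str def }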
Multiple url parameter based session limiting
Hi,

Let's say we have the URL http://domain.tld?foo=abc&bar=def. I'd like to limit current sessions with stick tables when both foo and bar values match, but I'm not sure how to achieve this (in the most optimal way). Stick tables are somewhat hard for me to understand.

stick-table type string len 48 size 1m expire 90m store conn_cur
tcp-request inspect-delay 2s
tcp-request content track-sc0 urlp(foo) if HTTP

This only tracks the url parameter foo, but I'd like something like urlp(foo and bar), which is not possible according to the documentation. Any suggestions on how to accomplish this?

Regards,
Veiko
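One commonly suggested workaround (a sketch, not something confirmed in this thread; the X-Track header name is illustrative and an HTTP frontend is assumed) is to concatenate both parameters into a scratch header and track that composite key:

# build one composite key, e.g. "abc_def", from both query parameters
http-request set-header X-Track %[urlp(foo)]_%[urlp(bar)]
stick-table type string len 48 size 1m expire 90m store conn_cur
# track the composite key instead of a single parameter
http-request track-sc0 hdr(X-Track)

The stick-table line is the one from the question; only the tracked key changes.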
Re: Haproxy 1.6.9 failed to compile regex
On 07/09/16 14:37, Veiko Kukk wrote:

I tried to upgrade from 1.6.8 to 1.6.9, but found strange errors printed by haproxy 1.6.9. Any ideas why?

Another strange issue is that 1.6.9 shows:

Running on OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010

The system has openssl 1.0.1e-48.el6_8.1 installed and nothing else. So how is it possible that it is using a different version than the system has? On the other hand, 1.6.8 reports the proper openssl version:

Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013

Veiko
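A quick way to see which OpenSSL the binary actually loads at run time (a sketch; the path and the rpm query are assumptions for a CentOS-style system):

# which libssl/libcrypto does the dynamic linker resolve for this binary?
ldd /usr/sbin/haproxy | grep -E 'libssl|libcrypto'
# what the package manager has installed
rpm -qa 'openssl*'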
Haproxy 1.6.9 failed to compile regex
Hi,

I tried to upgrade from 1.6.8 to 1.6.9, but found strange errors printed by haproxy 1.6.9. Any ideas why?

[ALERT] 250/112901 (12026) : parsing [/etc/haproxy/haproxy.cfg:57] : 'reqirep' : regular expression '^([^ :]*) /(.*)' : failed to compile regex '^([^ :]*) /(.*)' (error=unknown or incorrect option bit(s) set)
[ALERT] 250/112901 (12026) : parsing [/etc/haproxy/haproxy.cfg:205] : 'reqidel' : regular expression '^If-Match:.*' : failed to compile regex '^If-Match:.*' (error=unknown or incorrect option bit(s) set)
[ALERT] 250/112901 (12026) : parsing [/etc/haproxy/haproxy.cfg:279] : 'rspidel' : regular expression '^Content-Location' : failed to compile regex '^Content-Location' (error=unknown or incorrect option bit(s) set)

Downgrading to 1.6.8 solves this error.

# haproxy -vv
HA-Proxy version 1.6.9 2016/08/30
Copyright 2000-2016 Willy Tarreau

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.7
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.0-fips 29 Mar 2010
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

# haproxy -vv
HA-Proxy version 1.6.8 2016/08/14
Copyright 2000-2016 Willy Tarreau

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Veiko
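Note that the '-vv' output itself hints at a build/runtime mismatch (built with OpenSSL 1.0.1e, but running on 1.0.0), which suggests the 1.6.9 binary resolves different shared libraries at run time than it was built against; PCRE's "unknown or incorrect option bit(s) set" error fits the same picture. A sketch of checks that would confirm it (paths are assumptions):

# compare what the binary resolves at run time with what it was built against
ldd /usr/sbin/haproxy | grep -E 'pcre|libssl|libcrypto'
# version of the PCRE development headers on the build machine
pcre-config --version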
Re: 100% cpu , epoll_wait()
On 18/05/16 15:42, Willy Tarreau wrote:

Hi Sebastian,

On Thu, May 12, 2016 at 09:58:22AM +0200, Sebastian Heid wrote:

Hi Lukas, starting from around 200mbit/s in, haproxy processes (nbproc 6) are hitting 100% cpu regularly (noticed up to 3 processes at the same time with 100%), but recover again on their own after some time. stracing such a process yesterday showed the following:

epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0

Unfortunately I can't do any more debugging in this setup. HAproxy 1.5.14 never comes near 10% cpu usage with way higher bandwidth.

So far I've got good reports from people having experienced similar issues with recent versions, thus I'm thinking about something, are you certain that you did a make clean after upgrading and before rebuilding ? Sometimes we tend to forget it, especially after a simple "git pull". It is very possible that some old .o files were not properly rebuilt and still contain these bugs. If in doubt, you can simply keep a copy of your latest haproxy binary, make clean, build again and run cmp between them. It should not report any difference, otherwise it means there was an issue (which would be great news).

I can confirm that on CentOS 6 with HAproxy 1.6.5 this 100% CPU load still happens. Exactly the same:

epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, ^CProcess 6200 detached

# haproxy -vv
HA-Proxy version 1.6.5 2016/05/10
Copyright 2000-2016 Willy Tarreau

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Veiko
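Willy's rebuild check quoted above, spelled out as commands (a sketch; the make options are the ones visible in the '-vv' output in this mail):

cp haproxy haproxy.prev    # keep a copy of the current binary
make clean                 # discard all old object files
make TARGET=linux2628 USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1
cmp haproxy haproxy.prev   # any difference means stale objects had been linked in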
Re: 100% cpu , epoll_wait()
On 20/04/16 11:43, Willy Tarreau wrote:

On Tue, Apr 19, 2016 at 09:53:36PM +0300, Veiko Kukk wrote:

On 19/04/16 18:52, Willy Tarreau wrote:

On Tue, Apr 19, 2016 at 04:15:08PM +0200, Willy Tarreau wrote:

OK in fact it's different. Above we have a busy polling loop, which may very well be caused by the buffer space miscalculation bug and which results in a process not completing its job until a timeout strikes. The link to the other report shows a normal polling with blocked signals.

The process that was created yesterday via soft reload went to 100% cpu today.

haproxy  29388  5.0  0.0  58772 11700 ?  Rs  Apr17 156:44 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf 1997

Section from strace output: (...)

this below :

epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0

is not good unfortunately. I'm assuming this is with 1.5.17, that's it ? If so we still have an issue :-/

It is 1.6.3:

# haproxy -vv
HA-Proxy version 1.6.3 2015/12/25
Copyright 2000-2015 Willy Tarreau <wi...@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Veiko
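For anyone wanting to capture the same evidence, a sketch of how such a trace is typically taken (the PID is the one from the ps line above; strace must run as root):

# find the leftover pre-reload process (the one started with -sf)
ps aux | grep '[h]aproxy.*-sf'
# attach and watch only the polling syscalls
strace -p 29388 -e trace=epoll_wait,epoll_ctl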
Re: 100% cpu , epoll_wait()
On 19/04/16 18:52, Willy Tarreau wrote:

On Tue, Apr 19, 2016 at 04:15:08PM +0200, Willy Tarreau wrote:

OK in fact it's different. Above we have a busy polling loop, which may very well be caused by the buffer space miscalculation bug and which results in a process not completing its job until a timeout strikes. The link to the other report shows a normal polling with blocked signals.

The process that was created yesterday via soft reload went to 100% cpu today.

haproxy  29388  5.0  0.0  58772 11700 ?  Rs  Apr17 156:44 /usr/sbin/haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf 1997

Section from strace output:

epoll_wait(0, {}, 200, 0) = 0
recvfrom(32, "\366\334\247\270<\230\3028\v\334\236K\204^p\31\6\3T\230:\23s\257\337\316\242\302]\2\246\227"..., 15368, 0, NULL, NULL) = 15368
recvfrom(32, "\366\334si\251\272Y\372\360'/\363\212\246\262w\307[\251\375\314\236whe\302\337\257\25NQ\370"..., 1024, 0, NULL, NULL) = 1024
sendto(18, "\366\334\247\270<\230\3028\v\334\236K\204^p\31\6\3T\230:\23s\257\337\316\242\302]\2\246\227"..., 16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 8016
sendto(18, "\355\265\207\360\357\3046k\364\320\330\30d\247\354\273BE\201\337\4\265#\357Z\231\231\337\365*\242\345"..., 8376, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(0, EPOLL_CTL_MOD, 18, {EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=18, u64=18}}) = 0
epoll_wait(0, {}, 200, 0) = 0
recvfrom(32, "@OR\224\335\233\263\347U\245X\376)\240\342\334\242\31\321\322\354\222\276\233\247\316-\263\370)\252U"..., 8016, 0, NULL, NULL) = 8016
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {{EPOLLOUT, {u32=53, u64=53}}}, 200, 0) = 1
sendto(53, "\274'[\24\n\264*b\306\253YA\313A\36\202a\177\317\370K:\302\230\315.\315\215\f&\351\27"..., 14032, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 14032
sendto(53, "\234CS\236wYsf\267\24\276v\325\302\267+a\303\336\250\211x\236\33\23MR_\324\214A\264"..., 2360, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 2360
recvfrom(55, "\231\16\35\337\20\203V\344\360\202n\307\2120\213\r\353\312\334\357\205\366=\\\373|\210\4-\354\32\360"..., 15368, 0, NULL, NULL) = 15368
recvfrom(55, "i\244\305N\242I\177n'4g\211\256%\26X\34il\3374\34HN\22\365\357\211Y\354\306K"..., 1024, 0, NULL, NULL) = 1024
sendto(53, "\231\16\35\337\20\203V\344\360\202n\307\2120\213\r\353\312\334\357\205\366=\\\373|\210\4-\354\32\360"..., 16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16392
epoll_ctl(0, EPOLL_CTL_MOD, 53, {EPOLLIN|EPOLLRDHUP, {u32=53, u64=53}}) = 0
epoll_wait(0, {}, 200, 0) = 0
recvfrom(55, "\365f\303r(\1\365S\276\246c\334\216\346\226\10<}\340\227h\374\370\360\276sSs\346\351\337\370"..., 15368, 0, NULL, NULL) = 15368
recvfrom(55, "-\r\21\326\326\0\0>\346-?\375\325J\346N\336\353Jz\376\303\373?\226y}\317\257\371\304t"..., 1024, 0, NULL, NULL) = 1024
sendto(53, "\365f\303r(\1\365S\276\246c\334\216\346\226\10<}\340\227h\374\370\360\276sSs\346\351\337\370"..., 16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16392
epoll_wait(0, {}, 200, 0) = 0
recvfrom(55, "\251\3\0200\317\217ab\223\f\306\322/}J\231\4\3b\311h\220sq\220[\225\21\372\264Dv"..., 15368, 0, NULL, NULL) = 15368
recvfrom(55, "\233.\20B\337\343\274\311\212\211\241\244\5\257\221w1{\253Kjh\23?w\357\365\377\335\261\3\215"..., 1024, 0, NULL, NULL) = 1024
sendto(53, "\251\3\0200\317\217ab\223\f\306\322/}J\231\4\3b\311h\220sq\220[\225\21\372\264Dv"..., 16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 16392
epoll_wait(0, {}, 200, 0) = 0
recvfrom(55, ",T\27\22\300\31\231t\207%j-\263}\344\25#\333\235\214*M\227\26\0215*_\312/@\351"..., 15368, 0, NULL, NULL) = 15368
recvfrom(55, "\225\256\37Qib\371\377\220l\342\20\2742\271\3360U\224\0375?ju\10\207\235J\267\35\340\367"..., 1024, 0, NULL, NULL) = 1024
sendto(53, ",T\27\22\300\31\231t\207%j-\263}\344\25#\333\235\214*M\227\26\0215*_\312/@\351"..., 16392, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = 13312
sendto(53, "\372\265\334\263\232\2016l2\216\372\261B\26\243\252\204\220\353\f\367\215\331\232\203hI,\260\37\207\357"..., 3080, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_MORE, NULL, 0) = -1 EAGAIN (Resource temporarily unavailable)
epoll_ctl(0, EPOLL_CTL_MOD, 53, {EPOLLIN|EPOLLOUT|EPOLLRDHUP, {u32=53, u64=53}}) = 0
epoll_wait(0, {}, 200, 0) = 0
recvfrom(55, "k\33\342U\260:Z\350\3725>\211R@\20\347\326\363\203\36?\226\304\241\367\263B\242\230\6^\221"..., 13312, 0, NULL, NULL) = 13312
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
Re: 100% cpu , epoll_wait()
On 16/04/16 01:53, Jim Freeman wrote:

I'm suspecting that a connection to the stats port goes wonky with a '-sf' reload, but I'll have to wait for it to re-appear to poke further. I'll look first for a stats port connection handled by the pegged process, then use 'tcpkill' to kill just that connection (rather than the whole process, which may be handling other connections).

We use haproxy 1.6.3 (latest CentOS 6.7) and experience a similar situation after some reloads (-sf). The old haproxy process does not exit and uses 100% cpu, strace showing:

epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0
epoll_wait(0, {}, 200, 0) = 0

In our case, it was a tcp backend tunnelling rsyslog messages. After restarting the local rsyslogd, the load was gone and the old haproxy instance exited. It's hard to tell how many reloads it takes to make haproxy go crazy, or what an exact reproducible test would be. But it does not take hundreds of reloads, rather 10-20 (our reloads are not very frequent).

$ haproxy -vv
HA-Proxy version 1.6.3 2015/12/25
Copyright 2000-2015 Willy Tarreau

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement
  OPTIONS = USE_ZLIB=1 USE_OPENSSL=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Encrypted password support via crypt(3): yes
Built with zlib version : 1.2.3
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
Running on OpenSSL version : OpenSSL 1.0.1e-fips 11 Feb 2013
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports prefer-server-ciphers : yes
Built with PCRE version : 7.8 2008-09-05
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built without Lua support
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Best regards,
Veiko
Choosing backend based on constant
Hi everybody,

I'd like to simplify my haproxy configuration management by using almost identical configurations for different groups of haproxy installations, choosing different backends based on a string comparison. The only difference between the haproxy configuration files of the different groups would be that string. The configuration logic would be something like this (not syntactically correct for haproxy, I know, but it should show what I wish to accomplish):

constant = foo # first hostgroup configuration
constant = bar # second hostgroup configuration

# common configuration for all hostgroups
use_backend ha_backend_foo if constant == foo
use_backend ha_backend_bar if constant == bar
...

I wonder how to specify that string and form an ACL to use in the 'use_backend' statement?

Thanks in advance,
Veiko
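One possible way to implement such a constant (a sketch, not an answer from this thread: haproxy expands environment variables inside double-quoted configuration strings, so the constant can live in the service environment; HOSTGROUP and the frontend are illustrative names):

# e.g. export HOSTGROUP=foo in the init script or service environment
frontend fe_main
    bind *:80
    # expanded once at config parse time, yielding "ha_backend_foo"
    use_backend "ha_backend_$HOSTGROUP"

Each hostgroup would then ship the same haproxy.cfg and differ only in its environment.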
Re: Choosing backend based on constant
I'd like to manually add that constant string to the configuration, not derive it from the traffic. It would help to reduce differences in the haproxy configuration file between server groups, and make migration between groups easier.

Best regards,
Veiko

On 30/04/15 18:06, Baptiste wrote:

On Thu, Apr 30, 2015 at 11:49 AM, Veiko Kukk vk...@xvidservices.com wrote:

Hi everybody

I'd like to simplify my haproxy configuration management by using almost identical configurations for different groups of haproxy installations that use different backends based on a string comparison. The only difference between the haproxy configuration files of the different groups would be that string. The configuration logic would be something like this (not syntactically correct for haproxy, I know, but it should show what I wish to accomplish):

constant = foo # first hostgroup configuration
constant = bar # second hostgroup configuration

# common configuration for all hostgroups
use_backend ha_backend_foo if constant == foo
use_backend ha_backend_bar if constant == bar
...

I wonder how to specify that string and form an ACL to use in the 'use_backend' statement?

Thanks in advance,
Veiko

Hi Veiko,

The question is how you set your constant: what piece of information do you use, from the traffic or whatever? Then we may help you.

Baptiste