Re: [ANNOUNCE] haproxy-2.6-dev4
Hi Tim,

On Wed, Mar 30, 2022 at 09:14:42PM +0200, Tim Düsterhus wrote:
> Willy,
>
> On 3/26/22 10:22, Willy Tarreau wrote:
> > be the last LTS version with this. I'm interested in opinions and feedback
> > about this. And the next question will obviously be "how could we detect
>
> Can you clarify what *exactly* is expected to be removed and what will
> remain? Is it just SRV DNS records or more?

What I believe is causing significant trouble at the moment in the DNS area is the assignment of randomly delivered IP addresses to a fleet of servers. Whether it comes from SRV records or simply from a wide range of addresses returned for a single request, it's basically the same problem. For example, if you configure 10 servers with the same name "foo.example.com", the DNS code has to check in each response whether there are addresses already assigned to active servers and just refresh them, then find whether there are addresses that are not yet assigned and whether some addressless servers are available, in which case those addresses are assigned to them, then spot any address that has disappeared for a while and decide whether or not the servers that were assigned such addresses finally ought to be stopped. In addition to being totally unreliable, this is extremely CPU intensive. We've seen plenty of situations where the watchdog was triggered because of this, and in my opinion the concept is fundamentally flawed since responses are often partial. As soon as you suspect that not all active addresses were delivered, you know that you have to put lots of hacks in place.

What I would like to see is a resolver that does just that: resolving. If multiple addresses are returned for a name, as long as one of them is already assigned that's OK; otherwise the server's address changes. If you have multiple servers with the same name, it should be stated clearly that it's not the resolver's role to try to distribute multiple responses fairly. Instead I'd rather see addresses assigned the way they would be at boot when using the libc's resolver, i.e. any address to any server, possibly the same address. This would make it clear that the resolver is there to answer the question "give me the first [ipv4/ipv6/any] address corresponding to this name" and not to be involved in backend-wide hacks. This would also make sure that do-resolve() does simple and reliable things.

Also, I would like to see the resolvers really resolve CNAMEs, because that's what application-level code (e.g. Lua or the HTTP client) really needs. If I understand right, at the moment CNAMEs are only resolved if they appear in the same response, so I strongly doubt they can work cross-domain.

It's important to keep in mind that the reason such mechanisms were put in place originally was to adopt the then-emerging trends around Consul and similar registries. Nowadays all of these have evolved to support far more reliable and richer APIs, precisely because of those earlier limitations, and the DNS as we support it should really really really not be used.

I hope this clarifies the situation and doesn't start to make anyone worry :-) Anyway there's no emergency, the code is still there, and my concern is more about how we can encourage existing users to start thinking about revisiting their approach with new tools and practices. And this will also require that we have working alternatives to suggest.
While I'm pretty confident that the dataplane-api, the ingress controller and similar tools already offer a valid answer, I don't know for sure whether they can be considered drop-in replacements nor whether they support everything, and this will have to be studied as well before starting to scare users!

Cheers,
Willy
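For illustration only, here is a minimal configuration sketch of the two usages discussed above: a fleet of servers sharing one DNS name via server-template, and the simpler do-resolve() action that only resolves a name at runtime. All names, addresses and ports below are hypothetical and not taken from the thread; this is not a proposal, just a sketch of the current mechanisms.

    resolvers mydns
        nameserver ns1 192.0.2.53:53    # hypothetical nameserver
        resolve_retries 3
        timeout resolve 1s
        timeout retry   1s
        hold valid      10s

    frontend fe
        bind :80
        # the "simple" use case: resolve one name into a variable at runtime
        http-request do-resolve(txn.dstip,mydns,ipv4) str(foo.example.com)
        default_backend fleet

    backend fleet
        # ten servers sharing one DNS name: the resolver has to spread the
        # returned addresses across them, which is the costly and fragile
        # behaviour described in the message above
        server-template srv 10 foo.example.com:80 check resolvers mydns init-addr none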
Re: [ANNOUNCE] haproxy-2.6-dev4
Willy,

On 3/26/22 10:22, Willy Tarreau wrote:
> be the last LTS version with this. I'm interested in opinions and feedback
> about this. And the next question will obviously be "how could we detect

Can you clarify what *exactly* is expected to be removed and what will remain? Is it just SRV DNS records or more?

Best regards
Tim Düsterhus
Re: Haproxy rate limit monitoring
Istvan,

On 3/24/22 08:47, Szabo, Istvan (Agoda) wrote:
> I'm using rate limiting with my haproxy18 and I'd like to somehow squeeze out
> metrics from it based on the IP addresses that are close to the limit, or on
> how the users are hitting the limits.
>
> At the moment all I can do is a very silly solution that I don't like: in a
> while loop I listen on the socket and possibly redirect the output to a file:
>
> while sleep 0.5; do printf 'show table https\nshow table http\n' | nc -U /var/lib/haproxy/stats; done
>
> I'd like to know whether there is a more elegant solution, please?

I've presented a solution for real-time monitoring of a stick table at HAProxyConf 2021:

https://github.com/WoltLab/node-haproxy-peers
https://www.haproxy.com/user-spotlight-series/using-haproxy-peers-for-real-time-quota-tracking/

Best regards
Tim Düsterhus
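As a rough sketch of the peers-based approach from the talk (peer names, addresses and ports here are made up; the second peer stands for an external peers-protocol consumer such as the linked node-haproxy-peers), the tracked stick table is simply shared through a peers section instead of being polled over the stats socket:

    peers quota
        peer lb1      127.0.0.1:10010   # must match this instance's local peer name (hostname or -L)
        peer exporter 127.0.0.1:10011   # external peers-protocol listener, e.g. node-haproxy-peers

    frontend http
        bind :80
        stick-table type ip size 100k expire 10m store http_req_rate(10s) peers quota
        http-request track-sc0 src
        default_backend app

    backend app
        server s1 127.0.0.1:8080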
Re: possible bug in haproxy: backend switching with map file does not work with HTTP/2
Jarno,

On 3/30/22 14:57, Jarno Huuskonen wrote:
> > Hello,
> >
> > when testing with HTTP/2 we found a behaviour we did not expect:
> >
> > we use switching between different backends by use of a map file, e.g.:
> > use_backend %[url,map_beg(/etc/haproxy/pool.map,defaultbackend)]
> >
> > With HTTP/1.1 this works fine in haproxy.
> > But with HTTP/2, it does not work.
>
> I think with HTTP/2 %[url] is https://dom.ain/path... and with HTTP/1.1 %[url]
> is just path (I think this has been discussed on list, but at the moment I
> can't find a link).

I can't find anything within half a minute either, but "Origin Form" is what's used for the HTTP/2 URL, I believe.

> Have you tried with %[path,map_beg(/etc/haproxy/pool.map,defaultbackend)] ?

This is the correct solution (and in fact it's effectively documented): https://cbonte.github.io/haproxy-dconv/2.4/configuration.html#url

  With ACLs, using "path" is preferred over using "url", because clients may
  send a full URL as is normally done with proxies. The only real use is to
  match "*" which does not match in "path", and for which there is already a
  predefined ACL.

Best regards
Tim Düsterhus
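For concreteness, the relevant frontend lines from the configuration quoted in this thread, with the suggested change applied, would look roughly like this:

    frontend ssl
        bind *:443 alpn h2,http/1.1 ssl crt /etc/haproxy/x.pem
        # look up the normalized path instead of the full URL, so the map
        # lookup behaves the same for HTTP/1.1 and HTTP/2 requests
        use_backend %[path,map_beg(/etc/haproxy/pool.map,defaultbackend)]
        default_backend defaultbackend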
Re: possible bug in haproxy: backend switching with map file does not work with HTTP/2
Hi,

On Wed, 2022-03-30 at 12:19 +, Ralf Saier wrote:
> Hello,
>
> when testing with HTTP/2 we found a behaviour we did not expect:
>
> we use switching between different backends by use of a map file, e.g.:
> use_backend %[url,map_beg(/etc/haproxy/pool.map,defaultbackend)]
>
> With HTTP/1.1 this works fine in haproxy.
> But with HTTP/2, it does not work.

I think with HTTP/2 %[url] is https://dom.ain/path... and with HTTP/1.1 %[url] is just path (I think this has been discussed on list, but at the moment I can't find a link).

Have you tried with %[path,map_beg(/etc/haproxy/pool.map,defaultbackend)] ?

-Jarno

> [...]
possible bug in haproxy: backend switching with map file does not work with HTTP/2
Hello,

when testing with HTTP/2 we found a behaviour we did not expect:

we use switching between different backends by use of a map file, e.g.:

    use_backend %[url,map_beg(/etc/haproxy/pool.map,defaultbackend)]

With HTTP/1.1 this works fine in haproxy. But with HTTP/2, it does not work.

Here's a minimal configuration file to reproduce this:

    global
        log /dev/log local0 warning
        # log /dev/log local0
        # log /dev/log local1 notice

        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

        # Default SSL material locations
        ca-base /etc/ssl/certs
        crt-base /etc/ssl/private

        # See: https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
        ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
        ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
        ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

    defaults
        log global
        mode http
        option httplog
        # option dontlognull
        timeout connect 5000
        timeout client 5
        timeout server 5

    backend defaultbackend
        log global
        mode http
        http-response set-header X-Info "defaultbackend : %s"

        server default_1 127.0.0.1:81

    backend backend_2
        log global
        mode http
        http-response set-header X-Info "backend_2 : %s"

        server default_2 127.0.0.1:81

    backend backend_3
        log global
        mode http
        http-response set-header X-Info "backend_3 : %s"

        server default_3 127.0.0.1:81

    frontend ssl
        log global
        mode http

        option httplog

        bind *:443 alpn h2,http/1.1 ssl crt /etc/haproxy/x.pem

        acl is_path_3 path_beg /3
        use_backend backend_3 if is_path_3

        use_backend %[url,map_beg(/etc/haproxy/pool.map,defaultbackend)]
        default_backend defaultbackend

Content of /etc/haproxy/pool.map is:

    /2 backend_2

HAProxy Version:

haproxy -vvv
HAProxy version 2.5.5-1ppa1~focal 2022/03/14 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2023.
Known bugs: http://www.haproxy.org/bugs/bugs-2.5.5.html
Running on: Linux 5.4.0-104-generic #118-Ubuntu SMP Wed Mar 2 19:02:41 UTC 2022 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-d3zlWl/haproxy-2.5.5=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   =

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 1.1.1f 31 Mar 2020
Running on OpenSSL version : OpenSSL 1.1.1f 31 Mar 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Support for malloc_trim() is enabled.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 9.4.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Tot
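One way to see the difference directly (a debugging sketch, not part of the original report) is to capture both samples in the frontend so they show up in the access log, then compare an HTTP/1.1 request with an HTTP/2 one:

    frontend ssl
        bind *:443 alpn h2,http/1.1 ssl crt /etc/haproxy/x.pem
        option httplog
        # captured request samples appear between {} in the httplog output
        http-request capture url  len 128
        http-request capture path len 64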
Re: Check interval rise and fall behaviour
On 3/29/22 at 18:02, Lais, Alexander wrote:
> Dear all,
>
> We are using the backend health checks to disable flapping backends. The default values for rise and fall are 2 subsequent succeeded and 3 subsequent failed checks. Our check interval is at 1000ms (a little frequent, potentially part of the problem).
>
> Here is what we observed, using HAProxy 2.4.4:
>
> 1. Falling
>
> It started with the backend being up and then going down (fall).
>
> 2022-03-23T21:31:54.942Z Health check for server http-routers-http1/node4 failed, reason: Layer4 timeout, check duration: 1000ms, status: 2/3 UP.
> 2022-03-23T21:31:56.920Z Health check for server http-routers-http1/node4 failed, reason: Layer4 timeout, check duration: 1001ms, status: 1/3 UP.
> 2022-03-23T21:31:57.931Z Health check for server http-routers-http1/node4 succeeded, reason: Layer7 check passed, code: 200, check duration: 1ms, status: 3/3 UP.
> 2022-03-24T10:03:27.223Z Health check for server http-routers-http1/node4 failed, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 1ms, status: 2/3 UP.
> 2022-03-24T10:03:28.234Z Health check for server http-routers-http1/node4 failed, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 1ms, status: 1/3 UP.
> 2022-03-24T10:03:29.237Z Health check for server http-routers-http1/node4 failed, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 1ms, status: 0/2 DOWN.
>
> We go down from 3/3 to 2/3, 1/3 and back up again to 3/3. My assumption is that it then measured 2/3, but only needs 2 for rising, i.e. 2/2, which is bumped to 3/3 as the backend is now considered up.
>
> The backend stays up for a while and then goes down with my expected health checks, i.e. 3/3, 2/3, 1/3, 0/3 -> 0/2 (as we need 2 for rise).
>
> 2. Rising
>
> 2022-03-24T10:12:26.846Z Health check for server http-routers-http1/node4 failed, reason: Layer4 timeout, check duration: 1000ms, status: 0/2 DOWN.
> 2022-03-24T10:12:29.843Z Health check for server http-routers-http1/node4 failed, reason: Layer4 connection problem, info: "Connection refused", check duration: 1ms, status: 0/2 DOWN.
> 2022-03-24T10:13:43.902Z Health check for server http-routers-http1/node4 failed, reason: Layer7 wrong status, code: 503, info: "Service Unavailable", check duration: 2ms, status: 0/2 DOWN.
> 2022-03-24T10:14:03.039Z Health check for server http-routers-http1/node4 succeeded, reason: Layer7 check passed, code: 200, check duration: 1ms, status: 1/2 DOWN.
> 2022-03-24T10:14:04.079Z Health check for server http-routers-http1/node4 succeeded, reason: Layer7 check passed, code: 200, check duration: 1ms, status: 3/3 UP.
>
> So coming up (rise), it goes from 0/2 probes to 1/2 to 3/3. My assumption is that it goes to 2/2, is considered up, and is bumped to 3/3 because for fall we now need 3 failed probes.
>
> The documentation describes rise / fall as the number of subsequent probes that succeeded / failed. From my observations it looks like it is a sliding window of the last n being successful, i.e. when the number of fall is larger than rise, it is easier to rise back up with a single successful probe.
>
> Maybe I'm misreading the log outputs or drawing the wrong conclusions. If someone knows by heart how it's supposed to work based on the code, that would be great. Otherwise we can dig some more ourselves.

Hi,

The rise and fall values are the numbers of consecutive successful/unsuccessful health checks. When a server is DOWN, we count the number of consecutive successful health checks. If the counter reaches the rise value, the server is considered UP. Otherwise, on each failure, the counter is reset. The same is done when the server is UP: we count the number of consecutive unsuccessful health checks. If the counter reaches the fall value, the server is considered DOWN. Otherwise, on each success, the counter is reset. Internally it is a bit more complex, but the idea is the same.

In the logs, the rise value is reported while the server is DOWN (X/rise) and the counter is incremented on each success (so from 0 to rise-1). The fall value is reported while the server is UP (Y/fall) and the counter is decremented on each failure (from fall to 1). So when the server is set to the DOWN state, you will never see "0/3 UP" in the logs but "0/2 DOWN" instead. The same is true when the server is set to the UP state: "2/2 UP" is never reported because "3/3 UP" is reported instead.

And you're right: with a rise value lower than the fall value it is quicker for a DOWN server to be considered UP than the opposite. But with a rise of 2, we still need 2 successful health checks to set a server UP.

-- Christopher Faulet
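For reference, a server line matching the behaviour discussed in this thread might look like the sketch below (the address is hypothetical; rise 2 / fall 3 and a 1-second interval correspond to the reported setup):

    backend http-routers-http1
        # 2 consecutive successful checks to mark the server UP,
        # 3 consecutive failed checks to mark it DOWN, one check per second
        server node4 192.0.2.4:80 check inter 1000 rise 2 fall 3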