Re: [ANNOUNCE] haproxy-2.6-dev4

2022-03-30 Thread Willy Tarreau
Hi Tim,

On Wed, Mar 30, 2022 at 09:14:42PM +0200, Tim Düsterhus wrote:
> Willy,
> 
> On 3/26/22 10:22, Willy Tarreau wrote:
> > be the last LTS version with this. I'm interested in opinions and feedback
> > about this. And the next question will obviously be "how could we detect
> 
> Can you clarify what *exactly* is expected to be removed and what will
> remain? Is it just SRV DNS records or more?

What I believe is causing significant trouble at the moment in the DNS
area is the assignment of randomly delivered IP addresses to a fleet of
servers. Whether it's from SRV records or just from a wide range of
addresses returned for a single request, it's basically the same. For
example, if you configure 10 servers with the same name "foo.example.com",
the DNS code has to check each response for addresses already assigned to
active servers and refresh them; then find addresses that are not yet
assigned and check whether some addressless servers are available, in
which case these addresses are assigned to them; then spot any address
that has disappeared for a while and decide whether or not the servers
that were assigned such addresses finally ought to be stopped. In addition
to being totally unreliable, it's extremely CPU intensive. We've seen
plenty of situations where the watchdog was triggered due to this, and in
my opinion the concept is fundamentally flawed since responses are often
partial. As soon as you suspect that not all active addresses were
delivered, you know that you have to put lots of hacks in place.
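
For illustration, the kind of setup that exercises all this machinery
could look like the following sketch (names and addresses are made up):

    resolvers consul
        nameserver c1 127.0.0.1:8600
        hold valid 10s

    backend app
        # ten servers sharing one name: the resolver code must spread the
        # addresses returned for this single name across all of them,
        # refresh those it already assigned, and retire those that have
        # disappeared from the responses
        server-template foo 10 foo.example.com:8080 resolvers consul init-addr none check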

What I would like to see is a resolver that does just that: resolving.

If multiple addresses are returned for a name, as long as one of them is
already assigned, that's OK; otherwise the server's address changes.
If you have multiple servers with the same name, it should be clearly
documented that it's not the resolver's role to try to distribute multiple
responses fairly. Instead, I'd rather see addresses assigned the way they
would be at boot when using the libc's resolver, i.e. any address to any
server, possibly the same address. This would definitely clarify that the
resolver is there to answer the question "give me the first
[ipv4/ipv6/any] address corresponding to this name" and not to be involved
in backend-wide hacks. This would also make sure that do-resolve() does
simple and reliable things. Also, I would like to see the resolvers really
resolve CNAMEs, because that's what application-level code (e.g. Lua or
the HTTP client) really needs. If I understand right, at the moment CNAMEs
are only resolved if they appear in the same response, so I strongly doubt
they can work cross-domain.
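
As a sketch of that simpler model (names and addresses are made up):

    resolvers mydns
        nameserver ns1 192.0.2.53:53
        hold valid 10s

    backend app
        # one name, one server: just "give me an address for this name"
        server s1 foo.example.com:443 resolvers mydns resolve-prefer ipv4 check

    frontend fe
        bind *:80
        # runtime resolution from application-level code
        http-request do-resolve(txn.dstip,mydns,ipv4) hdr(Host),lower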

It's important to keep in mind that the reason such mechanisms were
originally put in place was to adopt new emerging trends around Consul
and similar registries. Nowadays all of these have evolved to support far
more reliable and richer APIs, partly due to such previous limitations,
and the DNS mechanism as we support it should really, really, really not
be used.

I hope this clarifies the situation and doesn't start to make anyone
worry :-)  Anyway, there's no emergency: the code is still there, and my
concern is more about how we can encourage existing users to start
thinking about revisiting their approach with new tools and practices.
And this will also require that we have working alternatives to suggest.
While I'm pretty confident that the dataplane-api, the ingress controller
and such things already offer a valid response, I don't know for sure
whether they can be considered drop-in replacements nor whether they
support everything, and this will have to be studied as well before we
start scaring users!

Cheers,
Willy



Re: [ANNOUNCE] haproxy-2.6-dev4

2022-03-30 Thread Tim Düsterhus

Willy,

On 3/26/22 10:22, Willy Tarreau wrote:

be the last LTS version with this. I'm interested in opinions and feedback
about this. And the next question will obviously be "how could we detect


Can you clarify what *exactly* is expected to be removed and what will 
remain? Is it just SRV DNS records or more?


Best regards
Tim Düsterhus



Re: Haproxy rate limit monitoring

2022-03-30 Thread Tim Düsterhus

Istvan,

On 3/24/22 08:47, Szabo, Istvan (Agoda) wrote:

I'm using rate limiting with HAProxy 1.8 and I'd like to extract metrics
from it, e.g. which IP addresses are close to the limit or how users are
hitting the limits.

At the moment what I can do is a very silly solution that I don't like: in
a while loop I poll the socket and may redirect the output to a file:

while sleep 0.5; do
    printf 'show table https\nshow table http\n' | nc -U /var/lib/haproxy/stats
done


I'd like to know: is there a more elegant solution?



I presented a solution for real-time monitoring of a stick table at
HAProxyConf 2021:


https://github.com/WoltLab/node-haproxy-peers
https://www.haproxy.com/user-spotlight-series/using-haproxy-peers-for-real-time-quota-tracking/
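
For context, the kind of stick-table and peers setup such a listener
observes could look like this sketch (names, ports and the threshold are
made up):

    peers mypeers
        peer haproxy1 192.0.2.1:10000
        peer monitor  192.0.2.2:10000    # the node-haproxy-peers listener

    frontend http
        bind *:80
        stick-table type ip size 100k expire 30s store http_req_rate(10s) peers mypeers
        http-request track-sc0 src
        http-request deny deny_status 429 if { sc_http_req_rate(0) gt 100 }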

Best regards
Tim Düsterhus



Re: possible bug in haproxy: backend switching with map file does not work with HTTP/2

2022-03-30 Thread Tim Düsterhus

Jarno,

On 3/30/22 14:57, Jarno Huuskonen wrote:

Hello,
  
when testing with HTTP/2 we found a behaviour we did not expect:
  
we switch between different backends by means of a map file, e.g.:

use_backend %[url,map_beg(/etc/haproxy/pool.map,defaultbackend)]
  
With HTTP/1.1 this works fine in haproxy.

But with HTTP/2, it does not work.



I think with HTTP/2 %[url] is
https://dom.ain/path...
and with HTTP/1.1 %[url] is just the path (I think this has been discussed
on the list, but at the moment I can't find a link).


I can't find anything within half a minute either, but I believe the
absolute form is what's used for the HTTP/2 URL.



Have you tried with %[path,map_beg(/etc/haproxy/pool.map,defaultbackend)] ?


This is the correct solution (and in fact it's effectively documented):

https://cbonte.github.io/haproxy-dconv/2.4/configuration.html#url


With ACLs, using
"path" is preferred over using "url", because clients may send a full URL as
is normally done with proxies. The only real use is to match "*" which does
not match in "path", and for which there is already a predefined ACL.


Best regards
Tim Düsterhus



Re: possible bug in haproxy: backend switching with map file does not work with HTTP/2

2022-03-30 Thread Jarno Huuskonen
Hi,

On Wed, 2022-03-30 at 12:19 +, Ralf Saier wrote:
> Hello,
>  
> when testing with HTTP/2 we found a behaviour we did not expect:
>  
> we switch between different backends by means of a map file, e.g.:
> use_backend %[url,map_beg(/etc/haproxy/pool.map,defaultbackend)]
>  
> With HTTP/1.1 this works fine in haproxy.
> But with HTTP/2, it does not work.
> 

I think with HTTP/2 %[url] is
https://dom.ain/path...
and with HTTP/1.1 %[url] is just the path (I think this has been discussed
on the list, but at the moment I can't find a link).

Have you tried with %[path,map_beg(/etc/haproxy/pool.map,defaultbackend)] ?
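
A quick way to verify, assuming a local test instance (the X-Info header
set by your backends shows which one served the request):

    curl -sk --http1.1 -D - -o /dev/null https://localhost/2 | grep -i x-info
    curl -sk --http2   -D - -o /dev/null https://localhost/2 | grep -i x-info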

-Jarno

>  
> Here‘s a minimal configuration file to reproduce this:
>  
> 
> global
>     log /dev/log local0 warning
>  
> #   log /dev/log    local0
> #   log /dev/log    local1 notice
>  
>     chroot /var/lib/haproxy
>     stats socket /run/haproxy/admin.sock mode 660 level admin expose-
> fd listeners
>     stats timeout 30s
>     user haproxy
>     group haproxy
>     daemon
>  
>     # Default SSL material locations
>     ca-base /etc/ssl/certs
>     crt-base /etc/ssl/private
>  
>     # See:
> https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
>     ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-
> AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-
> SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-
> AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
>     ssl-default-bind-ciphersuites
> TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
>     ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets
>  
> defaults
>     log global
>     mode    http
>     option  httplog
> #   option  dontlognull
>     timeout connect 5000
>     timeout client  5
>     timeout server  5
>  
> backend defaultbackend
>     log global
>     mode    http
>     http-response set-header X-Info "defaultbackend : %s"
>  
>     server default_1 127.0.0.1:81
>  
> backend backend_2
>     log global
>     mode    http
>     http-response set-header X-Info "backend_2 : %s"
>  
>     server default_2 127.0.0.1:81
>  
>  
> backend backend_3
>     log global
>     mode    http
>     http-response set-header X-Info "backend_3 : %s"
>  
>     server default_3 127.0.0.1:81
>  
>  
> frontend ssl
>     log    global
>     mode   http
>  
>     option  httplog
>  
>     bind *:443 alpn h2,http/1.1 ssl crt /etc/haproxy/x.pem
>  
>     acl is_path_3 path_beg /3
>     use_backend backend_3 if is_path_3
>  
>     use_backend %[url,map_beg(/etc/haproxy/pool.map,defaultbackend)]
>     default_backend  defaultbackend
>  
> 
>  
> Content of /etc/haproxy/pool.map is:
> /2  backend_2
>  
> 
>  
> HAProxy Version:
> haproxy -vvv
> HAProxy version 2.5.5-1ppa1~focal 2022/03/14 - https://haproxy.org/
> Status: stable branch - will stop receiving fixes around Q1 2023.
> Known bugs: http://www.haproxy.org/bugs/bugs-2.5.5.html
> Running on: Linux 5.4.0-104-generic #118-Ubuntu SMP Wed Mar 2 19:02:41 UTC
> 2022 x86_64
> Build options :
>   TARGET  = linux-glibc
>   CPU = generic
>   CC  = cc
>   CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-d3zlWl/haproxy-
> 2.5.5=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-
> time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-
> statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-
> sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-
> initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value
> -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
>   OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1
> USE_SYSTEMD=1 USE_PROMEX=1
>   DEBUG   =
>  
> Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT
> +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY
> +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -
> CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -
> 51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -
> EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING
>  
> Default settings :
>   bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
>  
> Built with multi-threading support (MAX_THREADS=64, default=1).
> Built with OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
> Running on OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
> OpenSSL library supports TLS extensions : yes
> OpenSSL library supports SNI : yes
> OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
> Built with Lua version : Lua 5.3.3
> Built with the Prometheus exporter as a service
> Built with network namespace support.

possible bug in haproxy: backend switching with map file does not work with HTTP/2

2022-03-30 Thread Ralf Saier
Hello,

when testing with HTTP/2 we found a behaviour we did not expect:

we switch between different backends by means of a map file, e.g.:
use_backend %[url,map_beg(/etc/haproxy/pool.map,defaultbackend)]

With HTTP/1.1 this works fine in haproxy.
But with HTTP/2, it does not work.

Here's a minimal configuration file to reproduce this:


global
log /dev/log local0 warning

#   log /dev/log    local0
#   log /dev/log    local1 notice

chroot /var/lib/haproxy
stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
stats timeout 30s
user haproxy
group haproxy
daemon

# Default SSL material locations
ca-base /etc/ssl/certs
crt-base /etc/ssl/private

# See: 
https://ssl-config.mozilla.org/#server=haproxy&server-version=2.0.3&config=intermediate
ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
ssl-default-bind-options ssl-min-ver TLSv1.2 no-tls-tickets

defaults
log global
mode    http
option  httplog
#   option  dontlognull
timeout connect 5000
timeout client  5
timeout server  5

backend defaultbackend
log global
mode    http
http-response set-header X-Info "defaultbackend : %s"

server default_1 127.0.0.1:81

backend backend_2
log global
mode    http
http-response set-header X-Info "backend_2 : %s"

server default_2 127.0.0.1:81


backend backend_3
log global
mode    http
http-response set-header X-Info "backend_3 : %s"

server default_3 127.0.0.1:81


frontend ssl
log    global
mode   http

option  httplog

bind *:443 alpn h2,http/1.1 ssl crt /etc/haproxy/x.pem

acl is_path_3 path_beg /3
use_backend backend_3 if is_path_3

use_backend %[url,map_beg(/etc/haproxy/pool.map,defaultbackend)]
default_backend  defaultbackend



Content of /etc/haproxy/pool.map is:
/2  backend_2



HAProxy Version:
haproxy -vvv
HAProxy version 2.5.5-1ppa1~focal 2022/03/14 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2023.
Known bugs: http://www.haproxy.org/bugs/bugs-2.5.5.html
Running on: Linux 5.4.0-104-generic #118-Ubuntu SMP Wed Mar 2 19:02:41 UTC 2022 
x86_64
Build options :
  TARGET  = linux-glibc
  CPU = generic
  CC  = cc
  CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-d3zlWl/haproxy-2.5.5=. 
-fstack-protector-strong -Wformat -Werror=format-security -Wdate-time 
-D_FORTIFY_SOURCE=2 -Wall -Wextra -Wundef -Wdeclaration-after-statement -fwrapv 
-Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare 
-Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers 
-Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 
-Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 
USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   =

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT 
+POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY 
+LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -CLOSEFROM 
-ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL 
+SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC 
+PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
Running on OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), 
raw-deflate("deflate"), gzip("gzip")
Support for malloc_trim() is enabled.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT 
IP_FREEBIND
Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 9.4.0

Available polling systems :
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Re: Check interval rise and fall behaviour

2022-03-30 Thread Christopher Faulet

Le 3/29/22 à 18:02, Lais, Alexander a écrit :

Dear all,

We are using the backend health checks to disable flapping backends.

The default values for rise and fall are 2 subsequent successful and 3
subsequent failed checks.

Our check interval is 1000ms (fairly frequent, potentially part of the
problem).

Here is what we observed, using HAProxy 2.4.4:

1. Falling

It started with the backend being up and then going down (fall).


2022-03-23T21:31:54.942Z  Health check for server http-routers-http1/node4
    failed, reason: Layer4 timeout, check duration: 1000ms, status: 2/3 UP.
2022-03-23T21:31:56.920Z  Health check for server http-routers-http1/node4
    failed, reason: Layer4 timeout, check duration: 1001ms, status: 1/3 UP.
2022-03-23T21:31:57.931Z  Health check for server http-routers-http1/node4
    succeeded, reason: Layer7 check passed, code: 200, check duration: 1ms,
    status: 3/3 UP.
2022-03-24T10:03:27.223Z  Health check for server http-routers-http1/node4
    failed, reason: Layer7 wrong status, code: 503, info: "Service
    Unavailable", check duration: 1ms, status: 2/3 UP.
2022-03-24T10:03:28.234Z  Health check for server http-routers-http1/node4
    failed, reason: Layer7 wrong status, code: 503, info: "Service
    Unavailable", check duration: 1ms, status: 1/3 UP.
2022-03-24T10:03:29.237Z  Health check for server http-routers-http1/node4
    failed, reason: Layer7 wrong status, code: 503, info: "Service
    Unavailable", check duration: 1ms, status: 0/2 DOWN.


We go down from 3/3 to 2/3 and 1/3, then back up again to 3/3. My
assumption is that it then measured 2/3, but only needs 2 for rising, i.e.
2/2, which is bumped to 3/3 as the backend is now considered up.

The backend stays up for a while and then goes down with the health check
sequence I expected, i.e. 3/3, 2/3, 1/3, 0/3 -> 0/2 (as we need 2 for rise).

2. Rising


2022-03-24T10:12:26.846Z  Health check for server http-routers-http1/node4
    failed, reason: Layer4 timeout, check duration: 1000ms, status: 0/2 DOWN.
2022-03-24T10:12:29.843Z  Health check for server http-routers-http1/node4
    failed, reason: Layer4 connection problem, info: "Connection refused",
    check duration: 1ms, status: 0/2 DOWN.
2022-03-24T10:13:43.902Z  Health check for server http-routers-http1/node4
    failed, reason: Layer7 wrong status, code: 503, info: "Service
    Unavailable", check duration: 2ms, status: 0/2 DOWN.
2022-03-24T10:14:03.039Z  Health check for server http-routers-http1/node4
    succeeded, reason: Layer7 check passed, code: 200, check duration: 1ms,
    status: 1/2 DOWN.
2022-03-24T10:14:04.079Z  Health check for server http-routers-http1/node4
    succeeded, reason: Layer7 check passed, code: 200, check duration: 1ms,
    status: 3/3 UP.


So coming up (rise), it goes from 0/2 probes to 1/2 to 3/3. My assumption
is that it goes to 2/2, is considered up, and is bumped to 3/3 because for
fall we now need 3 failed probes.


The documentation describes rise / fall as the “number of subsequent
probes that succeeded / failed”.
From my observations it looks like it is a sliding window over the last n
probes being successful, i.e. when the fall value is larger than the rise
value, it is easier to rise back up with a single successful probe.

Maybe I’m misreading the log outputs or drawing the wrong conclusions.

If someone knows by heart how it's supposed to work based on the code,
that would be great. Otherwise we can dig some more ourselves.



Hi,

Rise and fall values are the numbers of consecutive successful and
unsuccessful health checks. When a server is DOWN, we count the number of
consecutive successful health checks: if the counter reaches the rise
value, the server is considered UP; otherwise, on each failure, the
counter is reset. The same is done when the server is UP: we count the
number of consecutive unsuccessful health checks; if the counter reaches
the fall value, the server is considered DOWN; otherwise, on each success,
the counter is reset.


Internally it is a bit more complex but the idea is the same.
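
For reference, these thresholds are configured per server; a sketch
matching the setup described in this thread (the address is made up):

    backend http-routers-http1
        # check every second; 2 consecutive successes bring the server UP,
        # 3 consecutive failures take it DOWN
        server node4 192.0.2.4:80 check inter 1s rise 2 fall 3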

In the logs, the rise value is reported while the server is DOWN (X/rise)
and the counter is incremented on each success (so from 0 to rise-1),
while the fall value is reported while the server is UP (Y/fall) and the
counter is decremented on each failure (from fall to 1). So when the
server is set to the DOWN state, you will never see "0/3 UP" in the logs
but "0/2 DOWN" instead. The same is true when the server is set to the UP
state: "2/2 DOWN" is never reported, "3/3 UP" is reported instead.


And you're right, with a rise value lower than the fall value it is
quicker to consider a DOWN server UP than the opposite. But with rise set
to 2, we do need 2 successful health checks to set a server UP.


--
Christopher Faulet