Hi, I might have found a potentially critical bug in haproxy. It occurs when haproxy is retrying to dispatch a request to a server: if haproxy fails to dispatch a request to a server that is either up or has no health checks enabled, it dispatches the request to a random server of any backend, in any mode (tcp or http), as long as that server is in the up state (via tcp-connect or httpchk health checks). In addition, haproxy logs the correct server although it dispatches the request to a wrong one.
I could not reproduce this issue on 2.0.14 or any 2.1.x version. It happens in both tcp and http mode, and http requests might be dispatched to tcp servers and vice versa. I have tried to narrow this problem down in the source using git bisect, which marks this commit as the first bad one: 7b69c91e7d9ac6d7513002ecd3b06c1ac3cb8297.

I have created a setup with a minimal config to reproduce this unintended behavior with a high probability. The odds of this bug occurring can be increased by having more backend servers using health checks: with 2 faulty servers without health checks and 20 servers with health checks I get about a 90-95% chance of a wrong dispatch.

reduced haproxy.cfg:

    # note: replace 127.0.0.1 with the internal ip of the host running the
    # container, i.e. 172.17.0.1 when using docker, or the container names
    # when using a container network
    # make sure port 8999 is not available

    defaults
        mode http
        timeout http-request 10s
        timeout queue 1m
        timeout connect 5s
        timeout client 1m
        timeout server 1m

    frontend fe_http_in
        bind 0.0.0.0:8100
        use_backend be_bad.example.com if { req.hdr(host) bad.example.com }
        use_backend be_good.example.com if { req.hdr(host) good.example.com }

    backend be_bad.example.com
        server bad.example.com_8999 127.0.0.1:8999 # make sure this port is not bound

    backend be_good.example.com
        server good.example.com_8070 127.0.0.1:8070 check

    listen li_bad.example.com_tcp_39100
        bind 0.0.0.0:39100
        mode tcp
        server bad.example.com_tcp_8999 127.0.0.1:8999 # make sure this port is not bound

    listen li_good.example.com_tcp_39200
        bind 0.0.0.0:39200
        mode tcp
        server good.example.com_tcp_8071 127.0.0.1:8071 check

running the test webservices:

    podman run -d --rm -p 8070:80 --name nginxdemo nginxdemos/hello
    podman run -d --rm -p 8071:8000 --name crccheckdemo crccheck/hello-world
    # note: I am running two different webservices to highlight the random
    # aspect of the redispatch

running haproxy inside a container:

    podman run -it --rm \
        --name haproxy \
        -v "${PWD}/haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:z" \
        -p 8100:8100 \
        -p 39100:39100 \
        -p 39200:39200 \
        haproxy:2.0.15-alpine
    # note: I have selinux enabled and thus require :z or :Z to mount a file
    # or directory into the container

testing using curl:

    # expected: HTTP/1.1 503 Service Unavailable
    curl -sv -o /dev/null http://bad.example.com --connect-to ::127.0.0.1:8100 2>&1 | grep HTTP/1

    # expected: nothing (curl writes "Empty reply from server")
    curl -sv -o /dev/null http://127.0.0.1:39100 2>&1 | grep HTTP/1

    # expected: HTTP/1.1 200 OK
    curl -sv -o /dev/null http://good.example.com --connect-to ::127.0.0.1:8100 2>&1 | grep HTTP/1

    # expected: HTTP/1.0 200 OK
    curl -sv -o /dev/null http://127.0.0.1:39200 2>&1 | grep HTTP/1

In this setup, the curls that get mismatched to a wrong backend server flip, in consecutive runs, between HTTP/1.1 (when dispatched to nginxdemos/hello), HTTP/1.0 (when dispatched to crccheck/hello-world), and the correct response (503 or nothing).

I have attached a simple script which recreates this small test setup using podman, but it could fairly easily be converted to docker.

cheers,
Michael
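To put a number on the mismatch rate, the curl checks above can be repeated in a loop and tallied. The sketch below is a hypothetical helper (it is not the attached create-setup.sh): the `classify` function maps curl's status line to the server that actually answered, using the response signatures of the two demo webservices (HTTP/1.1 200 for nginxdemos/hello, HTTP/1.0 200 for crccheck/hello-world), and the live loop assumes the reduced setup above is listening on 127.0.0.1:8100.

```shell
#!/bin/sh
# Hypothetical tally helper, assuming the reduced test setup from this report.

# Map a curl status line to the server that produced it, based on the
# distinct response signatures of the two demo webservices.
classify() {
  case "$1" in
    *"HTTP/1.1 503"*) echo "correct-503" ;;       # be_bad.example.com answered as expected
    *"HTTP/1.1 200"*) echo "nginx-mismatch" ;;    # leaked to nginxdemos/hello (port 8070)
    *"HTTP/1.0 200"*) echo "crccheck-mismatch" ;; # leaked to crccheck/hello-world (port 8071)
    *)                echo "empty-reply" ;;       # no status line at all
  esac
}

# Live loop, guarded so the file can be sourced without a running setup.
if [ "${RUN_LIVE:-0}" = 1 ]; then
  i=0
  while [ "$i" -lt 100 ]; do
    line=$(curl -sv -o /dev/null http://bad.example.com \
             --connect-to ::127.0.0.1:8100 2>&1 | grep 'HTTP/1')
    classify "$line"
    i=$((i + 1))
  done | sort | uniq -c
fi
```

Running it with `RUN_LIVE=1 sh tally.sh` against the setup above should show mostly mismatch lines in the count, matching the roughly 90-95% wrong-dispatch rate observed with many checked servers.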
create-setup.sh
Description: application/shellscript