Hi,

I might have found a potentially critical bug in haproxy. It occurs
when haproxy retries dispatching a request to a server. If haproxy
fails to dispatch a request to a server that is either up or has no
health checks enabled, it dispatches the request to a random server on
any backend, in any mode (tcp or http), as long as that server is in
the up state (via tcp-connect or httpchk health checks). In addition,
haproxy logs the correct server even though it dispatches the request
to a wrong one.

I could not reproduce this issue on 2.0.14 or any 2.1.x version. It
happens in both tcp and http mode, and http requests might be
dispatched to tcp servers and vice versa.

I have tried to narrow this problem down in the source using git
bisect, which marks this commit as the first bad one:
7b69c91e7d9ac6d7513002ecd3b06c1ac3cb8297.
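
For reference, the bisect ran roughly like this (a sketch; I am
assuming the stable 2.0 repo and its v2.0.14/v2.0.15 release tags as
the good/bad endpoints):

git clone http://git.haproxy.org/git/haproxy-2.0.git
cd haproxy-2.0
git bisect start
git bisect bad v2.0.15    # wrong dispatch reproduces here
git bisect good v2.0.14   # does not reproduce here
# at each step: rebuild, run the curl tests from below against the
# rebuilt binary, then mark the commit
make -j4 TARGET=linux-glibc
git bisect good           # or: git bisect bad
git bisect reset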


I have created a setup with a minimal config that reproduces this
unintended behavior with high probability. The odds of the bug
occurring can be increased by adding more backend servers with health
checks. With 2 faulty servers without health checks and 20 servers
with health checks I get about a 90-95% chance of a wrong dispatch
(see the measurement loop at the end of this mail).


reduced haproxy.cfg:
# note: replace 127.0.0.1 with the internal ip of the host running the
#       container, i.e. 172.17.0.1 when using docker, or the container
#       names when using a container-network
# make sure port 8999 is not available
defaults
  mode http
  timeout http-request 10s
  timeout queue 1m
  timeout connect 5s
  timeout client 1m
  timeout server 1m

frontend fe_http_in
  bind 0.0.0.0:8100
  use_backend be_bad.example.com if { req.hdr(host) bad.example.com }
  use_backend be_good.example.com if { req.hdr(host) good.example.com }

backend be_bad.example.com
  server bad.example.com_8999 127.0.0.1:8999 # make sure this port is not bound

backend be_good.example.com
  server good.example.com_8070 127.0.0.1:8070 check

listen li_bad.example.com_tcp_39100
  bind 0.0.0.0:39100
  mode tcp
  server bad.example.com_tcp_8999 127.0.0.1:8999 # make sure this port is not bound

listen li_good.example.com_tcp_39200
  bind 0.0.0.0:39200
  mode tcp
  server good.example.com_tcp_8071 127.0.0.1:8071 check
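
The config can be syntax-checked before starting it (a sketch, using
the same image as in the run command below):

podman run --rm \
  -v "${PWD}/haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:z" \
  haproxy:2.0.15-alpine \
  haproxy -c -f /usr/local/etc/haproxy/haproxy.cfg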

running test-webservices:
podman run -d --rm -p 8070:80 --name nginxdemo nginxdemos/hello
podman run -d --rm -p 8071:8000 --name crccheckdemo crccheck/hello-world
# note: I am running two different webservices to highlight the random
# aspect of the redispatch
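
A quick sanity check that both demo services answer directly, and with
different HTTP versions, which is what makes a mismatch visible:

# nginxdemos/hello answers with HTTP/1.1
curl -sv -o /dev/null http://127.0.0.1:8070 2>&1 | grep HTTP/1
# crccheck/hello-world answers with HTTP/1.0
curl -sv -o /dev/null http://127.0.0.1:8071 2>&1 | grep HTTP/1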

run haproxy inside a container:
podman run -it --rm \
  --name haproxy \
  -v "${PWD}/haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:z" \
  -p 8100:8100 \
  -p 39100:39100 \
  -p 39200:39200 \
  haproxy:2.0.15-alpine
# note: I have selinux enabled and thus require :z or :Z to mount a
# file or directory into the container
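
Before testing it can be verified that haproxy came up cleanly
(a sketch):

podman logs haproxy
# a request with a non-matching Host header should yield a plain 503
curl -s -o /dev/null -w '%{http_code}\n' http://127.0.0.1:8100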


testing using curl:
# expected: HTTP/1.1 503 Service Unavailable
curl -sv -o /dev/null http://bad.example.com \
  --connect-to ::127.0.0.1:8100 2>&1 | grep HTTP/1
# expected: nothing (curl writes "Empty reply from server")
curl -sv -o /dev/null http://127.0.0.1:39100 2>&1 | grep HTTP/1

# expected: HTTP/1.1 200 OK
curl -sv -o /dev/null http://good.example.com \
  --connect-to ::127.0.0.1:8100 2>&1 | grep HTTP/1
# expected: HTTP/1.0 200 OK
curl -sv -o /dev/null http://127.0.0.1:39200 2>&1 | grep HTTP/1


In this setup the curls that get mismatched to a wrong backend server
flip, in consecutive runs, between HTTP/1.1 (when dispatched to
nginxdemos/hello), HTTP/1.0 (when dispatched to crccheck/hello-world)
and the correct response (503 or nothing).
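
To put a rough number on the wrong-dispatch rate of this minimal
setup, the status codes can be tallied over many requests (a sketch;
without the bug this should print nothing but 503s):

for i in $(seq 1 100); do
  curl -s -o /dev/null -w '%{http_code}\n' \
    --connect-to ::127.0.0.1:8100 http://bad.example.com
done | sort | uniq -c
# every "200" counted here is a request dispatched to a wrong server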

I have attached a simple script which recreates this small test setup
using podman, but it could fairly easily be converted to docker.


cheers,
Michael

Attachment: create-setup.sh
Description: application/shellscript
