Re: proposing a haproxy 2.0.16 release (was [BUG] haproxy retries dispatch to wrong server)
On 11.07.20 at 00:03, Tim Düsterhus wrote:
> Lukas,
>
> On 10.07.20 at 21:04, Lukas Tribus wrote:
>>> I finally pushed this fix in the 2.0. Note that the same bug affected
>>> the HTTP proxy mode (using the http_proxy option). In this case,
>>> connection retries are now disabled (on 2.0 only) because the
>>> destination address is definitely lost. It was the easiest way to work
>>> around the bug without backporting a bunch of sensitive patches from
>>> the 2.1.
>>
>> Given the importance and impact of this bug you just fixed (at least 5
>> independent people already reported it on GH and ML) and the amount of
>> other important fixes already in the tree (including at least one
>> crash fix), I'm suggesting to release 2.0.16. Unless there are other
>> important open bugs with ongoing troubleshooting?
>
> Agreed. With 2.2 being released this is a good opportunity to release
> updates for all branches down to 1.9, at least to properly mark the
> latter as EoL.
>
> I'd also appreciate at least backports happening without a release for
> 1.6 and 1.7, allowing us to close off old, fixed issues on GitHub.
>
> Bumping this.

Fully agree. I will try to emit 2.0.16 this Friday. For the backports to
older versions, it is probably a good idea too. We just need some time to
do so. No promises on this point :)

--
Christopher Faulet
Re: proposing a haproxy 2.0.16 release (was [BUG] haproxy retries dispatch to wrong server)
William, Christopher,

On 11.07.20 at 00:03, Tim Düsterhus wrote:
> Lukas,
>
> On 10.07.20 at 21:04, Lukas Tribus wrote:
>>> I finally pushed this fix in the 2.0. Note that the same bug affected
>>> the HTTP proxy mode (using the http_proxy option). In this case,
>>> connection retries are now disabled (on 2.0 only) because the
>>> destination address is definitely lost. It was the easiest way to work
>>> around the bug without backporting a bunch of sensitive patches from
>>> the 2.1.
>>
>> Given the importance and impact of this bug you just fixed (at least 5
>> independent people already reported it on GH and ML) and the amount of
>> other important fixes already in the tree (including at least one
>> crash fix), I'm suggesting to release 2.0.16. Unless there are other
>> important open bugs with ongoing troubleshooting?
>
> Agreed. With 2.2 being released this is a good opportunity to release
> updates for all branches down to 1.9, at least to properly mark the
> latter as EoL.
>
> I'd also appreciate at least backports happening without a release for
> 1.6 and 1.7, allowing us to close off old, fixed issues on GitHub.

Bumping this.

Best regards
Tim Düsterhus
Re: proposing a haproxy 2.0.16 release (was [BUG] haproxy retries dispatch to wrong server)
Lukas,

On 10.07.20 at 21:04, Lukas Tribus wrote:
>> I finally pushed this fix in the 2.0. Note that the same bug affected
>> the HTTP proxy mode (using the http_proxy option). In this case,
>> connection retries are now disabled (on 2.0 only) because the
>> destination address is definitely lost. It was the easiest way to work
>> around the bug without backporting a bunch of sensitive patches from
>> the 2.1.
>
> Given the importance and impact of this bug you just fixed (at least 5
> independent people already reported it on GH and ML) and the amount of
> other important fixes already in the tree (including at least one
> crash fix), I'm suggesting to release 2.0.16. Unless there are other
> important open bugs with ongoing troubleshooting?

Agreed. With 2.2 being released this is a good opportunity to release
updates for all branches down to 1.9, at least to properly mark the
latter as EoL.

I'd also appreciate at least backports happening without a release for
1.6 and 1.7, allowing us to close off old, fixed issues on GitHub.

Best regards
Tim Düsterhus
proposing a haproxy 2.0.16 release (was [BUG] haproxy retries dispatch to wrong server)
Hello,

On Fri, 10 Jul 2020 at 08:08, Christopher Faulet wrote:
> Hi,
>
> I finally pushed this fix in the 2.0. Note that the same bug affected
> the HTTP proxy mode (using the http_proxy option). In this case,
> connection retries are now disabled (on 2.0 only) because the
> destination address is definitely lost. It was the easiest way to work
> around the bug without backporting a bunch of sensitive patches from
> the 2.1.

Given the importance and impact of this bug you just fixed (at least 5
independent people already reported it on GH and ML) and the amount of
other important fixes already in the tree (including at least one
crash fix), I'm suggesting to release 2.0.16. Unless there are other
important open bugs with ongoing troubleshooting?

lukas@dev:~/haproxy-2.0$ git log --oneline v2.0.15..
d982a8e BUG/MEDIUM: stream-int: Disable connection retries on plain HTTP proxy mode
e8d2423 BUG/MAJOR: stream: Mark the server address as unset on new outgoing connection
b1e9407 MINOR: http: Add support for http 413 status
c3db7c1 BUG/MINOR: backend: Remove CO_FL_SESS_IDLE if a client remains on the last server
0d881b2 BUG/MEDIUM: connection: Continue to recv data to a pipe when the FD is not ready
847271d MINOR: connection: move the CO_FL_WAIT_ROOM cleanup to the reader only
39bb227 BUG/MEDIUM: mux-h1: Subscribe rather than waking up in h1_rcv_buf()
e0ca6ad BUG/MEDIUM: mux-h1: Disable splicing for the conn-stream if read0 is received
0528ae2 BUG/MINOR: mux-h1: Disable splicing only if input data was processed
8e8168a BUG/MINOR: mux-h1: Don't read data from a pipe if the mux is unable to receive
afadc9a BUG/MINOR: mux-h1: Fix the splicing in TUNNEL mode
8e4e357 BUG/MINOR: http_act: don't check capture id in backend (2)
c55e3e1 DOC: configuration: fix alphabetical ordering for tune.pool-{high,low}-fd-ratio
a5e11c0 DOC: configuration: add missing index entries for tune.pool-{low,high}-fd-ratio
ab06f88 BUG/MINOR: proxy: always initialize the trash in show servers state
ca212e5 BUG/MINOR: proxy: fix dump_server_state()'s misuse of the trash
135899e BUG/MEDIUM: pattern: Add a trailing \0 to match strings only if possible
0b77c18 DOC: ssl: add "allow-0rtt" and "ciphersuites" in crt-list
4271c77 MINOR: cli: make "show sess" stop at the last known session
8ba978b BUG/MEDIUM: fetch: Fix hdr_ip misparsing IPv4 addresses due to missing NUL
9bd736c REGTEST: ssl: add some ssl_c_* sample fetches test
15080cb REGTEST: ssl: tests the ssl_f_* sample fetches
d6cd2b3 MINOR: spoe: Don't systematically create new applets if processing rate is low
1b4cc2e BUG/MINOR: http_ana: clarify connection pointer check on L7 retry
d995d5f BUG/MINOR: spoe: correction of setting bits for analyzer
26e1841 REGTEST: Add a simple script to tests errorfile directives in proxy sections
8645299 BUG/MINOR: systemd: Wait for network to be online
b88a37c MEDIUM: map: make the "clear map" operation yield
c5034a3 REGTEST: http-rules: test spaces in ACLs with master CLI
4cdff8b REGTEST: http-rules: test spaces in ACLs
c3a2e35 BUG/MINOR: mworker/cli: fix semicolon escaping in master CLI
da9a2d1 BUG/MINOR: mworker/cli: fix the escaping in the master CLI
7ed43aa BUG/MINOR: cli: allow space escaping on the CLI
249346d BUG/MINOR: spoe: add missing key length check before checking key names
1b7f58f BUG/MEDIUM: ebtree: use a byte-per-byte memcmp() to compare memory blocks
47a5600 BUG/MINOR: tcp-rules: tcp-response must check the buffer's fullness
9f3bda0 MINOR: http: Add 404 to http-request deny
c09f797 MINOR: http: Add 410 to http-request deny
lukas@dev:~/haproxy-2.0$

Thanks,
Lukas
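As an aside, whether any fix from such a list has already shipped in a
tagged release can be checked directly with git; a small sketch using the
commit ID of the main fix above (assumes a local clone of the haproxy 2.0
tree):

    # list the release tags that already contain the fix; empty output
    # means it has not been tagged yet
    git tag --contains e8d2423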
Re: [BUG] haproxy retries dispatch to wrong server
On 07/07/2020 at 23:02, Christopher Faulet wrote:
> On 07/07/2020 at 15:16, Michael Wimmesberger wrote:
>> Hi,
>>
>> I might have found a potentially critical bug in haproxy. It occurs
>> when haproxy is retrying to dispatch a request to a server. If haproxy
>> fails to dispatch a request to a server that is either up or has no
>> health checks enabled, it dispatches the request to a random server on
>> any backend in any mode (tcp or http), as long as that server is in the
>> up state (via tcp-connect or httpchk health checks). In addition,
>> haproxy logs the correct server although it dispatches the request to a
>> wrong server.
>
> Hi Michael,
>
> Thanks for the reproducer and the detailed description. I'm able to
> reproduce the bug thanks to it. I attached a patch to address it. I will
> see with Willy tomorrow morning whether it is the right way to fix it,
> but it should do the trick.

Hi,

I finally pushed this fix in the 2.0. Note that the same bug affected the
HTTP proxy mode (using the http_proxy option). In this case, connection
retries are now disabled (on 2.0 only) because the destination address is
definitely lost. It was the easiest way to work around the bug without
backporting a bunch of sensitive patches from the 2.1.

Thanks again Michael. Your bug report was really helpful.

--
Christopher Faulet
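For context, the "HTTP proxy mode" mentioned above is the legacy
option http_proxy, where the destination is taken from the request URI
rather than from a configured server; once that address is lost, a retry
has nothing left to connect to. A minimal sketch of such a proxy (the
listener name and port are illustrative, not taken from the thread):

    listen sketch_http_proxy
        bind 0.0.0.0:3128
        mode http
        option http_proxy   # destination comes from the request URI, so
                            # it cannot be recovered on a retry; hence
                            # retries are disabled on 2.0
        timeout connect 5s
        timeout client  1m
        timeout server  1m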
Re: [BUG] haproxy retries dispatch to wrong server
On 07/07/2020 at 15:16, Michael Wimmesberger wrote:
> Hi,
>
> I might have found a potentially critical bug in haproxy. It occurs when
> haproxy is retrying to dispatch a request to a server. If haproxy fails
> to dispatch a request to a server that is either up or has no health
> checks enabled, it dispatches the request to a random server on any
> backend in any mode (tcp or http), as long as that server is in the up
> state (via tcp-connect or httpchk health checks). In addition, haproxy
> logs the correct server although it dispatches the request to a wrong
> server.

Hi Michael,

Thanks for the reproducer and the detailed description. I'm able to
reproduce the bug thanks to it. I attached a patch to address it. I will
see with Willy tomorrow morning whether it is the right way to fix it,
but it should do the trick.

Thanks again!

--
Christopher Faulet

From bbc89e04a252b3719f221cd0afbad49f507c4c46 Mon Sep 17 00:00:00 2001
From: Christopher Faulet
Date: Tue, 7 Jul 2020 22:25:19 +0200
Subject: [PATCH] BUG/MAJOR: stream: Mark the server address as unset on new
 outgoing connection

In the connect_server() function, when a new connection is created, it is
important to mark the server address as explicitly unset, by removing the
SF_ADDR_SET stream flag, to be sure it gets set again. On the first
connect attempt, this is not a problem because the flag is not set. But on
a connection failure, the faulty endpoint is detached. This is specific to
the 2.0; see commit 7b69c91e7 ("BUG/MAJOR: stream-int: always detach a
faulty endpoint on connect failure") for details. As a result of that
commit, on a connect retry, a new connection is created. But this time the
SF_ADDR_SET flag is set (except on a redispatch), even though, because it
is a new connection, the server address is not actually set. On the other
end, when a connection is released (or when it is created), the from/to
addresses are not cleared. Thus, because of the described bug, when a
connection is taken from the memory pool, the addresses of the previous
connection are used, leading to undefined and random routing. For a
totally new connection, no addresses are set and an internal error is
reported by si_connect().

A reproducer with a detailed description was posted on the ML:

  https://www.mail-archive.com/haproxy@formilux.org/msg37850.html

This patch should fix issue #717. There is no direct mainline commit ID
for this fix, and it must not be backported as it is solely for 2.0.
---
 src/backend.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/backend.c b/src/backend.c
index d0d11779a..2d0fd6a43 100644
--- a/src/backend.c
+++ b/src/backend.c
@@ -1457,6 +1457,7 @@ int connect_server(struct stream *s)
 
 	/* no reuse or failed to reuse the connection above, pick a new one */
 	if (!srv_conn) {
+		s->flags &= ~SF_ADDR_SET;
 		srv_conn = conn_new();
 		if (srv_conn)
 			srv_conn->target = s->target;
-- 
2.26.2
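To illustrate the mechanism the commit message describes, here is a
standalone C sketch (not haproxy code; every name in it is invented for
illustration) of how a stale "address already set" flag plus a recycled
pool object routes a request to the wrong server:

#include <stdio.h>
#include <string.h>

#define SKETCH_ADDR_SET 0x1          /* stands in for SF_ADDR_SET */

struct sketch_conn {
	char dst[32];                /* destination; NOT cleared on release */
};

static struct sketch_conn pool_slot; /* one-slot "memory pool" */

static struct sketch_conn *pool_get(void)
{
	return &pool_slot;           /* recycled as-is: old contents survive */
}

static void connect_attempt(unsigned *stream_flags, const char *wanted)
{
	struct sketch_conn *c = pool_get();

	/* The bug: on a retry the flag is still set, so the stale address
	 * in the recycled struct is trusted instead of being overwritten. */
	if (!(*stream_flags & SKETCH_ADDR_SET)) {
		strncpy(c->dst, wanted, sizeof(c->dst) - 1);
		*stream_flags |= SKETCH_ADDR_SET;
	}
	printf("wanted %s, connecting to %s\n", wanted, c->dst);
}

int main(void)
{
	unsigned flags = 0;

	/* Another stream used the pool slot earlier and left its address. */
	strncpy(pool_slot.dst, "10.0.0.1:80", sizeof(pool_slot.dst) - 1);

	/* This stream's first attempt set the flag before its faulty
	 * endpoint was detached together with the connection. */
	flags |= SKETCH_ADDR_SET;

	/* Retry: a "new" connection comes from the pool, the stale flag
	 * short-circuits address setup, and the request is routed to the
	 * other stream's server (prints "connecting to 10.0.0.1:80"). */
	connect_attempt(&flags, "10.0.0.2:80");
	return 0;
}

The one-line fix in the patch corresponds to clearing that flag right
before the new connection is allocated.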
Re: [BUG] haproxy retries dispatch to wrong server
Hello Michael,

On Tue, 7 Jul 2020 at 15:16, Michael Wimmesberger wrote:
>
> Hi,
>
> I might have found a potentially critical bug in haproxy. It occurs when
> haproxy is retrying to dispatch a request to a server. If haproxy fails
> to dispatch a request to a server that is either up or has no health
> checks enabled, it dispatches the request to a random server on any
> backend in any mode (tcp or http), as long as that server is in the up
> state (via tcp-connect or httpchk health checks). In addition, haproxy
> logs the correct server although it dispatches the request to a wrong
> server.
>
> I could not reproduce this issue on 2.0.14 or any 2.1.x version. This
> happens in tcp and http mode, and http requests might be dispatched to
> tcp servers and vice versa.
>
> I have tried to narrow this problem down in the source using git bisect,
> which marks this commit as the first bad one:
> 7b69c91e7d9ac6d7513002ecd3b06c1ac3cb8297.

It makes sense that 2.1 is not affected, because this commit was
specifically written for 2.0 (it's not a backport).

Exceptionally detailed and thorough reporting here; this will help a lot,
thank you.

A bug has previously been filed, and the details mentioned in this thread
will help get things going:

https://github.com/haproxy/haproxy/issues/717

Lukas
[BUG] haproxy retries dispatch to wrong server
Hi,

I might have found a potentially critical bug in haproxy. It occurs when
haproxy is retrying to dispatch a request to a server. If haproxy fails
to dispatch a request to a server that is either up or has no health
checks enabled, it dispatches the request to a random server on any
backend in any mode (tcp or http), as long as that server is in the up
state (via tcp-connect or httpchk health checks). In addition, haproxy
logs the correct server although it dispatches the request to a wrong
server.

I could not reproduce this issue on 2.0.14 or any 2.1.x version. This
happens in tcp and http mode, and http requests might be dispatched to
tcp servers and vice versa.

I have tried to narrow this problem down in the source using git bisect,
which marks this commit as the first bad one:
7b69c91e7d9ac6d7513002ecd3b06c1ac3cb8297.

I have created a setup with a minimal config to reproduce this unintended
behavior with a high probability of occurring. The odds of this bug
occurring can be increased by having more backend servers using health
checks. With 2 faulty servers without health checks and 20 servers with
health checks I get about a 90-95% chance of a wrong dispatch.

reduced haproxy.cfg:

# note: replace 127.0.0.1 with the internal ip of the host running the
# container, i.e. 172.17.0.1 when using docker, or the container names
# when using a container-network
# make sure port 8999 is not available
defaults
    mode http
    timeout http-request 10s
    timeout queue 1m
    timeout connect 5s
    timeout client 1m
    timeout server 1m

frontend fe_http_in
    bind 0.0.0.0:8100
    use_backend be_bad.example.com if { req.hdr(host) bad.example.com }
    use_backend be_good.example.com if { req.hdr(host) good.example.com }

backend be_bad.example.com
    server bad.example.com_8999 127.0.0.1:8999 # make sure this port is not bound

backend be_good.example.com
    server good.example.com_8070 127.0.0.1:8070 check

listen li_bad.example.com_tcp_39100:
    bind 0.0.0.0:39100
    mode tcp
    server bad.example.com_tcp_8999 127.0.0.1:8999 # make sure this port is not bound

listen li_good.example.com_tcp_39200:
    bind 0.0.0.0:39200
    mode tcp
    server good.example.com_tcp_8071 127.0.0.1:8071 check

running test-webservices:

podman run -d --rm -p 8070:80 --name nginxdemo nginxdemos/hello
podman run -d --rm -p 8071:8000 --name crccheckdemo crccheck/hello-world
# note: I am running two different webservices to highlight the random
# aspect of the redispatch

run haproxy inside a container:

podman run -it --rm \
    --name haproxy \
    -v "${PWD}/haproxy/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:z" \
    -p 8100:8100 \
    -p 39100:39100 \
    -p 39200:39200 \
    haproxy:2.0.15-alpine
# note: I have selinux enabled and thus require :z or :Z to mount a file
# or directory into the container

testing using curl:

# expected: HTTP/1.1 503 Service Unavailable
curl -sv -o /dev/null http://bad.example.com --connect-to ::127.0.0.1:8100 2>&1 | grep HTTP/1
# expected: nothing (curl writes "Empty reply from server")
curl -sv -o /dev/null http://127.0.0.1:39100 2>&1 | grep HTTP/1
# expected: HTTP/1.1 200 OK
curl -sv -o /dev/null http://good.example.com --connect-to ::127.0.0.1:8100 2>&1 | grep HTTP/1
# expected: HTTP/1.0 200 OK
curl -sv -o /dev/null http://127.0.0.1:39200 2>&1 | grep HTTP/1

In this setup, the curls that get mismatched to the wrong backend server
flip between HTTP/1.1 (when dispatched to nginxdemos/hello), HTTP/1.0
(when dispatched to crccheck/hello-world), or the correct response (503
or nothing) in consecutive runs.
I have attached a simple script which recreates this small test setup
using podman, but it could fairly easily be converted to docker.

cheers,
Michael

Attachment: create-setup.sh
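To put a number on that flip rate, a rough sketch like the following
could be run against the setup above (it assumes the containers are
running and a curl new enough for the -w %{http_version} specifier,
i.e. >= 7.50):

    # tally the protocol/status combinations returned for the "bad"
    # backend over 100 requests; a correct setup should only show 503
    for i in $(seq 1 100); do
        curl -s -o /dev/null -w '%{http_version} %{http_code}\n' \
            --connect-to ::127.0.0.1:8100 http://bad.example.com
    done | sort | uniq -c

With the bug present, the tally mixes the expected 503s with 200s served
by the unrelated backends.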