Hi,

HAProxy 2.6.10 was released on 2023/03/10. It added 78 new commits after version 2.6.9.

A bit more than half of the commits are HTTP3/QUIC fixes.

However, as indicated in the 2.8-dev5 announcement, this version fixes a concurrency bug introduced in 2.5 that may cause freezes and crashes when some HTTP/1 backend connections are closed by the server at exactly the same time as they're about to be reused by another thread. Another, different bug affecting idle connections since 2.2 was also fixed; it could cause an occasional crash. One possible work-around if you've faced such issues recently is to disable inter-thread connection reuse with this directive in the global section:

    tune.idle-pool.shared off

But beware that this may increase the total number of connections kept established with your backend servers, depending on the reuse frequency and the number of threads.

I want to be clear on one point: the issue is structural, and trying to port these fixes to 2.6 just made the situation much worse! The only solution we found to address it relies on some facilities that were integrated in 2.7 and that offer the guarantees we need during certain critical transitions of a file descriptor's state (refcount and owning thread group). I have not found any workable solution to the problem without these facilities, so this required me to backport the strict minimum number of patches (18) to bring these facilities there. I hate having to do that, but this time there was no other option. And that made me realize that instead of keeping 2.6 on its own half-way architecture like it was, it's not a bad thing that it resembles the next versions, as this will make backports of fixes more reliable in the future.

Despite the large number of tests in legacy and master-worker modes, with/without threads, with reloads, FD passing, saturated listeners etc., it remains possible that I failed on a corner case. So please watch a little bit more closely than usual after you update, and do not hesitate to report any issue you think you might face as a result of this.
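
For those who want to try the work-around, here is a minimal sketch of where the directive sits. Only "tune.idle-pool.shared off" comes from this announcement; the thread count and the comments are illustrative assumptions, not a recommended setup:

```haproxy
global
    # Illustrative value only (assumption): the more threads there are,
    # the more idle connections may be kept per server once sharing is off.
    nbthread 8

    # Work-around from this announcement: idle backend connections are no
    # longer reused across threads, each thread only reuses its own.
    tune.idle-pool.shared off
```

Once running 2.6.10 the line should no longer be needed for these bugs and can be removed to restore inter-thread reuse.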

Other, less critical, issues are described below.

In master-worker mode, when performing an upgrade from an old version (before 1.9) to a newer version (>=2.5), the HAPROXY_PROCESSES environment variable was missing, and this, combined with a missing element in an internal structure representing old processes, resulted in a null dereference that crashed the master process after the reload. It's very unlikely to hit this one, except during migration attempts, where it can make one think the new version doesn't work and encourage rolling back to the older one. The reported uptime for processes was also fixed so that wall clock time is used instead of the internal timer.

A few issues affecting the Lua mapping of the HTTP client were addressed; one of them is a small memory leak by which a few bytes could leak per request, which could become problematic under heavy use. Another one is a concurrency issue with Lua's garbage collector, which didn't sufficiently lock other threads' items while trying to free them.

It was found that the low-latency scheduling of TLS handshakes can degenerate during extreme loads and take a long time to recover. The problem is that in order to prevent TLS handshakes from causing high latency spikes for the rest of the traffic, they're placed in a dedicated scheduling class that executes one of them per polling loop. But if too many are pending due to a big burst, the extra latency imposed on the pending ones can make clients give up and try again, reaching the point where none of the processed tasks yields anything useful since they were already abandoned. Now the number of handshakes per loop will grow as the number of pending ones grows, and this addresses the problem without adding extra latency even under extreme loads.

There were various QUIC fixes aiming at addressing some issues reported by users and tests.

The cache failed to cache a response for a request that had the "no-cache" directive (typically a forced reload). This prevented the cache from being refreshed this way; this is now fixed.

In some rare cases it was possible to freeze a compressing stream if there was exactly one byte left at the end of the buffer, which was insufficient to place a new HTX block and prevented any progress from being made. This has been the case since 2.5, so it doesn't seem easy to trigger!

Layer7 retries did not work anymore on the "empty-response" condition due to a change that was made in 2.4.

The dump of the supported config language keywords with -dK incorrectly attributed some of the crt-list specific keywords to "bind ... ssl", which could cause confusion for those designing config parsers or generators by regularly checking for new stuff there. Now an explicit "crt-list" sub-section is dumped and "bind ssl" only dumps keywords really supported on "bind" lines.

The global directive "no numa-cpu-mapping", which forces haproxy to bind to multiple CPU sockets even if it should result in lower performance, was lost across reloads in master-worker mode, because the master in wait mode doesn't see it, thus applies the restriction to itself, and that one is inherited by subsequent masters that pass it to their workers.

And a few other minor updates aside, that's about all. Those with high request rates or who already noticed crashes or strange errors are strongly encouraged to update and try again.

Also, one point regarding 2.5: it also requires the fixes mentioned above, but we need to keep in mind that it's about to reach end of life. Thus I prefer to delay a last version a little bit so as to encourage the last users of 2.5 to switch to 2.6 and still have a 2.5 fallback without the fixes above in the unlikely event that something's wrong with them. We'll probably do a last one with these fixes by the end of the month.
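
As a reminder of which setups are concerned by two of the fixes above (L7 retries on "empty-response" and "no numa-cpu-mapping"), here is a small configuration sketch. The backend name, server name, address and retry count are hypothetical placeholders chosen for illustration only:

```haproxy
global
    # Let haproxy bind to all CPU sockets; before this release the setting
    # could be lost across reloads in master-worker mode.
    no numa-cpu-mapping

backend app_example                     # hypothetical name
    mode http
    retries 3                           # illustrative value
    # Retry at layer 7 when the server closes without sending a response.
    retry-on empty-response
    server app1 192.0.2.10:8080         # placeholder address
```

Configurations that use neither directive are not affected by these two particular issues.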

Please find the usual URLs below :
   Site index       : https://www.haproxy.org/
   Documentation    : https://docs.haproxy.org/
   Wiki             : https://github.com/haproxy/wiki/wiki
   Discourse        : https://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : https://www.haproxy.org/download/2.6/src/
   Git repository   : https://git.haproxy.org/git/haproxy-2.6.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy-2.6.git
   Changelog        : https://www.haproxy.org/download/2.6/src/CHANGELOG
   Dataplane API    : https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs     : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs    : https://www.haproxy.org/l/reviewed-bugs
   Code reports     : https://www.haproxy.org/l/code-reports
   Latest builds    : https://www.haproxy.org/l/dev-packages

Willy
---
Complete changelog :
Amaury Denoyelle (9):
      MINOR: h3/hq-interop: handle no data in decode_qcs() with FIN set
      BUG/MINOR: mux-quic: transfer FIN on empty STREAM frame
      MINOR: quic: adjust request reject when MUX is already freed
      BUG/MINOR: quic: also send RESET_STREAM if MUX released
      BUG/MINOR: quic: acknowledge STREAM frame even if MUX is released
      BUG/MINOR: h3: prevent hypothetical demux failure on int overflow
      BUG/MEDIUM: quic: properly handle duplicated STREAM frames
      BUG/MEDIUM: quic: do not crash when handling STREAM on released MUX
      BUG/MINOR: mux-quic: properly init STREAM frame as not duplicated

Aurelien DARRAGON (2):
      BUG/MINOR: lua/httpclient: missing free in hlua_httpclient_send()
      BUG/MEDIUM: httpclient/lua: fix a race between lua GC and hlua_ctx_destroy

Christopher Faulet (12):
      BUG/MEDIUM: stconn: Don't rearm the read expiration date if EOI was reached
      REGTESTS: Fix ssl_errors.vtc script to wait for connections close
      BUG/MEDIUM: h1-htx: Never copy more than the max data allowed during parsing
      DOC: config: Fix description of options about HTTP connection modes
      DOC: config: Add the missing tune.fail-alloc option from global listing
      DOC: config: Clarify the meaning of 'hold' in the 'resolvers' section
      BUG/MEDIUM: connection: Clear flags when a conn is removed from an idle list
      BUG/MINOR: http-check: Don't set HTX_SL_F_BODYLESS flag with a log-format body
      BUG/MINOR: http-check: Skip C-L header for empty body when it's not mandatory
      BUG/MINOR: http-ana: Don't increment conn_retries counter before the L7 retry
      BUG/MINOR: http-ana: Do a L7 retry on read error if there is no response
      BUG/MINOR: fd: Properly init the fd state in fd_insert()

Frédéric Lécaille (16):
      BUILD: thead: Fix several 32 bits compilation issues with uint64_t variables
      BUG/MINOR: quic: Possible unexpected counter incrementation on send*() errors
      BUG/MINOR: quic: Really cancel the connection timer from qc_set_timer()
      BUG/MINOR: quic: Missing call to task_queue() in qc_idle_timer_do_rearm()
      BUG/MINOR: quic: Do not probe with too little Initial packets
      BUG/MINOR: quic: Wrong initialization for io_cb_wakeup boolean
      BUG/MINOR: quic: Do not drop too small datagrams with Initial packets
      BUG/MINOR: quic: Missing padding for short packets
      BUG/MINOR: quic: Do not send too small datagrams (with Initial packets)
      BUG/MINOR: quic: Ensure to be able to build datagrams to be retransmitted
      BUG/MINOR: quic: Remove force_ack for Initial,Handshake packets
      BUG/MINOR: quic: Ensure not to retransmit packets with no ack-eliciting frames
      BUG/MINOR: quic: Do not resend already acked frames
      MINOR: quic: Move code to wakeup the timer task to avoid anti-amplication deadlock
      BUG/MINOR: quic: Missing detections of amplification limit reached
      BUG/MINOR: quic: Missing listener accept queue tasklet wakeups

Michael Prokop (1):
      DOC/CLEANUP: fix typos

Remi Tricot-Le Breton (3):
      BUG/MINOR: cache: Cache response even if request has "no-cache" directive
      BUG/MINOR: cache: Check cache entry is complete in case of Vary
      BUG/MINOR: ssl: Use 'date' instead of 'now' in ocsp stapling callback

William Lallemand (8):
      BUG/MINOR: mworker: stop doing strtok directly from the env
      BUG/MEDIUM: mworker: prevent inconsistent reload when upgrading from old versions
      BUG/MEDIUM: mworker: don't register mworker_accept_wrapper() when master FD is wrong
      MINOR: startup: HAPROXY_STARTUP_VERSION contains the version used to start
      BUG/MINOR: mworker: prevent incorrect values in uptime
      MINOR: ssl: rename confusing ssl_bind_kws
      BUG/MINOR: config: crt-list keywords mistaken for bind ssl keywords
      BUG/MINOR: mworker: use MASTER_MAXCONN as default maxconn value

Willy Tarreau (27):
      MINOR: fd/cli: report the polling mask in "show fd"
      BUG/MINOR: sched: properly report long_rq when tasks remain in the queue
      BUG/MEDIUM: sched: allow a bit more TASK_HEAVY to be processed when needed
      MINOR: mux-h2/traces: do not log h2s pointer for dummy streams
      MINOR: mux-h2/traces: add a missing TRACE_LEAVE() in h2s_frt_handle_headers()
      BUG/MINOR: ring: do not realign ring contents on resize
      BUG/MINOR: init: properly detect NUMA bindings on large systems
      BUG/MEDIUM: master: force the thread count earlier
      BUG/MINOR: init: make sure to always limit the total number of threads
      BUG/MINOR: thread: report thread and group counts in the correct order
      BUG/MINOR: ring: release the backing store name on exit
      MEDIUM: epoll: don't synchronously delete migrated FDs
      MEDIUM: poller: program the update in fd_update_events() for a migrated FD
      MAJOR: fd: remove pending updates upon real close
      MINOR: fd: delete unused updates on close()
      MEDIUM: fd: add the tgid to the fd and pass it to fd_insert()
      MINOR: cli/fd: show fd's tgid and refcount in "show fd"
      MINOR: fd: add functions to manipulate the FD's tgid
      MINOR: fd: add fd_get_running() to atomically return the running mask
      MAJOR: fd: grab the tgid before manipulating running
      MINOR: fd: make fd_clr_running() return the previous value instead
      MEDIUM: fd: make fd_insert/fd_delete atomically update fd.tgid
      MEDIUM: fd: quit fd_update_events() when FD is closed
      MAJOR: poller: only touch/inspect the update_mask under tgid protection
      MEDIUM: fd: support broadcasting updates for foreign groups in updt_fd_polling
      BUG/MAJOR: fd/thread: fix race between updates and closing FD
      BUG/MAJOR: fd/threads: close a race on closing connections after takeover

---