Hi,
HAProxy 3.0.6 was released on 2024/11/07. It added 92 new commits
after version 3.0.5.
As usual, this release fixes a number of bugs, but a significant part of
the commits is dedicated to debugging. Over the last few months, most of our
time was spent on bugs, and it is becoming more and more urgent to improve
HAProxy's observability in order to reduce the time spent on them. While the
debugging work was first focused on 3.1, some commits were backported to 3.0:
* The watchdog now emits warnings when it detects apparently locked up
threads. By default, a warning is emitted if a thread is blocked for
more than one second, but this may be configured with the global
parameter "warn-blocked-traffic-after". The "debug dev loop" command was
also improved to be able to emit such a warning when the "warn" argument
is set.
* The dump of threads info on panic was improved. During a panic, each
thread now uses its own buffer instead of a global one to dump its
info. This way, all these buffers remain available in the core dump and
can be retrieved from gdb. This should help bug analysis.
* Memory profiling was also improved. Some entries were displayed with a
NULL return address, causing confusion. Now, undecodable stacks causing
an apparent NULL return address all lead to the "other" bin. In
addition, per-DSO stats are displayed before showing the total. It is
more convenient on systems where many libraries are loaded.
* A magic pattern was placed at the beginning of the post_mortem structure,
in order to make it easier to find in core dumps. It now starts with the
32-character pattern "POST-MORTEM STARTS HERE+7654321\0". The post_mortem
structure is now also placed in its own section, again to make it easier
to find. Finally, several important pointers were added to it, such as
pointers to the pools list or to the proxies list.
* Non-printable characters are now removed from the "debug dev fd" CLI
command output.
* Some GDB hints are now printed when crashing, for instance on a BUG_ON().
* The backtraces of all threads are now dumped, instead of only those of
the stuck ones.
* The version and the command line are now added to the "show dev" CLI
command output.
* Two new sample fetch functions were added to retrieve the internal error
name of the frontend (fc_err_name) or backend (bc_err_name) connection.
In addition, connection error codes corresponding to common errno values
were added, and they are now set when such errors are encountered during
recv/send/splice() calls.
* The current number of alive streams and the total number of streams
ever created are now tracked and reported in stats. This may be useful
to diagnose some bugs, such as session leaks.
We really hope this will help us speed up the debugging process.
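To illustrate how the new magic pattern can be used outside of gdb, here is
a minimal Python sketch that scans a core dump for the 32-byte pattern
quoted above. The function name and the script usage are hypothetical and
not part of HAProxy. The values reported are offsets in the file, not
virtual addresses, and the pattern may well appear more than once in a
dump, so every hit is listed.

```python
import mmap
import sys

# Magic pattern quoted in the announcement: 31 visible characters plus a
# trailing NUL byte, 32 bytes in total.
POST_MORTEM_MAGIC = b"POST-MORTEM STARTS HERE+7654321\0"


def find_post_mortem_offsets(path):
    """Return every file offset at which the magic pattern occurs.

    Hypothetical helper: the file must not be empty, since an empty file
    cannot be memory-mapped.
    """
    offsets = []
    with open(path, "rb") as f:
        # Map the file read-only so that a large core dump is not loaded
        # into memory all at once.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mem:
            pos = mem.find(POST_MORTEM_MAGIC)
            while pos != -1:
                offsets.append(pos)
                pos = mem.find(POST_MORTEM_MAGIC, pos + 1)
    return offsets


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python3 find_post_mortem.py /path/to/core
    for offset in find_post_mortem_offsets(sys.argv[1]):
        print(f"magic found at file offset 0x{offset:x}")
```

Translating a file offset into an address usable from gdb depends on the
layout of the core file and is left out of this sketch.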
Now, the list of bugs fixed by this release:
* It was possible to truncate data with the HTTP compression filter
because of a bug in the filter API. When a filter may alter the message
payload, it is important to properly update the HTX message metadata to
not emit the wrong payload length. But this was not systematically
performed.
* In 2.4, it was decided to reject HTTP/1.1 protocol upgrade requests with
a payload because they are incompatible with H2 on the server side.
Indeed, such upgrade requests must be converted to CONNECT requests in H2,
so no payload is supported. However, they remain valid in HTTP/1.1. So
instead of being rejected on the client side, these requests are now
accepted and properly handled when sent to an H1 server. They are only
rejected when they are sent to an H2 server.
* No special care was taken about H2C protocol upgrades. But this could be
a security issue if such an upgrade was accepted by a server, because it
could allow a client to bypass all filtering rules. To fix the issue, the
Upgrade header is removed from the requests if "h2c" or "h2" tokens are
found.
* The H1 multiplexer was only able to handle timeouts if the client or
server timeouts were defined, depending on the side. So, it was possible
to ignore client-fin/server-fin and http-keep-alive/http-request
timeouts.
* It was possible to have some blocked transfers in H2 because of an issue
with the zero-copy data forwarding: it was possible for an H2 stream to
never be removed from the send list.
* An issue with the zero-copy data forwarding of H1 requests waiting for a
tunnel to be established was fixed. The SE_FL_EOI flag was erroneously set
on the client sedesc.
* On the QUIC side, it was possible to experience some freezes with 0-RTT
connections; a leak was possible on post handshake frames on the error
path; probing packets could be malformed; a stream could be erroneously
closed with an empty frame with the FIN bit set instead of a RESET_STREAM
frame when no data was sent at all; the server timeout was never armed
for small requests that are fully received when the stream is created; the
glitch counter was never reported at the session level, preventing any
tracking via a stick-table. All these bugs were fixed.
* A server abort was reported on an invalid HTTP response payload instead
of an internal error. And it was also possible to report a client abort
instead of a server abort during the HTTP response forwarding. The right
termination states are now reported in both cases.
* Immediate client abort on the CLI was not properly handled, blocking the
CLI applet with no timeout armed.
* It was possible to experience a deadlock by setting the maxconn of a
frontend on the CLI, because of a double lock on the proxy lock.
* The "set ssl cert" CLI command was not properly checking the transaction
name. That could lead to accidentally committing a transaction on the
wrong certificate.
* It was possible to send more data than expected from the stats applet
via the zero-copy data forwarding. This was an issue for client
connections limited by flow control, as in H2 and QUIC.
* There were some issues with early connection shutdowns that could lead to
truncated messages because some tests on blocked data were missing. In
addition, data blocked by an error on the sending path were not always
properly detected, leaving streams blocked without any timeout armed.
* The dequeuing process was refined to fix some bugs revealed by recent
fixes in this area.
* The inter-thread stream shutdown, used by the "shutdown sessions server
XXX" CLI command or the "on-error shutdown-sessions" server option, was
not thread-safe.
* The dump of extra counters with the Prometheus exporter was buggy and
could lead to a buffer overflow because of a wrong increment on a stats
field index.
* It was possible to reuse HTTP connections for requests to different
endpoints because some address families were not properly handled. The
issue was encountered with the HTTP client and UNIX socket combination.
* A memory leak was possible if a failure was encountered when a dynamic
server was added with the check or agent-check options. In that case, the
server could not be released because its refcount was incremented too
early. In addition, access to the global server list during a dynamic
server deletion was not protected against concurrent accesses. In the
long term, this could cause list corruption and crashes.
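The H2C fix described in the list above boils down to a simple rule on the
Upgrade header. The following Python sketch only illustrates that rule as
it is worded in this announcement. It is not HAProxy's code, which lives in
the C multiplexers and may differ in details, and the function name and the
header representation are assumptions made for the example. The handling of
the related Connection header is not covered.

```python
def strip_h2_upgrade(headers):
    """Return a copy of `headers` without Upgrade headers naming h2c or h2.

    `headers` is a list of (name, value) tuples (hypothetical layout).
    """
    blocked = {"h2c", "h2"}
    result = []
    for name, value in headers:
        if name.lower() == "upgrade":
            # Tokens are comma-separated and compared case-insensitively.
            # An optional "/version" suffix is ignored for the comparison.
            tokens = {
                token.strip().split("/", 1)[0].lower()
                for token in value.split(",")
            }
            if tokens & blocked:
                # Drop the whole header, as the announcement describes.
                continue
        result.append((name, value))
    return result
```

With this rule, a request carrying "Upgrade: h2c" is forwarded without its
Upgrade header, while "Upgrade: websocket" is left untouched.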
As a side note, some unresolved issues remain in this release. One of
them is about some unexplained 502/SH or 502/SD responses. There are several
reports, and it is not clear that all of them are related to the same issue.
It also seems possible to experience it on older versions. We are still
trying to understand why this happens. So, have a look at your logs to check
whether you are affected. Any info can help us make progress on this issue.
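For those who want to check their logs, here is a minimal Python sketch
that counts 502 responses whose termination state starts with SH or SD. It
assumes the default HTTP log format ("option httplog"), where the five
timers are followed by the status code, the byte count, two captured-cookie
fields and the four-character termination state. A custom log-format would
need a different expression. The function name is hypothetical.

```python
import re
from collections import Counter

# Assumption: default HTTP log format, e.g.
#   ... fe be/srv1 10/0/30/69/109 502 0 - - SH-- 1/1/1/1/0 0/0 "GET / HTTP/1.1"
LOG_RE = re.compile(
    r"\s(?:-?\d+/){4}\+?-?\d+"   # the five timers, "-1" when not reached
    r"\s+(?P<status>-?\d+)"      # HTTP status code
    r"\s+\+?\d+"                 # bytes sent to the client
    r"\s+\S+\s+\S+"              # captured request and response cookies
    r"\s+(?P<state>\S{4})\s"     # termination state, e.g. "SH--"
)


def count_502_sh_sd(lines):
    """Count 502 responses per termination state prefix (SH or SD)."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # not an HTTP log line in the assumed format
        prefix = match.group("state")[:2]
        if match.group("status") == "502" and prefix in ("SH", "SD"):
            counts["502/" + prefix] += 1
    return counts
```

The function accepts any iterable of lines, so an open log file can be
passed directly.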
Thanks everyone for your help!
Please find the usual URLs below :
Site index : https://www.haproxy.org/
Documentation : https://docs.haproxy.org/
Wiki : https://github.com/haproxy/wiki/wiki
Discourse : https://discourse.haproxy.org/
Slack channel : https://slack.haproxy.org/
Issue tracker : https://github.com/haproxy/haproxy/issues
Sources : https://www.haproxy.org/download/3.0/src/
Git repository : https://git.haproxy.org/git/haproxy-3.0.git/
Git Web browsing : https://git.haproxy.org/?p=haproxy-3.0.git
Changelog : https://www.haproxy.org/download/3.0/src/CHANGELOG
Dataplane API : https://github.com/haproxytech/dataplaneapi/releases/latest
Pending bugs : https://www.haproxy.org/l/pending-bugs
Reviewed bugs : https://www.haproxy.org/l/reviewed-bugs
Code reports : https://www.haproxy.org/l/code-reports
Latest builds : https://www.haproxy.org/l/dev-packages
---
Complete changelog :
Amaury Denoyelle (7):
BUG/MINOR: h1: do not forward h2c upgrade header token
BUG/MINOR: h2: reject extended connect for h2c protocol
BUG/MINOR: mux-quic: report glitches to session
BUG/MEDIUM: mux-quic: ensure timeout server is active for short requests
BUG/MINOR: mux-quic: do not close STREAM with empty FIN if no data sent
BUG/MINOR: server: fix dynamic server leak with check on failed init
BUG/MEDIUM: server: fix race on servers_list during server deletion
Aurelien DARRAGON (7):
BUG/MEDIUM: server: server stuck in maintenance after FQDN change
BUG/MEDIUM: hlua: make hlua_ctx_renew() safe
BUG/MEDIUM: hlua: properly handle sample func errors in
hlua_run_sample_{fetch,conv}()
DOC: config: fix rfc7239 forwarded typo in desc
BUG/MEDIUM: connection/http-reuse: fix address collision on unhandled
address families
DOC: config: add missing glitch_{cnt,rate} data types
DOC: config: add missing glitch_{cnt,rate} sample definitions
Christopher Faulet (24):
MINOR: connection: No longer include stconn type header in connection-t.h
MINOR: mux-h1: Set EOI on SE during demux when both side are in DONE state
BUG/MEDIUM: mux-h1/mux-h2: Reject upgrades with payload on H2 side only
REGTESTS: h1/h2: Update script testing H1/H2 protocol upgrades
BUG/MEDIUM: cli: Be sure to catch immediate client abort
BUG/MINOR: mux-h1: Fix condition to set EOI on SE during zero-copy
forwarding
BUG/MINOR: http-ana: Disable fast-fwd for unfinished req waiting for
upgrade
BUG/MEDIUM: stconn: Wait iobuf is empty to shut SE down during a check
send
BUG/MINOR: http-ana: Don't report a server abort if response payload is
invalid
BUG/MEDIUM: stconn: Check FF data of SC to perform a shutdown in
sc_notify()
BUG/MAJOR: filters/htx: Add a flag to state the payload is altered by a
filter
REGTESTS: Never reuse server connection in http-messaging/truncated.vtc
BUG/MEDIUM: stats-html: Never dump more data than expected during 0-copy
FF
BUG/MEDIUM: mux-h2: Remove H2S from send list if data are sent via 0-copy
FF
BUG/MEDIUM: stconn: Report blocked send if sends are blocked by an error
BUG/MINOR: http-ana: Fix wrong client abort reports during responses
forwarding
BUG/MINOR: stconn: Don't disable 0-copy FF if EOS was reported on
consumer side
BUG/MEDIUM: mux-h1: Fix how timeouts are applied on H1 connections
BUG/MINOR: http-ana: Report internal error if an action yields on a final
eval
MINOR: stream: Save last evaluated rule on invalid yield
BUG/MEDIUM: promex: Fix dump of extra counters
MINOR: stream/stats: Expose the current number of streams in stats
MINOR: stream/stats: Expose the total number of streams ever created in
stats
BUG/MINOR: stats: Fix the name for the total number of streams created
Frederic Lecaille (4):
BUG/MINOR: quic: avoid leaking post handshake frames
BUG/MEDIUM: quic: avoid freezing 0RTT connections
BUG/MINOR: quic: fix malformed probing packet building
BUILD: Missing inclusion header for ssize_t type
Oliver Dala (1):
BUG/MEDIUM: cli: Deadlock when setting frontend maxconn
Valentine Krasnobaeva (3):
BUG/MINOR: cfgparse-global: fix allowed args number for setenv
BUG/MINOR: mworker: fix mworker-max-reloads parser
MINOR: cli/debug: show dev: add cmdline and version
William Lallemand (4):
BUG/MINOR: httpclient: return NULL when no proxy available during
httpclient_new()
MINOR: cli: remove non-printable characters from 'debug dev fd'
BUG/MINOR: trace: stop rewriting argv with -dt
BUG/MINOR: ssl/cli: 'set ssl cert' does not check the transaction name
correctly
Willy Tarreau (42):
REGTESTS: shorten a bit the delay for the h1/h2 upgrade test
BUG/MINOR: server: make sure the HMAINT state is part of MAINT
BUILD: tools: only include execinfo.h for the real backtrace() function
MINOR: tools: do not attempt to use backtrace() on linux without glibc
MINOR: task: define two new one-shot events for use with WOKEN_OTHER or
MSG
BUG/MEDIUM: stream: make stream_shutdown() async-safe
BUG/MINOR: queue: make sure that maintenance redispatches server queue
MINOR: server: make srv_shutdown_sessions() call pendconn_redistribute()
BUG/MEDIUM: queue: always dequeue the backend when redistributing the
last server
MINOR: debug: make mark_tainted() return the previous value
MINOR: chunk: drop the global thread_dump_buffer
MINOR: debug: split ha_thread_dump() in two parts
MINOR: debug: slightly change the thread_dump_pointer signification
MINOR: debug: make ha_thread_dump_done() take the pointer to be used
MINOR: debug: replace ha_thread_dump() with its two components
MEDIUM: debug: on panic, make the target thread automatically allocate
its buf
BUG/MEDIUM: queue: make sure never to queue when there's no more served
conns
MINOR: activity/memprofile: always return "other" bin on NULL return
address
MINOR: activity/memprofile: show per-DSO stats
BUILD: debug: silence a build warning with threads disabled
MINOR: pools: export the pools variable
MINOR: debug: place a magic pattern at the beginning of post_mortem
MINOR: debug: place the post_mortem struct in its own section.
MINOR: debug: store important pointers in post_mortem
DOC: config: document connection error 44 (reverse connect failure)
CLEANUP: connection: properly name the CO_ER_SSL_FATAL enum entry
MINOR: connection: add more connection error codes to cover common errno
MINOR: rawsock: set connection error codes when returning from
recv/send/splice
MINOR: connection: add new sample fetch functions fc_err_name and
bc_err_name
MINOR: debug: print gdb hints when crashing
MINOR: debug: do not limit backtraces to stuck threads
MINOR: debug: also add a pointer to struct global to post_mortem
MINOR: debug: also add fdtab and acitvity to struct post_mortem
MINOR: debug: remove the redundant process.thread_info array from
post_mortem
MINOR: wdt: move the local timers to a struct
MINOR: debug: add a function to dump a stuck thread
DEBUG: wdt: better detect apparently locked up threads and warn about them
DEBUG: cli: make it possible for "debug dev loop" to trigger warnings
DEBUG: wdt: make the blocked traffic warning delay configurable
DEBUG: wdt: add a stats counter "BlockedTrafficWarnings" in show info
BUILD: debug: also declare strlen() in __ABORT_NOW()
MINOR: debug: move the "recover now" warn message after the optional notes
--
Christopher Faulet