Hi,
HAProxy 2.4.8 was released on 2021/11/03. It added 61 new commits
after version 2.4.7.
After almost one month, this is a bug fix release which addresses quite a
number of issues that were reported since 2.4.7:
- resolvers: there were a large number of structural issues in the
code, and quite frankly we're not proud of the solutions but it's
impossible to do something more elegant in the current state without
a major rewrite. So what matters here is that all race conditions
are addressed and that the code works reliably. While the 2.5 fixes
add a lookup tree to achieve significant CPU savings on SRV records,
that code was not backported to 2.4 because it adds further changes
that do not seem necessary in the current situation. We may revisit
that choice later if users still face high CPU usage. But I'm
now more confident that the observed CPU loops were in fact infinite
loops due to the list bugs, rather than high CPU usage. In the current
situation everything was done so that the resolvers code couldn't
crash anymore (19 patches). I sincerely hope that we will not have
to experience another journey in that swamp for a while.
- an interesting bug in the ring API caused boundary checks for the
wrapping at the end of the buffer to be shifted by one both in the
producer and the consumer, thus they both cancel each other and are
not observable... until the byte after the buffer is not mapped or
belongs to another area. One crash was met on boot (since startup
messages are duplicated into a ring for later retrieval), and it is
possible that those sending logs over TCP might have faced it as
well, otherwise it's extremely unlikely to be observed outside of
these use cases.
- the CPU affinity setting on FreeBSD was relying on a wrong macro to
get the number of CPUs, assuming it was always one, so the affinity
settings were rejected for the second and higher CPUs.
- using the tarpit could lead to head-of-line blocking of an H2
connection as the pending data were not drained. And in other
protocols, the presence of these pending data could cause a wakeup
loop between the mux and the stream, which usually ended in the
process being detected as faulty and being killed by the safety
checks.
- a similar wakeup loop could also happen when waiting for more data
(e.g. option http-buffer-request) with lots of data already present
in the receive buffer while the lower layer could only deliver a full
block at once, which couldn't fit.
- a slow memory leak in 2.4 with Lua on non-glibc systems was addressed.
Glibc's realloc() function exactly matches Lua's allocator semantics,
thus the allocator was simplified in 2.4... except that the man page
is not very clear on the fact that freeing on realloc(ptr, 0) is a
glibc-ism, leading to a slow leak in other environments.
- the h2spec tests in the CI were regularly failing on a few tests
expecting HTTP/2 GOAWAY frames, even though these frames were sent
(they were even seen in strace).
The problem was that we didn't perform a graceful shutdown and that
this copes badly with bidirectional communications as unread pending
data cause the connection to be reset and the frame to be lost. This
was addressed by performing a clean shutdown. It's unlikely that
anyone ever noticed this given that this essentially happens when
communication errors are reported (i.e. when the client has little
reason to complain).
- some users complained that TLS handshakes were renewed too often in
some cases. Emeric found that with the migration to the muxes in
1.9-2.0 we've lost the clean shutdown on end of connection that's
also used to commit the TLS session cache entry. For HTTP/2 this was
addressed as a side effect of the fix above, and for HTTP/1, a fix
was produced to also perform a clean shutdown on keep-alive
connections (it used to work fine only for close ones).
- the validity checks for sample fetch functions were only applied to
the frontend capability of a proxy. This means that using a small
set of sample fetch functions (like "be_name()") in proxies that are
both a frontend and a backend ("listen" or "defaults") would lead to
a config error while it is technically valid. This problem has always
been there and was never reported.
- automatic cast of variables to other types would fail to first verify
if a cast method was known, possibly causing a crash at runtime when
calling them for the first time (e.g. using a variable of type address
as an argument to strcmp() or a boolean with secure_memcmp()).
- some streams could sometimes be frozen when filters were enabled (such
as compression) and an error was raised with data still left to be
processed.
- HTTP health checks could report an L7 timeout when facing a parse error,
because the response was dropped before being translated to HTX, while
the check waiting for a response didn't explicitly check for a possible
end-of-input.
- http-after-response rules must stop after an "allow" action, to match
their http-response counterpart.
- the parsing of the "Authorization" header field would fail if more
than one space was present between the scheme and the value.
- the "fix_tag_value()" fetch function wouldn't properly wait for more
data due to an inverted condition.
- build failures could happen on Mac/arm64 with a recent clang compiler.
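To illustrate the ring API item above, here is a minimal sketch of a byte ring whose producer and consumer both wrap their offset immediately after incrementing it, so that the byte located right after the storage area is never touched. The names byte_ring, ring_put() and ring_get() are hypothetical and exist only for this example; this is not HAProxy's ring or buffer API, nor the actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size byte ring, for illustration only. */
struct byte_ring {
    unsigned char *area; /* storage of <size> bytes */
    size_t size;         /* total capacity in bytes */
    size_t head;         /* next offset to read, always < size */
    size_t tail;         /* next offset to write, always < size */
    size_t len;          /* number of bytes currently stored */
};

/* Appends one byte. The offset is wrapped right after the increment, so
 * area[size] (one past the end) is never written. Returns 1 on success,
 * 0 if the ring is full.
 */
static int ring_put(struct byte_ring *r, unsigned char c)
{
    if (r->len == r->size)
        return 0;
    assert(r->tail < r->size);
    r->area[r->tail] = c;
    if (++r->tail == r->size)
        r->tail = 0;
    r->len++;
    return 1;
}

/* Extracts one byte with the same wrapping rule as the producer.
 * Returns 1 on success, 0 if the ring is empty.
 */
static int ring_get(struct byte_ring *r, unsigned char *c)
{
    if (r->len == 0)
        return 0;
    assert(r->head < r->size);
    *c = r->area[r->head];
    if (++r->head == r->size)
        r->head = 0;
    r->len--;
    return 1;
}
```

If both sides instead wrapped one position too late, they would still agree with each other, which is why such a shift can remain invisible until the extra byte is unmapped or belongs to something else.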
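Regarding the "Authorization" header item above, the following is a small sketch of how a scheme and its value can be split while tolerating one or more spaces between them. The helper name split_authorization() and its signature are assumptions made for this example only; it is not HAProxy's parser and it deliberately ignores tabs and auth-param lists.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: splits "<scheme> SP+ <value>" found in <hdr> of
 * length <len>. On success, returns 1 and fills the scheme and value
 * pointers and lengths (pointing inside <hdr>, no copy is made). Returns 0
 * if the scheme is empty or if nothing follows the spaces.
 */
static int split_authorization(const char *hdr, size_t len,
                               const char **scheme, size_t *scheme_len,
                               const char **value, size_t *value_len)
{
    size_t i = 0;

    assert(hdr != NULL);

    /* the scheme runs up to the first space */
    while (i < len && hdr[i] != ' ')
        i++;
    if (i == 0 || i == len)
        return 0;
    *scheme = hdr;
    *scheme_len = i;

    /* skip all consecutive spaces, not only the first one */
    while (i < len && hdr[i] == ' ')
        i++;
    if (i == len)
        return 0;

    *value = hdr + i;
    *value_len = len - i;
    return 1;
}
```

With an input such as "Basic   dXNlcjpwYXNz" (three spaces), the scheme is "Basic" and the value starts at the first non-space character.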
A few tiny improvements:
- halog updates to report headers and query strings were backported, as
these are the type of improvements expected where halog is used (i.e.
in the field).
- the memory profiler now also takes account of the bookkeeping size
used by each allocated area so that any future leak like the Lua one
can no longer go unnoticed.
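For readers wondering about the Lua leak mentioned above, here is a minimal sketch of an allocator with the (ud, ptr, osize, nsize) shape that Lua expects from its allocation function, written so that it does not depend on what realloc() does with a size of zero. The function name portable_lua_alloc() is hypothetical and this is not necessarily how the fix was written in HAProxy.

```c
#include <stdlib.h>

/* Hypothetical Lua-style allocator. Lua asks for a block to be released by
 * passing nsize == 0 and expects NULL in return. Freeing explicitly avoids
 * relying on realloc(ptr, 0) releasing the block, which is not guaranteed
 * on every libc.
 */
static void *portable_lua_alloc(void *ud, void *ptr, size_t osize, size_t nsize)
{
    (void)ud;    /* no user data needed in this sketch */
    (void)osize; /* the old size is not needed either */

    if (nsize == 0) {
        free(ptr); /* free(NULL) is a no-op */
        return NULL;
    }
    /* realloc(NULL, n) behaves like malloc(n) */
    return realloc(ptr, nsize);
}
```

Such a function would typically be handed to lua_newstate() in place of a bare realloc() wrapper.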
What is pleasant here is to see that very few of the issues above concern
2.4 only. In other words, 2.4 is at least as good as older versions. This
is very encouraging because it means we no longer need to push so hard to
backport complex, risky fixes too far: for some rare issues we might
sometimes prefer to encourage users to move to a more recent version rather
than risk breaking many other use cases. [Yes, I'm looking at you, resolver
patches, which will hardly apply to 2.0, while that's not where users
complain the most].
Updates for other versions are on their way as well. It just takes time.
Please find the usual URLs below:
Site index       : http://www.haproxy.org/
Discourse        : http://discourse.haproxy.org/
Slack channel    : https://slack.haproxy.org/
Issue tracker    : https://github.com/haproxy/haproxy/issues
Wiki             : https://github.com/haproxy/wiki/wiki
Sources          : http://www.haproxy.org/download/2.4/src/
Git repository   : http://git.haproxy.org/git/haproxy-2.4.git/
Git Web browsing : http://git.haproxy.org/?p=haproxy-2.4.git
Changelog        : http://www.haproxy.org/download/2.4/src/CHANGELOG
Cyril's HTML doc : http://cbonte.github.io/haproxy-dconv/
Willy
---
Complete changelog :
Amaury Denoyelle (3):
      BUG/MEDIUM: cpuset: fix cpuset size for FreeBSD
      BUILD: fix compilation on NetBSD
      BUG/MINOR: backend: fix improper insert in avail tree for always reuse

Christopher Faulet (14):
      BUG/MEDIUM: mux_h2: Handle others remaining read0 cases on partial frames
      BUG/MINOR: http-ana: Don't eval front after-response rules if stopped on back
      BUG/MINOR: sample: Fix 'fix_tag_value' sample when waiting for more data
      BUG/MEDIUM: stream: Keep FLT_END analyzers if a stream detects a channel error
      BUG/MEDIUM: tcpcheck: Properly catch early HTTP parsing errors
      BUG/MINOR: mux-h1: Save shutdown mode if the shutdown is delayed
      BUG/MEDIUM: mux-h1: Perform a connection shutdown when the h1c is released
      BUG/MEDIUM: resolvers: Don't recursively perform requester unlink
      BUG/MEDIUM: resolvers: Track api calls with a counter to free resolutions
      BUG/MEDIUM: http-ana: Drain request data waiting the tarpit timeout expiration
      BUG/MEDIUM: stream-int: Block reads if channel cannot receive more data
      BUG/MEDIUM: sample: Cumulate frontend and backend sample validity flags
      DOC: config: Fix alphabetical order of fc_* samples
      MINOR: stream: Improve dump of bogus streams

David CARLIER (1):
      BUILD: atomic: fix build on mac/arm64

David Carlier (1):
      BUILD/MINOR: cpuset freebsd build fix

Emeric Brun (2):
      BUG/MAJOR: dns: tcp session can remain attached to a list after a free
      BUG/MAJOR: dns: attempt to lock globaly for msg waiter list instead of use barrier

John Roesler (1):
      DOC/peers: some grammar fixes for peers 2.1 spec

Olivier Houchard (1):
      MINOR: initcall: Rename __GLOBL and __GLOBL1.

Remi Tricot-Le Breton (1):
      BUG/MINOR: http: Authorization value can have multiple spaces after the scheme

Thayne McCombs (1):
      DOC: configuration: add clarification on escaping in keyword arguments

Tim Duesterhus (6):
      MINOR: halog: Add -qry parameter allowing to preserve the query string in -uX
      DOC: halog: Move the `-qry` parameter into the correct section in help text
      MINOR: halog: Rename -qry to -query
      CLEANUP: halog: Use consistent indentation in help()
      BUG/MINOR: halog: Add missing newlines in die() messages
      MINOR: halog: Add support for extracting captures using -hdr

William Lallemand (1):
      Revert "CLEANUP: server: always include the storage for SSL settings"

Willy Tarreau (29):
      CLEANUP: server: always include the storage for SSL settings
      CLEANUP: sample: rename sample_conv_var2smp() to *_sint
      CLEANUP: sample: uninline sample_conv_var2smp_str()
      MINOR: sample: provide a generic var-to-sample conversion function
      BUG/MEDIUM: sample: properly verify that variables cast to sample
      MINOR: resolvers: fix the resolv_str_to_dn_label() API about trailing zero
      BUG/MEDIUM: resolver: make sure to always use the correct hostname length
      BUG/MINOR: resolvers: do not reject host names of length 255 in SRV records
      MINOR: resolvers: fix the resolv_dn_label_to_str() API about trailing zero
      BUG/MEDIUM: resolvers: fix truncated TLD consecutive to the API fix
      BUG/MEDIUM: resolvers: use correct storage for the target address
      MINOR: resolvers: merge address and target into a union "data"
      BUG/MAJOR: resolvers: add other missing references during resolution removal
      BUILD: resolvers: avoid a possible warning on null-deref
      BUG/MEDIUM: resolvers: always check a valid item in query_list
      BUG/MAJOR: buf: fix varint API post- vs pre- increment
      BUG/MINOR: task: do not set TASK_F_USR1 for no reason
      BUG/MINOR: mux-h2: do not prevent from sending a final GOAWAY frame
      BUG/MEDIUM: lua: fix memory leaks with realloc() on non-glibc systems
      MINOR: memprof: report the delta between alloc and free on realloc()
      MINOR: memprof: add one pointer size to the size of allocations
      CLEANUP: resolvers: do not export resolv_purge_resolution_answer_records()
      CLEANUP: always initialize the answer_list
      CLEANUP: resolvers: simplify resolv_link_resolution() regarding requesters
      CLEANUP: resolvers: replace all LIST_DELETE with LIST_DEL_INIT
      MEDIUM: resolvers: use a kill list to preserve the list consistency
      MEDIUM: resolvers: remove the last occurrences of the "safe" argument
      BUG/MINOR: sample: fix backend direction flags consecutive to last fix
      SCRIPTS: git-show-backports: re-enable file-based filtering
---