Hi,

HAProxy 2.8.0 was released on 2023/05/31. It added 27 new commits after version 2.8-dev13.
Only a few minor issues were addressed this time; the rest was mostly doc polishing and cleanups.

2.8 is entering LTS status and will be supported until 2028-Q2, and 2.9-dev0 was just created to pursue the development, with an expected release around the end of November this year.

Let's try to summarize, from a high-level perspective, the changes from 37 participants in the 1382 commits that were merged since 2.7.0:

- Lua/Mailers: there's now a full-Lua implementation of the mailers subsystem. It's provided as a Lua script (examples/lua/mailers.lua) which relies on the new internal event notification API: the script subscribes to server state change events and emits mails when the defined criteria are matched. It still relies on the "mailers" section, but being a Lua script, it's totally customizable: you can change the contents, change the notification conditions, send to multiple destinations, etc. With this change, the internal Lua view of the servers was made fully dynamic so that added or removed servers are always seen in their current state. In fact the new event notification API goes way beyond this; please read the Lua API documentation to learn more. The next step will be to completely deprecate the old Mailers subsystem in 2.9 and 3.0 and to remove it in 3.1.

- HTTP/2 is advertised by default in ALPN on TLS listeners. It was about time: 5 years have passed since it was introduced and it's been enabled by default in clear text as an HTTP/1 upgrade for 4 years, yet some users still don't know how to enable it. From now on, ALPN defaults to "h2,http/1.1" on TCP and "h3" on QUIC so that these protocol versions work by default. It's still possible to set/reset the ALPN to disable them, of course. The old concern some users had about window sizes was addressed by having a separate setting for each side (front vs back).

- Threading: thread groups are now usable by default by "bind" lines without having to replicate these lines once per thread group. This means that by default a bind line is bound to all threads, regardless of the number of groups (up to 64 groups of 64 threads, or 4096 threads total). It therefore becomes possible to enable multiple groups on a large system and benefit from all the available processing power if you're running heavy rules, Lua, compression, SSL or whatever. We still default to a single NUMA node because the cases where it brings solid benefits are not frequent enough compared to the cost of having more listening sockets. Note that on systems with non-uniform L3 caches like AMD EPYC, this can bring significant performance gains with a single setting in the config: we observed a doubling of the request rate on a 24-core EPYC 74F3 by enabling 8 groups instead of the default 1, to match the L3 cache topology. The maximum tested so far was 224 threads with 4 and 8 groups on a dual-socket Intel Sapphire Rapids system. That was blazingly fast :-)

- SSL: there are quite a bunch of updates on the SSL front in this release:

    - it's possible to adjust the signature algorithms to improve interoperability with some TLSv1.2/1.3 clients. These algorithms are used to sign the ephemeral keys used during the handshake. Changing them is useful for buggy clients that negotiate algorithms they don't support, though the usage is very specific. It's also possible to adjust this parameter for client authentication.

    - SSL handshake failure logs now dump the OpenSSL error string by default. There's no need to configure an error-log-format anymore to get details on the handshake error. This helps debug SSL problems (e.g. you'll now see "tlsv1 alert unknown ca" instead of just "SSL handshake failure").

    - OCSP: in 2.8 the OCSP responses for certificates can be automatically updated by a background task (by default every 5 minutes), so it is no longer necessary to feed them over the CLI from an external script. Of course, this requires that your load balancers have outgoing HTTP access. This is enabled in crt-list files by adding "ocsp-update on" on the certificate's line. All this is observable on the CLI via "show ssl-ocsp-update" and "show ssl-ocsp-response".

    - LetsEncrypt: there's an acme.sh script in admin/acme.sh that can be used with your existing deployments (pull request for upstream still pending). It makes it possible to handle the renewal of LE certificates in stateless mode with no hassle (no need to proxy to a local port anymore).

    - OpenSSL: version 3.1 is now supported. It's less slow than 3.0 but still significantly slower than 1.1.1; it may nevertheless be usable for most users with low enough traffic.

    - wolfSSL: we've worked quite a bit with the wolfSSL team to make sure their latest version works well with HAProxy. As expected with this type of integration, there were some rough edges at the beginning, but we've now reached a point where their current release (5.6.0) works for simple setups, and their latest development branch (some PRs still under review) covers most of HAProxy's features. We're sufficiently confident that the last adjustments will be made in the lib (we're still working hand-in-hand with them to polish everything) and that the HAProxy side will not change for this. That's particularly important because it means that as new wolfSSL releases appear in the next few weeks/months, stable HAProxy 2.8 releases will continue to work with them, or maybe even work better.
      From our testing, this lib has two nice aspects compared to OpenSSL:

        - it's fast and scales really well on multi-processor machines (2.5 times OpenSSL 3.1's performance on a 24-core machine);
        - it natively supports QUIC.

      For these two reasons alone, we expect to encounter it increasingly often as users migrate from distros based on OpenSSL 1.1.1 to distros based on 3.0, with no option to roll back to 1.1.1 once they discover they need to multiply the number of LBs by 4 just to compensate for design flaws in a security library.

- QUIC: it has been running almost flawlessly for a year on haproxy.org, and totally flawlessly over the last 6 months. We also owe @Tristan971 huge kudos for deploying it live on significantly more traffic and reporting countless issues. The internal architecture went through the last few changes we considered necessary, and we're confident that it's now in a totally maintainable form. Does that mean it's totally free of bugs? Of course not, but in my opinion it has reached the same level of stability as H2 had in 2.0 or 2.2, which is already pretty good. At this point we're only aware of one case, which affects the response time of a small but non-negligible percentage of Tristan's users, and which we haven't been able to reproduce outside of his infrastructure. We're still on it of course, but despite this minor glitch we now consider it production-ready, which means we don't see a good reason to stay away from it if it brings benefits to your web site (e.g. visitors over lossy networks). For sure the SSL dependencies are still a constraint for the vast majority of those relying on OpenSSL, but with 3.0's performance ruined, even non-QUIC users have to rebuild anyway, so OpenSSL is no longer a QUIC-only problem nowadays.
  What 2.8 brings to QUIC is a lot of stuff (mostly backported to 2.7), support for reloads by default, and a global kill-switch to disable it entirely in case of doubt or trouble, or just to confirm whether an observed issue comes from it.

- Stick-tables: the maximum number of parallel stick-counters used to be set at build time (default 3). Now it can be changed in the configuration using "tune.stick-counters" in the global section.

- HTTP compression: HTTP request bodies can now be compressed as well. This is useful when you deal with many POSTs and your origin servers are in a different hosting area, which makes your traffic pass over paid links!

- HTTP "Forwarded" header field (RFC7239): this header, which aims at replacing X-Forwarded-For and friends, is now supported in both input and output. This means HAProxy can complement it with certain parts (host, by, by_port, for, for_port). The benefit of using it instead of the others is not always obvious, until you start to mix different products at your edge and realize that they don't all add the same set of headers, and that figuring out which value goes with which hop becomes a nightmare for the application. "Forwarded" conveys an ordered list of items, so ordering becomes as easy as it was when dealing with X-Forwarded-For alone.

- JWT now supports the RSA-PSS algorithm.

- There are a few reliability improvements:

    - Lua now has a burst-timeout setting which controls how long it may run a loop in a non-yieldable context (e.g. a converter function); execution is aborted past this delay.

    - binding errors encountered during a reload could sometimes prevent the old process from resuming (e.g. with UNIX sockets). The mechanism was made more reliable: the new process now takes more care of the old sockets until it manages to bind everything, and is able to roll them back entirely on error.

    - new metrics in "show info" report the number of config warnings, the boot time and the number of times the global maxconn was reached.

    - the internal clock now wraps 20s after boot, and not just every 49.7 days. This gives developers a better chance of facing clock-wrapping bugs before they hit your production. And it worked: we found something like 8 of them, most likely all of them in fact.

    - the internal connection handling was revisited so that low-level errors are more accurately reported through the layers. There should be fewer cases where a termination code is reported for the wrong condition when several errors arrive together.

- There were some performance improvements as well:

    - setups mixing short and long connections could end up with unequal thread loads: incoming connections are assigned to the least loaded thread at the moment they arrive, but once the short connections are gone, the long ones may be left on only some threads. A new queue load balancing algorithm, "fair", resolves this by applying a round-robin to the threads.

    - rings used by traces are increasingly used as a debugging aid by both users and developers. They're now much faster (2-3x). The "trace" keyword in the global section is still marked experimental because some forthcoming changes are envisioned for 2.9 to almost completely remove the locking, and they may slightly affect the on-disk format for file-backed rings.

    - sometimes an old stopping process making heavy use of stick-tables could consume insane amounts of CPU, almost entirely spent in the libc's malloc_trim() function (or in free/malloc due to lock contention). This was addressed, and stick-table memory is now released in small, almost unnoticeable batches when stopping.

- We know that users love troubleshooting tools (developers do as well), so here's some new stuff to play with:

    - "show quic" is to QUIC what "netstat" or "ss" are to TCP. It also supports a detailed format.

    - "show fd" can now filter on certain types (e.g. dump front sockets only, or UNIX sockets only).

    - H2 traces can at last show the received HTTP headers!

    - the CLI can show the process' uptime in the prompt. There's little use for this except for those who want to instantly spot when their LBs have rebooted (or failed to).

    - thread dumps in the panic output and "show activity" are now unlimited in length. That was becoming critical, with buffers filling up around 60 threads...

    - crashes on a bogus condition ("BUG_ON") will now produce an "illegal instruction" instead of a "segmentation fault" on architectures supporting this (i386, x86_64, arm64 for now). This will improve the ability to diagnose what happened and the quality of bug reports.

- There were a few updates to the configuration: cpu-map now supports commas, http-after-response supports more actions, sc-add-gpc() increments a GPC by a fixed value, case can be ignored when fetching a request parameter, the httpclient supports disabling resolvers, and the enabled() preprocessor macro enables config blocks only when features are supported.

- There are also a few unlikely but possibly breaking changes:

    - "option httpclose" in the frontend no longer triggers a close in the backend, and conversely.

    - fixed a typo in "show info" ("TotalSplicdedBytesOut" is now properly spelled "TotalSplicedBytesOut"). This only affects the CLI, not Prometheus.

    - ALPN, as mentioned above, is now presented by default in HTTP to enable HTTP/2 over TCP+SSL and HTTP/3 over QUIC.

- For packagers, the build system is more flexible now, with every single build option supporting its own CFLAGS and LDFLAGS (e.g. convenient when forcing the use of a static version of a library).

And as usual, this summary doesn't do justice to all those who worked hard on invisible things to make all this possible, nor to those who spend a lot of time helping users who report issues and ask for help, nor to those who take the time to report cleanly documented issues! Thanks to them for their efforts!
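The new ALPN default mentioned in the HTTP/2 item above is a plain comma-separated list. As a rough illustration only (a Python sketch, not HAProxy's code, with hypothetical helper names), here is how such a list maps to the one-byte-length-prefixed protocol names defined by RFC 7301. The TLS extension itself prepends a two-byte total length, which is omitted here. The sketch also shows a server-preference selection policy, which is an assumption made for the example:

```python
from typing import List, Optional


def alpn_wire_format(alpn: str) -> bytes:
    """Encode a comma-separated ALPN list such as "h2,http/1.1" into
    one-byte-length-prefixed protocol names (RFC 7301 ProtocolName entries)."""
    out = bytearray()
    for proto in alpn.split(","):
        name = proto.strip().encode("ascii")
        if not 1 <= len(name) <= 255:
            raise ValueError(f"invalid ALPN protocol name: {proto!r}")
        out.append(len(name))  # one-byte length prefix
        out += name            # followed by the protocol name itself
    return bytes(out)


def select_alpn(server_prefs: List[str], client_offer: List[str]) -> Optional[str]:
    """Server-preference selection (assumed policy for this sketch):
    the first server protocol that the client also offered wins."""
    for proto in server_prefs:
        if proto in client_offer:
            return proto
    return None  # no protocol in common


if __name__ == "__main__":
    print(alpn_wire_format("h2,http/1.1"))                 # b'\x02h2\x08http/1.1'
    print(select_alpn(["h2", "http/1.1"], ["http/1.1"]))   # http/1.1
```

With "h2,http/1.1" advertised, a client that only offers http/1.1 still gets HTTP/1.1. This is why advertising h2 first doesn't lock out older clients.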
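The threading item describes spreading threads over several groups to match the cache topology. The minimal Python sketch below only illustrates the arithmetic of an even, contiguous split under the limits quoted above (64 groups, 64 threads per group). For example, 24 threads over 8 groups gives 3 threads per group. The function name is hypothetical, and HAProxy's own automatic assignment and CPU binding may well differ:

```python
from typing import List

MAX_GROUPS = 64             # limits quoted in the announcement
MAX_THREADS_PER_GROUP = 64


def split_threads(nbthread: int, nbgroups: int) -> List[range]:
    """Return, for each group, the range of (1-based) thread numbers it holds,
    spreading threads as evenly as possible over contiguous blocks."""
    if not 1 <= nbgroups <= MAX_GROUPS:
        raise ValueError("the number of groups must be between 1 and 64")
    if nbthread < nbgroups:
        raise ValueError("each group needs at least one thread")
    base, extra = divmod(nbthread, nbgroups)
    if base + (1 if extra else 0) > MAX_THREADS_PER_GROUP:
        raise ValueError("a group would exceed 64 threads")
    groups, first = [], 1
    for g in range(nbgroups):
        size = base + (1 if g < extra else 0)  # the first 'extra' groups get one more
        groups.append(range(first, first + size))
        first += size
    return groups


if __name__ == "__main__":
    for grp, threads in enumerate(split_threads(24, 8), start=1):
        print(f"group {grp}: threads {threads.start}-{threads.stop - 1}")
```

Contiguous blocks are used here because, on a machine like the one described, neighbouring CPUs usually share an L3 cache. That is an assumption about CPU numbering which should be checked on the actual hardware.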
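To illustrate the ordering property of the "Forwarded" header (RFC 7239) described above, here is a minimal Python sketch of how one hop might build its element and append it to the incoming header. It follows the RFC's syntax rules: values that aren't tokens, such as bracketed IPv6 addresses or anything carrying a port, must be quoted. The helper names are hypothetical and the sketch doesn't reproduce HAProxy's own output:

```python
import ipaddress
import string
from typing import List, Optional, Tuple

# RFC 7230 "tchar" set: values made only of these may be sent unquoted
TCHAR = set(string.ascii_letters + string.digits + "!#$%&'*+-.^_`|~")


def fwd_value(value: str) -> str:
    """Return value as an RFC 7239 token, or as a quoted-string if needed."""
    if value and all(c in TCHAR for c in value):
        return value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def fwd_node(ip: str, port: Optional[int] = None) -> str:
    """Format a node identifier; IPv6 addresses go between brackets."""
    addr = ipaddress.ip_address(ip)
    text = f"[{addr}]" if addr.version == 6 else str(addr)
    return text if port is None else f"{text}:{port}"


def fwd_element(pairs: List[Tuple[str, str]]) -> str:
    """Join the name=value pairs describing a single hop with ';'."""
    return ";".join(f"{name}={fwd_value(value)}" for name, value in pairs)


def fwd_append(existing: Optional[str], element: str) -> str:
    """Each hop appends its element, so the list stays in hop order."""
    return element if not existing else f"{existing}, {element}"


if __name__ == "__main__":
    hop = fwd_element([("for", fwd_node("192.0.2.60")), ("proto", "http"),
                       ("by", fwd_node("203.0.113.43"))])
    print(fwd_append("for=192.0.2.43", hop))
    print(fwd_element([("for", fwd_node("2001:db8:cafe::17", 4711))]))
```

The application then reads the elements from left to right, starting with the hop closest to the client. This is the "ordered list" property that makes mixing products at the edge manageable.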
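The 49.7-day figure in the clock item is simply 2^32 milliseconds, which suggests 32-bit millisecond ticks. The Python sketch below uses hypothetical helper names and is not HAProxy's code. It shows why starting such a clock just before its wrapping point is a good bug detector: a naive "now >= deadline" comparison misbehaves as soon as a deadline lands past the wrap, while the classic signed-difference comparison keeps working:

```python
TICK_MASK = 0xFFFFFFFF  # 32-bit millisecond counter


def wrap_period_days(bits: int = 32) -> float:
    """Time it takes a millisecond counter of this width to wrap, in days."""
    return (1 << bits) / 1000 / 86400


def tick_add(now: int, delay_ms: int) -> int:
    """Add a delay to a tick, wrapping modulo 2^32."""
    return (now + delay_ms) & TICK_MASK


def expired_naive(deadline: int, now: int) -> bool:
    return now >= deadline  # wrong once the deadline has wrapped past zero


def expired_wrap_safe(deadline: int, now: int) -> bool:
    # (now - deadline) taken modulo 2^32 and read as a signed 32-bit value
    return ((now - deadline) & TICK_MASK) < (1 << 31)


if __name__ == "__main__":
    print(f"{wrap_period_days():.1f} days")      # 49.7 days
    boot = (-20_000) & TICK_MASK                 # clock set to wrap 20 s after boot
    deadline = tick_add(boot, 30_000)            # timer armed for boot + 30 s
    print(expired_naive(deadline, boot))         # True  (bogus: fires immediately)
    print(expired_wrap_safe(deadline, boot))     # False (correct)
```

With a clock starting at zero, the naive version would only fail after about 49.7 days of uptime. Starting near the wrap makes the same bug show up within seconds.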
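The "fair" queue item above describes a subtle effect. The simplified Python model below reproduces it, using hypothetical names and a deliberately naive least-loaded picker rather than HAProxy's actual accept logic. While three threads are temporarily busy with short connections, every long-lived connection lands on the idle one. Once the short connections close, all the long ones remain on a single thread. A round-robin picker spreads them evenly regardless of transient load:

```python
from itertools import cycle
from typing import Callable, List

Picker = Callable[[List[int]], int]


def least_loaded(loads: List[int]) -> int:
    """Pick the thread with the fewest current connections (lowest index on ties)."""
    return min(range(len(loads)), key=lambda t: loads[t])


def round_robin(nbthread: int) -> Picker:
    """Return a picker that ignores current loads and cycles over the threads."""
    order = cycle(range(nbthread))
    return lambda loads: next(order)


def long_conns_left(pick: Picker, short_loads: List[int], n_long: int) -> List[int]:
    """Assign n_long long-lived connections while short ones are still present,
    then report how many long connections each thread keeps once they're gone."""
    loads = list(short_loads)
    long_per_thread = [0] * len(loads)
    for _ in range(n_long):
        t = pick(loads)
        loads[t] += 1
        long_per_thread[t] += 1
    return long_per_thread


if __name__ == "__main__":
    busy = [0, 5, 5, 5]  # thread 0 idle, threads 1-3 each serving 5 short connections
    print(long_conns_left(least_loaded, busy, 4))    # [4, 0, 0, 0]
    print(long_conns_left(round_robin(4), busy, 4))  # [1, 1, 1, 1]
```

Round-robin gives up reacting to instantaneous load in exchange for an even long-term spread. Which one is preferable depends on the traffic mix.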
Please find the usual URLs below:
   Site index       : https://www.haproxy.org/
   Documentation    : https://docs.haproxy.org/
   Wiki             : https://github.com/haproxy/wiki/wiki
   Discourse        : https://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : https://www.haproxy.org/download/2.8/src/
   Git repository   : https://git.haproxy.org/git/haproxy-2.8.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy-2.8.git
   Changelog        : https://www.haproxy.org/download/2.8/src/CHANGELOG
   Dataplane API    : https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs     : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs    : https://www.haproxy.org/l/reviewed-bugs
   Code reports     : https://www.haproxy.org/l/code-reports
   Latest builds    : https://www.haproxy.org/l/dev-packages

Willy
---
Complete changelog since 2.8-dev13:

Amaury Denoyelle (7):
      CLEANUP: mux-quic: remove unneeded fields in qcc
      MINOR: mux-quic: remove nb_streams from qcc
      MINOR: quic: fix stats naming for flow control BLOCKED frames
      BUG/MEDIUM: mux-quic: only set EOI on FIN
      DOC: quic: remove experimental status for QUIC
      CLEANUP: mux-quic: rename functions for mux_ops
      CLEANUP: mux-quic: rename internal functions

Aurelien DARRAGON (2):
      BUILD: init: print rlim_cur as regular integer
      DOC: config: fix rfc7239 converter examples

Christopher Faulet (2):
      MINOR: compression: Improve the way Vary header is added
      DOC: config: Fix bind/server/peer documentation in the peers section

Frédéric Lécaille (1):
      MINOR: quic: Add QUIC connection statistical counters values to "show quic"

Patrick Hemmer (1):
      MINOR: init: pre-allocate kernel data structures on init

William Lallemand (2):
      DOC: install: add details about WolfSSL
      DOC: install: specify the minimum openssl version recommended

Willy Tarreau (10):
      BUILD: makefile: search for SSL_INC/wolfssl before SSL_INC
      BUG/MEDIUM: threads: fix a tiny race in thread_isolate()
      BUG/MINOR: mux-h2: refresh the idle_timer when the mux is empty
      BUILD: Makefile: use -pthread not -lpthread when threads are enabled
      CLEANUP: doc: remove 21 totally obsolete docs
      DOC: install: mention the common strict-aliasing warning on older compilers
      DOC: install: clarify a few points on the wolfSSL build method
      EXAMPLES: update the basic-config-edge file for 2.8
      MINOR: quic/cli: clarify the "show quic" help message
      MINOR: version: mention that it's LTS now.

eaglegai (2):
      BUG/MINOR: ssl_sock: add check for ha_meth
      BUG/MINOR: thread: add a check for pthread_create

---