Hi,

HAProxy 3.0.0 was released on 2024/05/29. It added 21 new commits after version 3.0-dev13. I do appreciate that all of them were only cosmetic.
We have a total of 1108 patches in this release, 850 of which do not concern a bug, which makes it the smallest LTS release of all time (2.6 and 2.4 still remain the largest ones, respectively 65% and 58% larger). This is good news in terms of expected stability, and it might help break the old myth of "better avoid dot zero".

Let's try to summarize what's new in this release. It has been one of the most difficult releases for me to summarize because I don't see one big killer feature. Instead, it's an LTS as we like them: mostly a nice polishing of existing stuff and small improvements all over the place, as permitted by the previous version's architectural changes. I tried to classify these into a few categories, depending on the intended benefits.

First, let's enumerate the new features, and improvements of existing ones:

  - stats can finally be preserved across reloads for frontends, listeners, backends and servers. When using this, the config objects of the new process are preloaded with the relevant values from a dump of the previous process. This essentially concerns counters, ages and rates. Please have a look at "stats-file" and "dump stats-file" for more information.
  - the load balancing of outgoing logs now relies on a regular backend, meaning that the load balancing algorithms could finally be unified with the ones used by other protocols, and servers now support weights.
  - log-format now supports JSON and CBOR output encoding. In such a case, the field name is taken from a new naming scheme that is placed within the log-format itself, making it possible to assign a name to each field.
  - the "sticky" load balancing algorithm, which was initially reserved for logs, was generalized to other protocols.
  - the HTTP/2 RST_STREAM reason code can finally be forwarded to the server for client aborts. This addresses the problem a few users were facing with gRPC, where request cancellations appeared as communication errors on the server side.
    For now this is purposely limited to only a few reason codes that are relevant to gRPC, so that we don't ruin the possibility to later extend that to H3 and maybe H1.
  - QUIC now supports the HyStart++ (RFC9406) alternative to slow start with the Cubic algorithm. It's supposed to show better recovery patterns. It's not yet enabled by default.
  - a new set of converters, map_*_key, will report the matching key itself instead of the associated value. The main target use case for this is to know which address mask an address matched, or which regex matched a given input.
  - the "uuid()" sample fetch function, which takes an optional version as an argument, now also supports "7" for UUIDv7. These UUIDs combine many properties found in ULID and other mechanisms, one of the most interesting ones being time-based locality, which for example eases the archiving of old data, or the grouping of events on systems where they'll be processed together.
  - the name associated with servers in connection pools can now be overridden by the expression in "pool-conn-name" when SNI is not desired (useful with rhttp without SSL for example, but it may also make sense when reaching remote servers over SSL tunnels). It also makes it possible to entirely drop SSL from the server.
  - the "namespace" argument now works for "bind" and "server" lines using UNIX sockets.
  - Linux capabilities: the use of namespaces on the server side used to require capability "cap_sys_admin", but it was neither checked nor reported on startup, so it would silently fail. The capability is now supported and checked. Similarly, the capabilities needed for transparent proxying or QUIC are checked and reported on startup. Finally, file-system capabilities set on the executable are also supported now.
  - the set-mark/set-tos actions were extended to support an expression in addition to the constant, and were extended to also support the backend side.
    This can for example be used to select an outgoing link from a single IP address. The new backend actions are called "set-bc-mark" and "set-bc-tos"; by analogy, new frontend actions called "set-fc-mark" and "set-fc-tos" were created, and the old actions are aliases of these last ones.
  - QUIC built with the latest AWS-LC TLS library now correctly supports 0-RTT.
  - a new global setting "ssl-security-level" makes it possible to adjust OpenSSL's internal security level between 0 and 5. Previously it could only be done in openssl.cnf.
  - the key used by consistent hashing to map to a server used to always be the server's id (either explicit or implicit, position-based), but that was not always convenient when servers are frequently added and removed within a large fleet of LBs. Now the "hash-key" directive also makes it possible to use the server's address or address+port for this, so that the same key ends up on the same server for all LBs.
  - the HTTP client now has an option to use either origin or absolute URIs. This should make it easier to configure it to talk to old servers which are not spec-compliant and do not support absolute URIs. The ocsp_update agent already exploits this ability via a new setting "ocsp-update.httpproxy".
  - it is now possible to suppress Content-Length and Transfer-Encoding headers from HTTP/1 requests and responses. It must never be done of course, but there are rare situations where users dealing with bogus clients or servers need to perform such cleanups. Most of the time when done, this will mark a connection non-reusable and it will be closed at the end of the transfer.
  - the proxy protocol now also parses TLVs for LOCAL mode and supports sending them without a stream, so that elements can be passed during the preconnect phase of a reverse-HTTP instance to a next stage that will no longer ignore them.
  - the new sched_setaffinity() of FreeBSD 14 and newer is now supported.
  - the new certificate selection callback for WolfSSL is now enabled, since it's finally available in the upstream project.

Second, there was a reasonable set of usability improvements, all the small features that make config management and day-to-day operations easier:

  - maps are often used to operate at run time on some parts of the configuration. When no initial value was desired, it was still necessary to have an empty file (/dev/null is not usable since a map is indexed by its name). As such, some users have expressed their desire to have virtual and/or optional maps. Both are brought by this version. When a map is loaded from a file whose name begins with "opt@", the file will only be loaded if it exists; otherwise an empty map will be created with this name. And maps whose name begins with "virt@" are exclusively virtual and never backed by a file. They're always created empty at boot, for use at run time.
  - the default certificate selection method was improved: until now, the default certificate was the first one mentioned on the bind line. This causes issues with sites that want to support both RSA and ECDSA. A new approach was brought, with an optional "default-crt" keyword that designates the default certs on the bind line, and its equivalent in crt-list files, designated by "*" in the name. This allows the right cert to be picked based on the desired algorithm. Of course the default behavior doesn't change.
  - the list of status codes that increment the http_err_cnt and http_fail_cnt counters can now be changed with the global directives "http-err-codes" and "http-fail-codes". This has long been requested, both by those whose applications randomly return 500s that are not server failures, and by those where 404s happen a lot and do not necessarily indicate a URL scanner. The whole 1xx-5xx range is permitted for both classes.
  - cookies, both static and dynamic, are now permitted for dynamically added servers.
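To illustrate the "hash-key" change in the feature list above, here is a minimal Python sketch of a consistent-hash ring. It is not HAProxy's code: the SHA-256-based point hash, the 100 points per server, the server names and the addresses are all assumptions made for the illustration. Two LBs list the same three servers in a different order. Keying the ring on position-based ids sends a given request key to different servers on each LB, while keying it on the server address sends it to the same server on both.

```python
import bisect
import hashlib

def _h(s: str) -> int:
    # Illustrative 32-bit hash; HAProxy uses its own hash functions.
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:4], "big")

def build_ring(servers, key_mode, points=100):
    """servers: list of (name, addr) in config order.
    key_mode "id": each server is keyed by its 1-based position (implicit id).
    key_mode "addr": each server is keyed by its address."""
    ring = []
    for pos, (name, addr) in enumerate(servers, start=1):
        server_key = str(pos) if key_mode == "id" else addr
        for i in range(points):
            ring.append((_h(f"{server_key}-{i}"), name))
    ring.sort()
    return ring

def lookup(ring, request_key):
    # First ring point after the request's hash, wrapping around the ring.
    hashes = [h for h, _ in ring]
    idx = bisect.bisect(hashes, _h(request_key)) % len(ring)
    return ring[idx][1]

# Hypothetical fleet: same servers, different declaration order on each LB.
lb1 = [("A", "10.0.0.1:80"), ("B", "10.0.0.2:80"), ("C", "10.0.0.3:80")]
lb2 = [("C", "10.0.0.3:80"), ("A", "10.0.0.1:80"), ("B", "10.0.0.2:80")]
keys = [f"client-{n}" for n in range(1000)]

def agreement(mode, keys):
    """Number of request keys that both LBs send to the same server."""
    r1, r2 = build_ring(lb1, mode), build_ring(lb2, mode)
    return sum(lookup(r1, k) == lookup(r2, k) for k in keys)

for mode in ("id", "addr"):
    print(f"{mode}: {agreement(mode, keys)}/{len(keys)} keys on the same server")
```

With address-based keys both rings are identical, so every key lands on the same server on both LBs; with position-based ids the ring points are the same but each position names a different server, so nearly every key diverges.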
  - API clients will find the CLI friendlier when it comes to removing a server. First, idle connections are now automatically closed when trying to delete a server, so it's no longer necessary to wait for them to vanish. Second, a new "wait" command pauses operations for at most as long as specified, optionally waiting for a condition. One such new condition is "srv-removable", which checks when a server may safely be removed. This means that issuing this "wait" command before a "del server" command will save the client from having to periodically retry the operation.
  - a new "crt-store" configuration section is supported. It makes it possible to declare certificates by specifying the path for each element. The aim is essentially to decorrelate the storage from the instantiation, both of which are currently correlated in crt-lists, and to allow easier specification of individual components. This section supports "crt-base" and "key-base" to ease the splitting of certificates and keys into distinct directories, as well as "ocsp-update" to indicate which certificates need to have their OCSP part periodically updated. The certificates also support aliases so that they can be referenced from a bind line with more convenient names than a file name. crt-lists may now make use of these certificates to only decide which ones to instantiate for a given listener, without having to deal with deployment concerns such as paths and file names.
  - the "thread-hard-limit" global parameter was added. It makes it possible to only set a hard limit on the number of threads without enforcing that value as the thread count (like nbthread does). This is convenient to prepare portable configs with no more than X threads when one knows it's only a waste of resources to use more.
  - certain warnings about the presence of HTTP rules in TCP frontends that are going to be upgraded to HTTP when switching to a backend are no longer reported when it is certain that they will work as expected.
  - a new "guid" keyword was added for servers, listeners and proxies. The purpose is to make it possible for external APIs to assign a globally unique object identifier to each of them in stats dumps or CLI accesses, and to later reliably recognize a server upon reloads. One usage example right now is stats preservation across reloads, where this GUID uniquely identifies a server between two configs.
  - it has become easier to pass extra CFLAGS / LDFLAGS to the Makefile: just pass them in these variables (and a few other ones). Many build variables were removed as a result of this simplification; the removed ones will trigger a build warning indicating what to use instead. A warning will also be emitted when passing an unknown USE_* setting, and such settings can now be set to zero to disable them.

In addition to this, some changes aim at improving reliability:

  - the draining of the HTTP/1 request body was finally implemented. It is needed when an early response is sent before the end of a POST request, typically due to a redirect or an authentication issue. It used to cause difficulties due to the TCP stack emitting an RST that would sometimes destroy the response before it had a chance to be sent, but this is now a thing of the past.
  - the buffer allocator's behavior on out-of-memory conditions was finally fixed. It had been flaky since version 1.7, with possibilities for all requesters to deadlock if none had enough room to complete their work. A new, more robust algorithm was finally implemented, making sure that at least one requester has enough resources to make forward progress and let the system recover by itself.

Other ones put a particular focus on robustness against various threats in general:

  - H2, H3 and QUIC now maintain a counter of per-connection glitches, i.e. protocol handling or behavior from a peer that is not strictly illegal but is suspicious or bogus.
    Such counters are reported at upper layers, are trackable in stick-tables, and can be used to kill a misbehaving connection past a threshold. The goal here is to significantly reduce the CPU impact and log pollution caused by bots that blindly try to exploit various well-known vulnerabilities or limitations of some implementations. Since this works on both sides, it can also be used to detect faulty applications that would need to be fixed.
  - H2 can now forcefully close connections after a configurable number of streams. This can be used to accelerate the switchover during reloads, to maintain an optimal balance between multiple front nodes, and to force the re-evaluation of connection-level sanity checks on tracked metrics, to more easily get rid of abusers.
  - two new global settings now make it possible to simply prevent HAProxy from accepting traffic from privileged ports; one setting is for TCP and the other one for QUIC. QUIC is configured by default to refuse such traffic, because by relying on UDP it's particularly exposed to DNS and NTP amplification attacks, and while it's more efficient to filter such ports upstream, it's still very simple and cheap to just drop such undesirable packets before processing them.
  - the code no longer depends on libsystemd, so we will no longer pull in a myriad of questionable dependencies. This also makes it possible to enable USE_SYSTEMD by default (it's only done on linux-glibc though), thus reducing configuration combinations.

As with every version comes a comprehensive collection of performance improvements:

  - quic: the fast-forwarding mechanism now considers the flow control state, resulting in fewer wakeups and better filling of packets. The internal send API was reworked and simplified, and one buffer copy could be removed. Some minor fixes and cleanups were done in the Cubic congestion controller.
  - a new QUIC setting, "tune.quic.reorder-ratio", was added to let the user adjust the size of holes over the in-flight window before a loss is declared. Normally QUIC users should observe much better performance now, even with the default setting (50%), which was sufficient for us to observe a 10-20x improvement at 3% losses. The send path was improved and cleaned up, by exclusively using sendmsg() and avoiding some copies where possible. Some CPU savings are expected on intense workloads.
  - the H1 mux now also supports zero-copy forwarding for chunks of unknown size (i.e. those larger than a buffer).
  - the fast-forward zero-copy mechanism is now supported by applets. This will ultimately result in lower memory usage and higher performance for some applets such as the cache, by carefully avoiding queuing more data than the mux can take without buffering. This can still be disabled by unsetting tune.cache.zero-copy-forwarding.
  - a few ebtree backports improved the performance on non-x86 machines (typically ~2% faster string lookups were measured on ARM, as well as a ~3% higher task switching rate).
  - some of the remaining server name lookups that were still linear now use the tree instead, speeding up certain operations and config parsing.
  - ring: the ring internal API used to represent a bottleneck for traces and TCP logs, especially on multi-threaded systems, due to the initially unplanned locking that resulted from the underlying buffer API. All of this was entirely rewritten so that the code is almost lock-free and waiting threads can prepare their work as groups in parallel. The performance increased by a factor of 2.5 on NUMA systems and even by 20 on uniform systems, reaching up to around 7 million messages per second. This is sufficient to enable traces at the "developer" level even on moderately loaded systems.
    The "haring" utility was updated to automatically detect the new, slightly different format and support both the old and the new ones (the old haring tool will still read the new format in repair mode).
  - stick-tables are now sharded over multiple tree heads, each with its own lock. This significantly reduces locking contention on systems with many threads (gains of ~6x measured on an 80-thread system). In addition, the locking could be reduced even with low thread counts, particularly when using peers, where the performance could be doubled. This is particularly noticeable when using the bandwidth limiting filter "bwlim".
  - the Lua latency with single-threaded scripts (loaded by "lua-load") running on multi-threaded instances was improved a lot by reducing the number of consecutive instructions a thread may run when there are many threads.

A few changes that improve observability:

  - a few more sample fetches corresponding to certain log-format aliases were added (txn.redispatched, bc_be_queue, bc_srv_queue, etc).
  - new sample fetch functions retrieve the number of concurrent streams over the same connection for a frontend or a backend, as well as the maximum number negotiated. This can be useful to separate connection performance from stream performance when looking at timings in logs.
  - the Prometheus exporter now exposes a bunch of new metrics (resolvers, more server stuff) and supports applying filters to limit the metrics that have to be returned.

Some debugging aids to save experts' time in the field, speed up recovery and reduce the number of round trips in issues:

  - stick-table operations over the CLI using commands like "show table", "set table" and "clear table" now support a "ptr" argument to directly use the pointer retrieved from a previous "show" command. This is convenient to remove bogus entries manually, for example.
  - haproxy -dD will now report suspicious ACL pattern values which look like known ACL/sample fetch keywords.
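The stick-table sharding described in the performance list above can be pictured with this minimal Python sketch. It is only an analogy under stated assumptions: HAProxy shards ebtrees with its own locks in C, whereas here each shard is a plain dict guarded by a threading.Lock, and the shard count of 8 and the CRC32-based shard selection are arbitrary choices. The point is that threads updating keys that fall in different shards never wait on the same lock.

```python
import threading
import zlib

class ShardedTable:
    """Toy key -> counter table split over independently locked shards."""

    def __init__(self, nshards: int = 8):
        self.shards = [dict() for _ in range(nshards)]
        self.locks = [threading.Lock() for _ in range(nshards)]

    def _shard(self, key: str) -> int:
        # A stable hash of the key picks the shard, hence the lock to take.
        return zlib.crc32(key.encode()) % len(self.shards)

    def incr(self, key: str, delta: int = 1) -> int:
        i = self._shard(key)
        with self.locks[i]:  # only this shard is locked, not the whole table
            value = self.shards[i].get(key, 0) + delta
            self.shards[i][key] = value
            return value

    def get(self, key: str) -> int:
        i = self._shard(key)
        with self.locks[i]:
            return self.shards[i].get(key, 0)

    def __len__(self) -> int:
        return sum(len(s) for s in self.shards)

def worker(table: ShardedTable, keys, rounds: int) -> None:
    for _ in range(rounds):
        for k in keys:
            table.incr(k)

table = ShardedTable()
keys = [f"192.0.2.{n}" for n in range(50)]  # example source addresses
threads = [threading.Thread(target=worker, args=(table, keys, 100)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(table), table.get("192.0.2.7"))  # prints: 50 400
```

Python's GIL means this sketch shows the locking layout rather than the actual speedup; the contention gain only materializes with truly parallel threads, as in HAProxy.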
  - the "insecure-fork-wanted" option now has an equivalent on the command line, "-dI". It's convenient to obtain decoded ASAN outputs for example, without having to edit a config.
  - QUIC and HTTP/3 added some traces, refined some error reporting, and improved the accuracy of the "show quic" output.
  - the backend equivalent of the frontend keylog mechanism was implemented, so that it is now possible to decipher TLS captures on the backend side. The required log-format is a bit large; please refer to the example in the doc.
  - some large internal memory areas (file descriptor tables, HTTP and SSL session caches, ring buffers etc) now have a name that is visible on Linux >= 5.17 in /proc/$pid/maps or using pmap. This will help figure out where the memory is being used and why.
  - traces are way faster on multi-threaded systems thanks to the ring locking changes, making them usable without risks on moderately loaded systems.

Some possibly (but unlikely) breaking changes:

  - the DeviceAtlas addon was updated to support the new version of the library. It slightly changes the build system, but so far no issue was reported.
  - a mistake I accidentally introduced two years ago with a bug fix had the undesired side effect of randomly accepting chained commands on the CLI in non-interactive mode, when delimited by line feeds. The likelihood that it would work is essentially time-based, so a short string of multiple commands had great chances of working while a large one had almost none. This started to cause side effects in other issues and had to be fixed, so we no longer accept multiple commands delimited by '\n' in non-interactive mode, as documented. If you happen to have scripts sending multiple commands this way, you may have to fix them (either use the semi-colon ';' to delimit the commands, or switch to interactive mode via the "prompt" command). A warning is emitted when this unreliable behavior is detected, to ease detection of faulty scripts.
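For the named memory areas mentioned in the debugging list above, here is a minimal Python sketch that sums mapping sizes per name from /proc/<pid>/maps-style text. It assumes the usual Linux maps line layout (address range, perms, offset, device, inode, optional pathname) and that named anonymous mappings are expected to appear as "[anon:<name>]" in the pathname column; the names in the sample input are made-up placeholders, not the ones HAProxy actually assigns.

```python
from collections import defaultdict

def sizes_by_name(maps_text: str) -> dict:
    """Sum mapping sizes in bytes, grouped by the pathname column.
    Mappings without a pathname are grouped under "(anonymous)"."""
    totals = defaultdict(int)
    for line in maps_text.splitlines():
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue  # skip empty or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        name = fields[5].strip() if len(fields) == 6 else "(anonymous)"
        totals[name] += end - start
    return dict(totals)

def read_maps(pid: int) -> str:
    # Linux-only; reading another process's maps may require privileges.
    with open(f"/proc/{pid}/maps") as f:
        return f.read()

# Illustrative input; "[anon:example-ring]" is a placeholder name.
sample = """\
55d0c0000000-55d0c0021000 r-xp 00000000 08:01 1234 /usr/sbin/haproxy
7f0000000000-7f0000100000 rw-p 00000000 00:00 0 [anon:example-ring]
7f0000100000-7f0000200000 rw-p 00000000 00:00 0 [anon:example-ring]
7f0000200000-7f0000201000 rw-p 00000000 00:00 0
"""

for name, size in sorted(sizes_by_name(sample).items(), key=lambda kv: -kv[1]):
    print(f"{size:>12}  {name}")
```

On a live system, calling sizes_by_name(read_maps(pid)) with the PID of a running haproxy process would apply the same grouping to the real file; "pmap" reports similar information.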
  - the "enabled" server keyword used to be silently ignored when adding a dynamic server. Now it's properly rejected to avoid confusing scripts.
  - the way the memory limit specified by "-m" on the command line was enforced on Linux, using RLIMIT_AS, had become completely useless over time due to much more fragmented memory spaces on 64-bit platforms, ASLR, and the fact that it had been chosen exclusively to avoid underestimating the cost of the allocated buffers, which originally were allocated all the time, even when empty. Nowadays this is no longer relevant since buffers are only allocated when used, and the current state had the nasty effect of causing OOMs way below the configured limit, rendering it pretty useless. The use of RLIMIT_AS has now been dropped in favor of the more reliable RLIMIT_DATA, as on other operating systems.
  - the "namespace" keyword used to be silently ignored on "bind" and "server" lines using UNIX sockets. Now it is properly used and checked, thus it may fail if it references an invalid value. If the previous configuration used to work, it probably means the keyword was not needed. In addition, if permissions are insufficient, the keyword on a "server" line may now cause a boot failure, where the problem was previously only detected at run time. There's no loss of functionality here, only a check performed earlier to ensure the process boots in a properly working state.
  - the HTTP/1 URI parser no longer accepts invalid origin-form URIs that start neither with a '/' nor a '*' (e.g. "index.html" without a leading slash). Even if some servers would still accept that, clients that would be compatible with this disappeared more than a decade ago, and continuing to support this for such broken applications would probably lead to an abuse sooner or later, so better put an end to this now.
  - a workaround for an issue affecting QUIC on LibreSSL when running on non-x86 machines was developed jointly with the LibreSSL team.
    There's an issue with the CHACHA20_POLY1305 cipher when used in-place (for QUIC) that has been well identified and will be fixed in version 4.0 of LibreSSL. The workaround consists in making the QUIC connection fail fast so that the client can quickly retry using TCP. We'll disable it once a stable LibreSSL version is out with the fix. A config-based workaround consists in forcing the cipher list so as to exclude this one.

And we even found some room to improve the code's maintainability and clarity, which will hopefully further lower the barrier to contribution:

  - applet: most of the internal API rework was done, which simplifies the upper layers and the applet code as well (for those that were converted). New applet code will have its own buffers and even less stuff to care about. This is also true for the CLI keyword handlers, which can now be written in a more natural way and may now yield even when not blocked.
  - a significant part of the internal "shutdown" API was cleaned up so that there is now only one function at each layer instead of one per direction. Not only did this eliminate very old legacy code ported over the years, it also made it possible to forward gRPC cancellations.
  - prometheus: a new registration mechanism was added to register metrics per module (e.g. stick-tables, resolvers etc). The extra counters are also dumped now if requested (frontend, backend, listener, server).

I'm fairly certain that I forgot a few things. As usual, I'm told that my coworkers at HAProxyTech also went through this tedious task of enumerating the changes, and it will be posted soon here:

    https://www.haproxy.com/blog/announcing-haproxy-3-0

My understanding is that there will be some followups with a focus on selected points.
I'm not surprised by the difficulty of the exercise this time ;-)

For this version, we got increased help from various testers who accepted to run one (or a few) servers with the development version, and who were able to report a few problems with accurate version ranges, as well as traces and info that made it possible to fix the issues quickly. It worked amazingly well and allowed us to address some nasty bugs that are fairly hard to reproduce and that had been present for several versions already. At the risk of repeating myself, thanks for that! I know that operating a -dev version requires a bit more involvement than a stable one, but it's also a win-win: when something doesn't please you, it's not too late to suggest a change, and you can benefit from the latest debugging features and performance improvements. I sincerely hope that this success will encourage other users to go in that direction. The nice benefit for the user of facing a bug in -dev vs -stable is that we have no problem developing new debugging extensions just for that issue, so a git pull is enough to suddenly make the problem much more observable and require less work to filter data than with a stable version. And, it's only human, developers tend to be much more attracted by issues affecting areas that are still fresh in their heads, and will tend to treat them with higher priority.

I also noticed more exchanges from various participants on the issues and here on the list, so big thanks as well to those who take time to review other users' problem reports and requests for help. Especially for first-time reporters, it gives them a great experience of the project and its community.

As usual, a new major release comes with the death of an old one. This time it's 2.0 that passed away after 5 years serving as a transition between the old legacy versions and the newer HTX-enabled ones.
I'm fairly sure there are still some here and there, so please consider this as a reminder that it's about time to upgrade. And 2.4 has switched to critical-fixes-only status.

On a side note (not very funny but surprising), apparently there was a big GitHub outage last night, and this morning we're getting an "Ooops 500" page on the haproxy repository there:

    https://github.com/haproxy/haproxy

The issues seem to be working, and so do the wiki and docs projects. So I suspect that an error page got cached during the outage and continues to be delivered for whatever reason. I opened a ticket with their support and we'll see when we get a response. Fortunately we're not completely blocked, but it feels strange to release on a day of outage. After all, that's a form of resilience that also makes one use a load balancer, so there's some logic there.

Speaking of resilience, I'm going to take a bit of vacation next week and the week after (maybe I should have postponed given the heavy rain here), but you're in good hands with the rest of the team, and Christopher is back on Monday, fresh and in full force.
Maybe you'll even manage to convince him to emit -dev1 himself, who knows :-)

Please find the usual URLs below:
   Site index       : https://www.haproxy.org/
   Documentation    : https://docs.haproxy.org/
   Wiki             : https://github.com/haproxy/wiki/wiki
   Discourse        : https://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : https://www.haproxy.org/download/3.0/src/
   Git repository   : https://git.haproxy.org/git/haproxy-3.0.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy-3.0.git
   Changelog        : https://www.haproxy.org/download/3.0/src/CHANGELOG
   Dataplane API    : https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs     : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs    : https://www.haproxy.org/l/reviewed-bugs
   Code reports     : https://www.haproxy.org/l/code-reports
   Latest builds    : https://www.haproxy.org/l/dev-packages

I verified what I had in mind for 3.0 and 3.1-dev0 (which just opened), and I think all is good (Tim already fixed an incorrect color on the docs index). As usual, if (should I say when?) you detect a broken link, just let me know so I can fix it.

Have fun!
Willy
---
Complete changelog from 3.0-dev13:
Amaury Denoyelle (2):
      DOC: streamline http-reuse and connection naming definition
      REGTESTS: complete http-reuse test with pool-conn-name

Aurelien DARRAGON (3):
      MINOR: log: rename 'log-format tag' to 'log-format alias'
      DOC: config: document logformat item naming and typecasting features
      DOC: config: add %ID logformat alias alternative

Valentine Krasnobaeva (3):
      CLEANUP: ssl/ocsp: readable ifdef in ssl_sock_load_ocsp
      BUG/MINOR: ssl/ocsp: init callback func ptr as NULL
      BUG/MINOR: activity: fix Delta_calls and Delta_bytes count

William Lallemand (2):
      MINOR: sample: implement the uptime sample fetch
      CI: github: upgrade the WolfSSL job to 5.7.0

Willy Tarreau (11):
      CI: scripts: fix build of vtest regarding option -C
      CI: scripts: build vtest using multiple CPUs
      BUILD: makefile: yearly reordering of objects by build time
      BUILD: fd: errno is also needed without poll()
      DOC: config: fix two typos "RST_STEAM" vs "RST_STREAM"
      DOC: config: refer to the non-deprecated keywords in ocsp-update on/off
      CLEANUP: ssl_sock: move dirty openssl-1.0.2 wrapper to openssl-compat
      DOC: install: update quick build reminders with some missing options
      DOC: install: update the range of tested openssl version to cover 3.3
      DEV: patchbot: prepare for new version 3.1-dev
      MINOR: version: mention that it's 3.0 LTS now.

---