[ANNOUNCE] haproxy-2.9-dev4

Willy Tarreau Fri, 25 Aug 2023 10:35:53 -0700

Hi,

HAProxy 2.9-dev4 was released on 2023/08/25. It added 59 new commits
after version 2.9-dev3.


Some interesting new stuff continues to arrive in this version:

  - maps: the set-map action (and equivalent Lua calls) used to update
    an entry in O(N) for N elements in a map, due to the original
    design focusing on limiting memory usage. Nowadays despite the
    warnings in the doc, it appears that more and more users are
    relying on set-map on the traffic path in Lua or HTTP actions
    and are sometimes reporting hard-to-diagnose CPU usage issues
    which end up being caused just by this. The reference lookup
    code is now in O(log(N)) and the propagation in O(1) so this will
    be much better for these special use cases. Memory usage increased
    from 72 to 120 bytes per map entry (to which about as much was
    already added for each instantiation), so that's reasonable, and
    will later be shrunk again. In addition, lookups in empty maps
    used to solicit the LRU cache anyway, which could represent up
    to about 4-5% CPU in tests. That's too bad, considering we know
    the map is empty, hence the cache as well, so now the lookup is
    avoided on empty maps and acls.

  - when idle connections were reworked to support SNI, a side effect
    of replacing the list with a tree was that they were now put back
    at the end of the list instead of at the head, hence they were all
    used in a round-robin fashion, something which is not that great
    when thinking about purging excess connections nor when trying to
    concentrate most of the traffic on few connections (e.g. window
    sizes will not necessarily increase etc). This was changed so that
    the previous behavior is restored and the recently used connections
    packed together at the head again, so that we have the hottest and
    most reliable ones at the head and the least trusted ones at the
    tail. This allows the purge mechanism to kill from the tail and
    preserve the most recently used ones.

  - limited-quic: now in order to make sure not to fool users, when
    building with an SSL library which does NOT support QUIC (i.e.
    OpenSSL), "quic" bindings will be properly rejected unless
    "limited-quic" is specified in the config (and it's suggested in
    the error message). Previously it would only silently ignore them,
    resulting in a non-working config, causing confusion to those who
    copy-paste configs without being aware of this. Also a warning is
    now emitted in this mode when "allow-0rtt" is specified, as the
    "limited-quic" compatibility layer doesn't support it.

  - reverse HTTP: see below for a complete description. I hope it will
    answer Alex's question :-)

  - xxhash was updated to 0.8.2 (we were on 0.8.1) because it fixes a
    build issue on ppc64le.

  - various doc/regtest/CI updates as usual.
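
As a rough illustration of the set-map change listed above, the Python
sketch below contrasts a linear scan over the map entries with a lookup
through a sorted index. It is not HAProxy's code: the function names, the
probe counters and the sorted list standing in for a tree are all
assumptions made for the example.

```python
def linear_find(keys, key):
    """Visit entries one by one, like an O(N) update path.
    Returns (index or -1, number of entries visited)."""
    for visited, k in enumerate(keys, start=1):
        if k == key:
            return visited - 1, visited
    return -1, len(keys)


def indexed_find(sorted_keys, key):
    """Binary search over a sorted index: O(log(N)) probes.
    Returns (index or -1, number of probes)."""
    lo, hi, probes = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_keys[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == key
    return (lo if found else -1), probes


# 100000 hypothetical map keys, zero-padded so that they are already sorted.
keys = [f"host{i:06d}" for i in range(100_000)]
target = keys[-1]                     # worst case for the linear scan

_, scan_cost = linear_find(keys, target)
_, tree_cost = indexed_find(keys, target)
```

With 100000 entries the scan visits every entry before finding the last
key, while the indexed lookup needs no more than 17 probes. That gap is
what matters when set-map sits on the traffic path.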

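The idle connection ordering change listed above can be pictured with a
small Python model. This is only a sketch under simplifying assumptions:
the real pool is made of trees and per-thread lists, and the class and
method names here are invented for the illustration.

```python
from collections import deque


class IdlePool:
    """Idle connections, hottest at the head, least trusted at the tail."""

    def __init__(self, conns):
        self.idle = deque(conns)

    def take(self):
        # Always reuse the connection sitting at the head.
        return self.idle.popleft()

    def release(self, conn, at_head=True):
        if at_head:
            self.idle.appendleft(conn)   # restored behavior
        else:
            self.idle.append(conn)       # side effect of the earlier rework

    def purge(self, excess):
        """Kill up to `excess` connections, starting from the tail."""
        return [self.idle.pop() for _ in range(min(excess, len(self.idle)))]


def simulate(at_head, requests=6):
    """Serve sequential requests and record which connection carried each."""
    pool = IdlePool(["c1", "c2", "c3"])
    used = []
    for _ in range(requests):
        conn = pool.take()
        used.append(conn)
        pool.release(conn, at_head=at_head)
    return used, pool


head_used, head_pool = simulate(at_head=True)
tail_used, _ = simulate(at_head=False)
```

Releasing at the tail spreads six sequential requests over c1, c2, c3 in
a round-robin fashion, so no connection ever looks unused. Releasing at
the head keeps all of them on c1, and a purge from the tail then removes
c3 and c2, which were never touched.
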
Now, regarding reverse HTTP: that's a feature that we've been repeatedly
asked for over the last decade, constantly responding "not possible yet".
But with the flexibility of the current architecture, it appeared that
there was no more big show-stopper and it was about time to respond to
this demand. What is this ? The principle is to permit a server to
establish a connection to haproxy, then to switch the connection
direction on both sides, so that haproxy can send requests to that
server. There was a trend around this 20 years ago on HTTP/1 and it
didn't work well, to be honest. And we were counting on H2 to do that
because it allows multiplexing streams over a connection and resetting
a stream without breaking the connection. There are 4 use cases I'm
currently aware of, though others might be creative:

  - isolation: a server in a purely outgoing DMZ connects to the
    edge load balancer and receives requests from there. There are
    zero incoming connections to that DMZ. Some security environments
    require this (not that I fully agree with this, to be honest).

  - work around painful NAT: mobile developers who want to test
    their applications on their smartphone often have to either
    push on a public dev server, or hack around the local network's
    wifi to permit their phone to connect directly to the dev PC.
    Here it can be much simpler: their PC connects to a public
    gateway, registers there and instantly receives the traffic for
    the configured host name and delivers it to the application
    running locally in debug mode with traces etc. A similar use
    case consists in working around the difficulty of setting up
    port forwarding on some home internet accesses: here you can
    expose your internal application directly outside via a public
    gateway again, using exclusively an outgoing connection. Exactly
    the same can be done with containers: instead of having to know
    what ports to NAT, it can be convenient to let the server in
    the container directly register to the external gateway. I'll
    soon try to set up one on a public server so that I can receive
    incoming requests on my laptop anywhere.

  - multi-path and high availability in complex setups: a server can
    register to public edge gateways via multiple paths (or even
    multiple internet links like one could do at home with a fibre and
    an xDSL backup), and the traffic will arrive via these connections.

  - config-less automatic webserver registration: an application server
    would only have to know the address of the local LB and connect
    there to immediately receive traffic without having to announce
    itself nor to rely on other discovery mechanisms.

How does this work ? It's not easy to describe due to the reversal
of the connection that switches roles and involves confusing terms.
I'll use the term "origin" to describe the target server, "gateway"
for the public node, and "visitor" to describe the person wanting to
access the origin. The origin connects to the gateway over H2+TLS and
presents a certificate whose CN contains the FQDN that will be
matched outside. This cert was signed by the same authority which
operates the gateway so it's possible to know if this FQDN is
allowed or not. The gateway receives the connection, detects it's a
reversal attempt, and places this connection into a backend server's
idle connections pool, associated with the host name presented by
the origin. In our case, the origin also contains an haproxy node.
It has a dummy listener responsible for creating idle connections to
the external gateway and waiting for requests on them. Then a visitor
who wants to visit the site on this FQDN connects to the gateway (the
FQDN resolves to its IP address) and enters a frontend which can route
the request to the server which has those idle connections. If no
matching connection is found, a 503 is returned; otherwise it's used
and the request is sent over that connection and reaches the origin.
In our case this origin is haproxy and delivers it to the local server,
but we could imagine that once this becomes successful, some servers
will implement it to receive the traffic directly.
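
The flow described above can be condensed into a toy Python model of the
gateway side. It is only a sketch: the class names, the "send" method and
the example host names are made up, and TLS, H2 and the actual reversal
handshake are left out entirely.

```python
class ReversedConn:
    """Stands in for a connection opened by the origin, then reversed."""

    def __init__(self, origin):
        self.origin = origin

    def send(self, path):
        # Pretend the origin answered the request carried by this connection.
        return f"200 from {self.origin} for {path}"


class Gateway:
    def __init__(self):
        self.idle = {}          # name -> list of parked reversed connections

    def attach(self, cert_cn, conn):
        """Park a connection under the CN found in the origin's certificate."""
        self.idle.setdefault(cert_cn, []).append(conn)

    def handle(self, host, path):
        """Serve a visitor: look for a parked connection matching the host."""
        conns = self.idle.get(host)
        if not conns:
            return "503"        # no origin registered under this name
        # Simplification: the connection stays parked, as H2 can multiplex.
        return conns[0].send(path)


gw = Gateway()
gw.attach("app.example.com", ReversedConn("origin-1"))
```

A visitor asking for "app.example.com" is served over the parked
connection, while any other host name gets a 503 since no origin
registered under it.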

As a pure coincidence (really), 2 hours after we finished our first
design meeting, a draft describing almost exactly the same design was
sent to the IETF HTTP working group:

   https://datatracker.ietf.org/doc/draft-bt-httpbis-reverse-http/

There are small differences with our initial design but we're going to
participate with the editors, sharing feedback from our implementation,
adjusting it and/or the draft depending on what we'll all learn there.
The goal will be to see this protocol become a standard with its own RFC,
and as long as it remains a draft, our support will be experimental and
subject to change to adapt to ongoing definitions.

The implementation is very young for now and has quite some limitations
but we preferred to expose it early so as to collect feedback. The
currently known limitations are:

  - idle connections on the gateway will be subject to the server's
    purge and will regularly get killed and instantly recreated by
    the origin. Not dramatic but may cause many outgoing connections
    per day in firewall logs. It's possible to significantly increase
    the client and server timeouts on both sides to avoid this.

  - the origin process will not quit on SIGUSR1 (reload) as long as
    it has idle connections since they're seen as idle client conns.
    Bah, Ctrl-C does the job for now :-) Or a client timeout as well.

  - for now the origin will attach all connections to the same thread.
    It's not the place with the most traffic so it's not urgent to
    address but it's on the todo list.

  - some stats counters during the connection reversal are unreliable
    (some steps update the frontend and later the backend, that's a
    bit tricky). If you see negative connection counts or stuff like
    this, we're obviously interested in reports.

  - it has been observed that after a failed memory allocation, the
    listener will fail to create new connections.

We also know that some parts of the syntax will be revisited (e.g.
the server's dummy address, maybe even the protocol name etc).

Right now an example config would look like this on the gateway:

    frontend pub
        mode http
        bind :443 ssl crt pub.pem
        use_backend be

    backend be
        mode http
        server srv @reverse sni req.hdr(host)

    frontend priv
        mode http
        bind :444 ssl crt priv.pem verify required ca-verify-file ca-auth.crt alpn h2
        tcp-request session attach-srv be/srv name ssl_c_s_dn(CN)

Explanation: the origin will connect to frontend "priv" and will present
its certificate. Its name is extracted and the connection is offered to
server "srv" of backend "be" with this name as the SNI. Then a visitor
comes on frontend "pub", their request is routed to backend "be", which
looks for the Host header and uses it to look for a matching idle
connection. If the name matches the one previously fed and the connection
is still there, the request is routed over that connection. It's of course
possible to use the same frontend with verify optional, with conditions to
detect and transfer the connection etc, but it's complicated enough so I
wanted to do something "simple".

Now the config on the origin:

    listen fe
        mode http
        bind rev@be/srv maxconn 10
        server srv 127.0.0.1:30080

    backend be
        mode http
        server srv gateway:444 ssl crt my-origin.pem proto h2

Connections are created by fe's "bind" line which references the server.
It will instruct this server to create and maintain connections until
there are up to maxconn (10) available. This server is used for nothing
else, but it conveys everything needed to establish an authenticated
outgoing connection. Incoming requests arriving on these connections
are seen as arriving in listener fe for the declared bind line, and
will take their normal path (here it will be routed in clear to the
local application server running on port 30080).
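
The "create and maintain connections until there are up to maxconn"
behavior can be sketched in Python as follows. This is an assumption-laden
model, not the real listener: the class, its methods and the "dial"
callback are hypothetical, and connections are plain strings.

```python
from itertools import count


class ReverseListener:
    """Toy model of a bind line that keeps up to maxconn connections open."""

    def __init__(self, dial, maxconn):
        self.dial = dial            # callable opening one outgoing connection
        self.maxconn = maxconn
        self.conns = []

    def top_up(self):
        """Open connections until maxconn are available; return how many."""
        opened = 0
        while len(self.conns) < self.maxconn:
            self.conns.append(self.dial())
            opened += 1
        return opened

    def on_close(self, conn):
        """A connection went away (e.g. purged by the gateway): refill."""
        self.conns.remove(conn)
        return self.top_up()


ids = count(1)
listener = ReverseListener(dial=lambda: f"conn{next(ids)}", maxconn=10)
first = listener.top_up()                # opens conn1 to conn10
reopened = listener.on_close("conn3")    # one closed, one re-dialed (conn11)
```

The second call shows why a purge on the gateway side translates into a
new outgoing connection from the origin: the pool is simply topped up
again as soon as it drops below maxconn.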

That's it for now. If you run into issues with this new mechanism (or
even have suggestions), please be aware that the main developer (Amaury)
will be away for a few weeks, so we'll have to try to gather elements
either here or in github issues so that he has them once he's back. It
would be interesting also to hear about interest from developers to
implement support for this directly inside their applications or web
servers.

Please find the usual URLs below :
   Site index       : https://www.haproxy.org/
   Documentation    : https://docs.haproxy.org/
   Wiki             : https://github.com/haproxy/wiki/wiki
   Discourse        : https://discourse.haproxy.org/
   Slack channel    : https://slack.haproxy.org/
   Issue tracker    : https://github.com/haproxy/haproxy/issues
   Sources          : https://www.haproxy.org/download/2.9/src/
   Git repository   : https://git.haproxy.org/git/haproxy.git/
   Git Web browsing : https://git.haproxy.org/?p=haproxy.git
   Changelog        : https://www.haproxy.org/download/2.9/src/CHANGELOG
   Dataplane API    : https://github.com/haproxytech/dataplaneapi/releases/latest
   Pending bugs     : https://www.haproxy.org/l/pending-bugs
   Reviewed bugs    : https://www.haproxy.org/l/reviewed-bugs
   Code reports     : https://www.haproxy.org/l/code-reports
   Latest builds    : https://www.haproxy.org/l/dev-packages

Willy
---
Complete changelog :
Amaury Denoyelle (25):
      BUILD/IMPORT: fix compilation with PLOCK_DISABLE_EBO=1
      MINOR: proxy: simplify parsing 'backend/server'
      MINOR: connection: centralize init/deinit of backend elements
      MEDIUM: connection: implement passive reverse
      MEDIUM: h2: reverse connection after SETTINGS reception
      MINOR: server: define reverse-connect server
      MINOR: backend: only allow reuse for reverse server
      MINOR: tcp-act: parse 'tcp-request attach-srv' session rule
      REGTESTS: provide a reverse-server test
      MINOR: tcp-act: define optional arg name for attach-srv
      MINOR: connection: use attach-srv name as SNI reuse parameter on reverse
      REGTESTS: provide a reverse-server test with name argument
      MINOR: proto: define dedicated protocol for active reverse connect
      MINOR: connection: extend conn_reverse() for active reverse
      MINOR: proto_reverse_connect: parse rev@ addresses for bind
      MINOR: connection: prepare init code paths for active reverse
      MEDIUM: proto_reverse_connect: bootstrap active reverse connection
      MINOR: proto_reverse_connect: handle early error before reversal
      MEDIUM: h2: implement active connection reversal
      MEDIUM: h2: prevent stream opening before connection reverse completed
      REGTESTS: write a full reverse regtest
      BUG/MINOR: h2: fix reverse if no timeout defined
      MINOR: connection: simplify removal of idle conns from their trees
      MINOR: server: move idle tree insert in a dedicated function
      MAJOR: connection: purge idle conn by last usage

Aurelien DARRAGON (6):
      BUG/MINOR: stktable: allow sc-set-gpt(0) from tcp-request connection
      BUG/MINOR: stktable: allow sc-add-gpc from tcp-request connection
      DEV: makefile: fix POSIX compatibility for "range" target
      BUG/MINOR: hlua_fcn: potentially unsafe stktable_data_ptr usage
      DOC: lua: fix Sphinx warning from core.get_var()
      DOC: lua: fix core.register_action typo

Frédéric Lécaille (7):
      MINOR: quic+openssl_compat: Do not start without "limited-quic"
      MINOR: quic+openssl_compat: Emit an alert for "allow-0rtt" option
      MEDIUM: map/acl: Improve pat_ref_set() efficiency (for "set-map", "add-acl" action perfs)
      MEDIUM: map/acl: Improve pat_ref_set_elt() efficiency (for "set-map", "add-acl"action perfs)
      MEDIUM: map/acl: Accelerate several functions using pat_ref_elt struct ->head list
      MEDIUM: map/acl: Replace map/acl spin lock by a read/write lock.
      DOC: map/acl: Remove the comments about map/acl performance issue

Ilya Shipitsin (1):
      CI: fedora: fix "dnf" invocation syntax

Johannes Naab (1):
      DOC: typo: fix sc-set-gpt references

Remi Tricot-Le Breton (1):
      DOC: jwt: Add explicit list of supported algorithms

Sébastien Gross (1):
      DOC: Explanation of be_name and be_id fetches

Tim Duesterhus (1):
      REGTESTS: Do not use REQUIRE_VERSION for HAProxy 2.5+ (3)

William Lallemand (5):
      BUILD: Makefile: add the USE_QUIC option to make help
      BUILD: Makefile: add USE_QUIC_OPENSSL_COMPAT to make help
      BUILD: Makefile: realigned USE_* options in make help
      BUG/MINOR: quic: allow-0rtt warning must only be emitted with quic bind
      BUG/MINOR: quic: ssl_quic_initial_ctx() uses error count not error code

Willy Tarreau (11):
      DEV: flags/show-sess-to-flags: properly decode fd.state
      SCRIPTS: git-show-backports: automatic ref and base detection with -m
      IMPORT: plock: also support inlining the int code
      IMPORT: plock: always expose the inline version of the lock wait function
      IMPORT: lorw: support inlining the wait call
      MINOR: threads: inline the wait function for pthread_rwlock emulation
      MINOR: atomic: make sure to always relax after a failed CAS
      MINOR: pools: use EBO to wait for unlock during pool_flush()
      MINOR: pattern: do not needlessly lookup the LRU cache for empty lists
      IMPORT: xxhash: update xxHash to version 0.8.2
      BUG/MINOR: ssl_sock: fix possible memory leak on OOM
