Re: [RFC] Allow disabling abstract unix socket paths NUL-padding
On 09/03/2024 18:09, Tristan wrote: Hi Willy, On 09/03/2024 16:51, Willy Tarreau wrote: Hi Tristan, On Sat, Mar 09, 2024 at 04:20:21PM +, Tristan wrote: To be honest, I don't think this is unfixable. It's just a matter of how much code change we think is acceptable for it. I don't mind about the amount of changes. "we've always done it like this" is never a valid excuse to keep code as it is, especially when it's wrong and causes difficulties to add new features. Of course I prefer invasive changes early in the dev cycle than late but we're still OK and I don't expect that many changes for this anyway. Agreed on not shying away from amount of changes. I tried to follow that route, being generous in what I was willing to touch, but ran into the issue of protocols and socket families being a 1:1 binding. So adding a new socket family *requires* adding a new protocol too, which isn't too hard but spirals a bit out of control: - protocol_by_family is a misnomer, and it is actually protocols *by socket domain* (which is still AF_UNIX, even with family AF_CUST_ABNS2, and ends up overriding standard socket/abns protocol) - but you can't just change that, because then other things stop working as some places rely on socket domain != socket family Actually, following up on this, I'm more and more convinced that it actually makes things somewhat worse to take this approach... At least I don't see a good way to make it work without looking very bad, because in a few places we lookup a protocol based on a socket_storage reference, so we have truly only the socket domain to work with if no connection is established yet. While it sounds like shying away from deeper changes, it seems to me that a bind/server option is truly the right way to go here. Tristan
Re: [RFC] Allow disabling abstract unix socket paths NUL-padding
Hi Willy, On 09/03/2024 16:51, Willy Tarreau wrote: Hi Tristan, On Sat, Mar 09, 2024 at 04:20:21PM +, Tristan wrote: To be honest, I don't think this is unfixable. It's just a matter of how much code change we think is acceptable for it. I don't mind about the amount of changes. "we've always done it like this" is never a valid excuse to keep code as it is, especially when it's wrong and causes difficulties to add new features. Of course I prefer invasive changes early in the dev cycle than late but we're still OK and I don't expect that many changes for this anyway. Agreed on not shying away from amount of changes. I tried to follow that route, being generous in what I was willing to touch, but ran into the issue of protocols and socket families being a 1:1 binding. So adding a new socket family *requires* adding a new protocol too, which isn't too hard but spirals a bit out of control: - protocol_by_family is a misnomer, and it is actually protocols *by socket domain* (which is still AF_UNIX, even with family AF_CUST_ABNS2, and ends up overriding standard socket/abns protocol) - but you can't just change that, because then other things stop working as some places rely on socket domain != socket family If push comes to shove, one can always add an extra field somewhere, or an extra arg in get_addr_len, even if it'd be a little ugly. Actually you're totally right and you make me realize that there's an addrcmp field in the address families, and it's inconsistent to have get_addr_len() being family-agnostic. So most likely we should first change get_addr_len() to be per-family and appear in the proto_fam struct. The few places where get_addr_len() is used should instead rely on the protocol family for this. And due to this new address having a variable length, we should support passing a pointer to the address (like the current get_addr_len() does) in addition to the protocol pointer. I tried to do that yes, changing sock_addrlen to be a function pointer. But it's actually less convenient, surprisingly, since then you end up with constructs like: listener->rx/tx->ss->proto->fam->get_addrlen(listener->rx/tx->ss) (paraphrasing from memory; point is that it becomes somewhat unpleasant to look at) Still might end up having to do that given how tricky things become otherwise... Doing so would cleanly unlock all the problem. The new abns2 family would just bring its own get_addr_len() variant that uses strlen(). Agreed on that OOP-ish approach being in principle a bit nicer though (sorry for the swear word ;-) ) Tristan
Re: [RFC] Allow disabling abstract unix socket paths NUL-padding
Hi Tristan, On Sat, Mar 09, 2024 at 04:20:21PM +, Tristan wrote: > To be honest, I don't think this is unfixable. It's just a matter of how > much code change we think is acceptable for it. I don't mind about the amount of changes. "we've always done it like this" is never a valid excuse to keep code as it is, especially when it's wrong and causes difficulties to add new features. Of course I prefer invasive changes early in the dev cycle than late but we're still OK and I don't expect that many changes for this anyway. > If push comes to shove, one > can always add an extra field somewhere, or an extra arg in get_addr_len, > even if it'd be a little ugly. Actually you're totally right and you make me realize that there's an addrcmp field in the address families, and it's inconsistent to have get_addr_len() being family-agnostic. So most likely we should first change get_addr_len() to be per-family and appear in the proto_fam struct. The few places where get_addr_len() is used should instead rely on the protocol family for this. And due to this new address having a variable length, we should support passing a pointer to the address (like the current get_addr_len() does) in addition to the protocol pointer. Doing so would cleanly unlock all the problem. The new abns2 family would just bring its own get_addr_len() variant that uses strlen(). This just shows how important it is to discuss about design ;-) Cheers, Willy
Re: [RFC] Allow disabling abstract unix socket paths NUL-padding
Hi Willy, On 08/03/2024 18:01, Willy Tarreau wrote: The problem with default values is that a single behavior cannot be deduced from reading a single config statement. That's quite painful for lots of people (including those who copy config blocks from stackoverflow), and for API tools. And it works half of the time because internally both modes work but they can't interoperate with other tools. Here we're really indicating a new address format. There's nothing wrong with that and we do have the tools to handle that. I think that the plan should be: - add AF_CUST_ABNS2 to protocol-t.h before AF_CUST_MAX - add to str2sa_range() the case for "abns2" just after "abns" with address family "AF_CUST_ABNS2", and duplicate the test for ss.ss_family == AF_UNIX later for AF_CUST_ABNS2 removing all the stuff related to !abstract, or instead, just extend the "if" condition to the new family and add that case there as well (might be even easier). - duplicate "proto_fam_unix" to "proto_fam_abns2" in sock_unix.c, referencing "sock_abns2_addrcmp" for the address comparison - duplicate sock_unix_addrcmp() to sock_abns2_addrcmp() and implement only the abns case, basically: if (a->ss_family != b->ss_family) return -1; if (a->ss_family != AF_CUST_ABNS2) return -1; if (au->sun_path[0]) return -1; return memcmp(au->sun_path, bu->sun_path, sizeof(au->sun_path)); - in sock_unix_bind_receiver(), add a case for this address family in the bind() size argument, replacing sizeof(addr) with the string length when running AF_CUST_ABNS2. - for get_addr_len() I need to think :-/ I actually don't know how that one works with socketpairs already, so there might be a trick I'm overlooking. Alright. I will look at how that looks and report back shortly. I need to leave now, but I continue to think that if there is any internal shortcoming, we should not try to work around it at the expense of trickier long-term maintenance, instead we should address it. And don't worry, I'm not going to assign you the task of dealing with limited stuff. We may end up with the workaround in the worst case if we don't find better, but that would mean declaring a failure and having to break that stuff sometime in the future, which is always a pain. To be honest, I don't think this is unfixable. It's just a matter of how much code change we think is acceptable for it. If push comes to shove, one can always add an extra field somewhere, or an extra arg in get_addr_len, even if it'd be a little ugly. Tristan
[ANNOUNCE] haproxy-3.0-dev5
Hi, HAProxy 3.0-dev5 was released on 2024/03/09. It added 58 new commits after version 3.0-dev4. Again mostly fixes for recent regressions dominate this version (ocsp crashes, zero-copy forwarding) and for older bugs (locking issues in Lua, QUIC freezes during handshake, initial settings for "add server"). Among the new features, we finally support draining HTTP/1 requests when we respond early to POST requests (the typical redirect or 401 on POST). Previously we'd send the response, drain pending data and close if not all data were sent. But given that there remain rare cases where this continues to cause trouble to some clients (late incoming data can cause a reset in the TCP stack and destroy the response), and that the mux-based architecture now makes this much easier, it was about time to implement it to get rid of this rare but annoying case. The rest is pretty minor, an AES encryption converter (we used to only have the decryption side), Solaris build fixes, improved "show quic" output to help troubleshooting, improved performance when traces are enabled with an attached reader (previously we used to rely on a lock to make sure to emit the dropped counter, but that approach was wrong and causing everything to work at the speed of the slowest thread). Ah and we got a report of a funny bug affecting the "random" balance algorithm. Internally we have two random generators, a slow one which is suitable for generating UUIDs and and a fast one which is only suitable for statistical randoms. Obviously "balance random" relies on the second one, which produces a predictable sequence for a given thread. It just turns out that the sequence was initialized with the thread number and that incoming connections are distributed by default in round-robin fashion to available threads. The end result of all of this is that when using "balance random", the first request would always be sent to the same server, which creates a visible skew for those who reload very frequently! This was fixed by seeding the fast one with the slow one at boot. Who would have imagined that reloading very frequently would exhibit such design limitations! And as usual, cleanups, doc and CI updates close the list. Over the last two weeks, we've participated to interesting discussions with a few users who explained how some of the limitations regarding the use of dynamic servers affect their usage. Some of them were quickly addressed but what remains was written down in GitHub issues 2469, 2482 and 2483. Those who try to minimize the number of reloads might want to have a look there and possibly feed the design discussions. Please find the usual URLs below : Site index : https://www.haproxy.org/ Documentation: https://docs.haproxy.org/ Wiki : https://github.com/haproxy/wiki/wiki Discourse: https://discourse.haproxy.org/ Slack channel: https://slack.haproxy.org/ Issue tracker: https://github.com/haproxy/haproxy/issues Sources : https://www.haproxy.org/download/3.0/src/ Git repository : https://git.haproxy.org/git/haproxy.git/ Git Web browsing : https://git.haproxy.org/?p=haproxy.git Changelog: https://www.haproxy.org/download/3.0/src/CHANGELOG Dataplane API: https://github.com/haproxytech/dataplaneapi/releases/latest Pending bugs : https://www.haproxy.org/l/pending-bugs Reviewed bugs: https://www.haproxy.org/l/reviewed-bugs Code reports : https://www.haproxy.org/l/code-reports Latest builds: https://www.haproxy.org/l/dev-packages Willy --- Complete changelog : Amaury Denoyelle (8): BUG/MEDIUM: server: fix dynamic servers initial settings MINOR: quic: filter show quic by address MINOR: quic: specify show quic output fields MINOR: quic: add MUX output for show quic BUG/MEDIUM: quic: fix connection freeze on post handshake BUG/MINOR: mux-quic: fix crash on aborting uni remote stream BUG/MEDIUM: quic: fix handshake freeze under high traffic MINOR: quic: always use ncbuf for rx CRYPTO Aurelien DARRAGON (15): LICENSE: event_hdl: fix GPL license version LICENSE: http_ext: fix GPL license version BUG/MINOR: hlua: fix unsafe lua_tostring() usage with empty stack BUG/MINOR: hlua: don't use lua_tostring() from unprotected contexts BUG/MINOR: hlua: fix possible crash in hlua_filter_new() under load BUG/MINOR: hlua: improper lock usage in hlua_filter_callback() BUG/MINOR: hlua: improper lock usage in hlua_filter_new() BUG/MEDIUM: hlua: improper lock usage with SET_SAFE_LJMP() BUG/MAJOR: hlua: improper lock usage with hlua_ctx_resume() BUG/MINOR: hlua: don't call ha_alert() in hlua_event_subscribe() MINOR: hlua: use SEND_ERR to report errors in hlua_event_runner() CLEANUP: hlua: txn class functions may LJMP CLEANUP: log: fix obsolete comment for add_sample_to_logformat_list() CLEANUP: tree-wide: us
Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address
Hi Anthony, On Wed, Feb 21, 2024 at 11:49:45AM -0500, Anthony Deschamps wrote: > Hi Willy, thanks for the thoughtful feedback. > > Here's a new patch that makes this configurable via a "hash-key" server > argument, which defaults to "id" as you suggested. Thanks. > I'm struggling to test the last case you described. A quick description of > my test setup: I have multiple instances of a simple echo server running > on a range of ports, which I spawn/kill manually. I then have a local consul > agent providing DNS, and multiple proxy instances running, and my actual > test is a short Python script that makes a series of requests to each of the > proxies and checks whether they were all routed to the same echo server > (they identify themselves by setting a header in the response). I can share > all the test scripts/config if it's helpful. I'm not sure the scripts will help me (at least :-)). I was thinking that a test could just be "set server XXX addr YYY" on the CLI to change the server's address and verify that the hashing follows the address not the server number. > When I configure my proxy with long health checks -- by setting > "server-template ... check inter 6" -- I still find that when I > kill/spawn an > instance of the mock service, a server will first go down when an entry is > removed from the SRV record, and then get rehashed when a new SRV > entry is assigned to the server. Is there a different sequence of events I > should be testing? Well, if it does that and your server continues to get the traffic it should get, then that's fine. > Here is the updated patch. Thank you. I'm seeing that your mailer mangled it (see below). Better try again attaching it. Another detail, it looks like the doc entry is located inside the "bind keywords" section, while it should be in section 4.1 "proxy keyword matrix", located between "hash-balance-factor" and "hash-type". Please also make sure to add one entry in the keywords table at the beginning of the section. Oh and really do not hesitate to ping me again if I don't respond within 7 days, because it then becomes very hard for me to get back to a previous discussion. I just remembered there was something to look for on the list, but I could easily have lost it :-/ Thanks! Willy > >From 7d8a63803017c9b7630ec5cb3a154778fdff4d86 Mon Sep 17 00:00:00 2001 > From: Anthony Deschamps > Date: Fri, 16 Feb 2024 16:00:35 -0500 > Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server > address > > Motivation: When services are discovered through DNS resolution, the order > in > which DNS records get resolved and assigned to servers is arbitrary. > Therefore, > even though two HAProxy instances using chash balancing might agree that a > particular request should go to server3, it is likely the case that they > have > assigned different IPs and ports to the server in that slot. > > This patch adds a server option, "hash-key " which can be set to "id" > (the > existing behaviour, defaut), "addr", or "addr-port". By deriving the keys > for > the chash tree nodes from a server's address and port we ensure that > independent > HAProxy instances will agree on routing decisions. If an address is not > known > then the key is derived from the server's puid as it was previously. > > When adjusting a server's weight, we now unconditionally remove all nodes > first, > since the server's address (and therefore the desired node keys) may have > changed. > --- > doc/configuration.txt | 21 > include/haproxy/server-t.h | 8 > src/lb_chash.c | 98 +- > src/server.c | 28 +++ > 4 files changed, 132 insertions(+), 23 deletions(-) > > diff --git a/doc/configuration.txt b/doc/configuration.txt > index 1065e6098..a0710395a 100644 > --- a/doc/configuration.txt > +++ b/doc/configuration.txt > @@ -15763,6 +15763,27 @@ group >"gid" setting except that the group name is used instead of its gid. This >setting is ignored by non UNIX sockets. > > +hash-key > + Specify how "hash-type consistent" node keys are computed > + > + Arguments : > +may be one of the following : > + > + id The node keys will be derived from the server's position > in > + the server list. > + > + addr The node keys will be derived from the server's address, > when > + available, or else fall back on "id". > + > + addr-port The node keys will be derived from the server's address > and > + port, when available, or else fall back on "id". > + > + The "addr" and "addr-port" options may be useful in scenarios where > multiple > + HAProxy processes are balancing traffic to the same set of servers. If > the > + server order of each process is different (because, for example, DNS > records > + were resolved in different orders) then this will allow each independent > + HAProxy processes to agree on
Re: [PATCH] MINOR: lb-chash: Respect maxconn when selecting a server
Hi Anthony, it seems I forgot about this thread, being sidetracked on other stuff... On Wed, Feb 21, 2024 at 04:41:04PM -0500, Anthony Deschamps wrote: > Hi Willy, > > I wonder if I could accomplish what I'm looking to do by changing the > behaviour of "maxqueue" (without making a breaking change, ideally). > Currently, "maxqueue 0" is interpreted as "unlimited". However, the system > I'm working with has long-running WebSocket connections, so a queued > request is likely to sit in the queue for too long. > > Is there a way to configure "maxqueue 0" for real? If not, then perhaps > that's a patch I should make first, and then come back to this one. Since > "maxqueue 0" already means "unlimited", maybe a "no-queue" option > would be best in order to avoid making a breaking change. Oh, sadly I didn't remember that maxqueue 0 was unlimited. Indeed, that's annoying because I do see the value in totally preventing the queue from being used. I guess we copied that from maxconn and other limits to preserve the same logic of 0=unlimited. But that wasn't very wise for this specific one :-/ Maybe (big maybe) we could change the default value to -1 for unlimited as a few other settings have, and permit value zero to be used. The problem I'm having with this is for APIs and those who automate their config generation, and who might possibly already emit "maxqueue 0" to make it infinite. In addition, the maxqueue feature was introduced 16 years ago in 1.3.14 by commit acafc5f88 ("[MEDIUM] add support for "maxqueue" to limit server queue overload"), and documented a few months later in 1.3.15 by commit 198a744e1 ("[DOC] all server parameters have been documented"), so the risk is not that minimal. I'm CCing Marko who develops the Dataplane API. This should be very representative of how such values could be interpreted. Marko, do you know if the dpapi ever emits "maxqueue 0" or makes a special case of it when it finds it in a config ? Here the problem is that a poor historical choice prevents one from saying that we don't want to queue at all on a server. I'd like to hear about others here on the list regarding this. If there's a general consensus that it's mostly harmless to change the infinite from 0 to -1 and permit 0 to be used, we could do it and put it in the release notes. Otherwise we'll need to find another solution (maybe an extra keyword to disable any queuing). Thanks! willy
Re: [PR] BUILD: solaris: fix compilation errors
On Tue, Feb 27, 2024 at 10:23:02AM +, PR Bot wrote: > Dear list! > > Author: matthias sweertvaegher <178714+mx...@users.noreply.github.com> > Number of patches: 1 > > This is an automated relay of the Github pull request: >BUILD: solaris: fix compilation errors Now merged, thank you Matthias! Willy