Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-04-01 Thread Willy Tarreau
Hi Anthony,

On Mon, Apr 01, 2024 at 11:47:54AM -0400, Anthony Deschamps wrote:
> Hi Willy,
> 
> Those changes are easy enough to make, so I've attached the patch again
> with those changes. I had to make a few small adjustments to the commit
> message anyway (some things that changed as a result of reviewing this
> path). Let me know if there's anything else that I can help with.

Looks perfect now, I've merged it. Thank you very much!
Willy



Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-04-01 Thread Anthony Deschamps
Hi Willy,

Those changes are easy enough to make, so I've attached the patch again
with those changes. I had to make a few small adjustments to the commit
message anyway (some things that changed as a result of reviewing this
path). Let me know if there's anything else that I can help with.

Thank you,
Anthony
From 30896b2f6368cd022bfb574d4d5f647483b79a59 Mon Sep 17 00:00:00 2001
From: Anthony Deschamps 
Date: Fri, 16 Feb 2024 16:00:35 -0500
Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server
 address

Motivation: When services are discovered through DNS resolution, the order in
which DNS records get resolved and assigned to servers is arbitrary. Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they have
assigned different IPs and ports to the server in that slot.

This patch adds a server option, "hash-key " which can be set to "id" (the
existing behaviour, default), "addr", or "addr-port". By deriving the keys for
the chash tree nodes from a server's address and port we ensure that independent
HAProxy instances will agree on routing decisions. If an address is not known
then the key is derived from the server's puid as it was previously.

When adjusting a server's weight, we now check whether the server's hash has
changed. If it has, we have to remove all its nodes first, since the node keys
will also have to change.
---
 doc/configuration.txt  | 25 +++
 include/haproxy/server-t.h |  9 
 src/lb_chash.c | 91 +++---
 src/server.c   | 31 +
 4 files changed, 150 insertions(+), 6 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 1065e6098..0d5649d34 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -4813,6 +4813,8 @@ error-log-format  X  X X -
 force-persist -  - X X
 filter-  X X X
 fullconn  X  - X X
+hash-balance-factor   X  - X X
+hash-key  X  - X X
 hash-type X  - X X
 http-after-response   X (!)  X X X
 http-check commentX  - X X
@@ -6606,6 +6608,29 @@ hash-balance-factor 
   See also : "balance" and "hash-type".
 
 
+hash-key 
+  Specify how "hash-type consistent" node keys are computed
+
+  Arguments :
+may be one of the following :
+
+  id The node keys will be derived from the server's numeric
+ identifier as set from "id" or which defaults to its position
+ in the server list.
+
+  addr   The node keys will be derived from the server's address, when
+ available, or else fall back on "id".
+
+  addr-port  The node keys will be derived from the server's address and
+ port, when available, or else fall back on "id".
+
+  The "addr" and "addr-port" options may be useful in scenarios where multiple
+  HAProxy processes are balancing traffic to the same set of servers. If the
+  server order of each process is different (because, for example, DNS records
+  were resolved in different orders) then this will allow each independent
+  HAProxy processes to agree on routing decisions.
+
+
 hash-type   
   Specify a method to use for mapping hashes to servers
 
diff --git a/include/haproxy/server-t.h b/include/haproxy/server-t.h
index f077ff2f8..811eae712 100644
--- a/include/haproxy/server-t.h
+++ b/include/haproxy/server-t.h
@@ -223,6 +223,13 @@ struct pid_list {
 	int exited;
 };
 
+/* srv methods of computing chash keys */
+enum srv_hash_key {
+	SRV_HASH_KEY_ID = 0, /* derived from server puid */
+	SRV_HASH_KEY_ADDR,   /* derived from server address */
+	SRV_HASH_KEY_ADDR_PORT   /* derived from server address and port */
+};
+
 /* A tree occurrence is a descriptor of a place in a tree, with a pointer back
  * to the server itself.
  */
@@ -366,6 +373,8 @@ struct server {
 	struct tree_occ *lb_nodes;  /* lb_nodes_tot * struct tree_occ */
 	unsigned lb_nodes_tot;  /* number of allocated lb_nodes (C-HASH) */
 	unsigned lb_nodes_now;  /* number of lb_nodes placed in the tree (C-HASH) */
+	enum srv_hash_key hash_key; /* method to compute node hash (C-HASH) */
+	unsigned lb_server_key; /* hash of the values indicated by "hash_key&

Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-27 Thread Willy Tarreau
Hi again Anthony,

I'm still having a few comments, but I think nothing that I cannot address
while merging it:

On Wed, Mar 13, 2024 at 12:33:54PM -0400, Anthony Deschamps wrote:
> +static inline u32 chash_compute_server_key(struct server *s)
> +{
> + u32 key = 0;
> + struct server_inetaddr srv_addr;
> +
> + enum srv_hash_key hash_key = s->hash_key;

In general, it's better to prefer the common "reverse christmas tree"
declaration order, i.e. put larger statements first and smaller next,
and more importantly not leave empty lines in the middle of declarations
as these usually indicate the beginning of statements.

> +
> + /* If hash-key is addr or addr-port then we need the address, but if we
> +  * can't determine the address then we fall back on hashing the puid.
> +  */
> + switch (s->hash_key) {

Here I initially got confused by the use of s->hash_key while the rest
below uses hash_key, I'd change it to hash_key as well since it's already
assigned at the top.

> + case SRV_HASH_KEY_ADDR:
> + case SRV_HASH_KEY_ADDR_PORT:
> + server_get_inetaddr(s, &srv_addr);
> + if (srv_addr.family != AF_INET && srv_addr.family != AF_INET6) {
> + hash_key = SRV_HASH_KEY_ID;
> + }
> + break;
> +
> + default:
> + break;
> + }
> +
> + if (hash_key == SRV_HASH_KEY_ADDR_PORT) {
> + key = full_hash(srv_addr.port.svc);
> + }

Given that "key" is used immediately below and that I had to scroll up a
bit to verify that it was properly initialized, I'd rather set the default
zero value here in an else statement for ease of inspection later.

> + switch (hash_key) {
> + case SRV_HASH_KEY_ADDR_PORT:
> + case SRV_HASH_KEY_ADDR:
> + switch (srv_addr.family) {
> + case AF_INET:
> + key = full_hash(key + srv_addr.addr.v4.s_addr);
> + break;
> + case AF_INET6:
> + key = XXH32(srv_addr.addr.v6.s6_addr, 16, key);
> + break;
> + default:
(...)

> +  Arguments :
> +may be one of the following :
> +
> +  id The node keys will be derived from the server's position in
> + the server list.
> +

Here it's "the server's numeric identifier as set from 'id' or which
defaults to its position in the list".

The rest is good.

If you want I can perform these tiny cosmetic adjustments myself so as
to save you a round trip.

Thanks!
Willy



Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-27 Thread Willy Tarreau
Hi Anthony,

On Sun, Mar 24, 2024 at 10:11:41PM -0400, Anthony Deschamps wrote:
> Hi Willy,
> 
> I'm just checking in to see if there's anything left I can help address here.

Thanks for the ping and sorry for the delay. It just fell through the
cracks in the middle of all other stuff I'm currently having to deal
with in parallel, as every time we're starting to get closer to a
release :-(

I'll try to deal with it today.

Willy



Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-24 Thread Anthony Deschamps
Hi Willy,

I'm just checking in to see if there's anything left I can help address here.

Thank you,
Anthony

On Wed, Mar 13, 2024 at 4:59 PM Willy Tarreau  wrote:
>
> Hi Anthony,
>
> On Wed, Mar 13, 2024 at 12:33:54PM -0400, Anthony Deschamps wrote:
> > Hi Willy,
> >
> > My original concern was that if two servers had values of lb_server_key
> > that are close to each other, then there could be some overlap in their
> > values of lb_server_key + node_index;, which is why I was looking for a
> > reasonable hash function to combine them.
>
> I know but you can't really avoid it anyway, and that's why we place
> many nodes. I tried an attempt in the past where nodes were evenly
> spaced by dividing by the number of entries etc. It was a disaster.
> As soon as a server went down, the next one would take its extra load
> for example. So anything you try to do to avoid some corner cases create
> other ones, while overall, a random distribution (like a hash does)
> gives great results.
>
> > I based it on boost::hash_combine,
> > but since writing that I came across https://stackoverflow.com/a/50978188
> > explaining why boost::hash_combine isn't necessarily a good hash function.
>
> Hehe. We've all written our own hash functions for a given purpose at one
> point in time, and they're usually fine inside their context and bad once
> used outside. That's also where XXH and such stuff shines, it brings
> generic functions that are good all over the space. In our case, the
> full_hash() is just an avalanche non-linear 1:1 mapping between the input
> and the output which suffices to avoid resonance effects that come from
> repeated additions, it's perfectly suited here I think.
>
> > I agree that it's not too critical to have a few collisions, but I'm 
> > reaching
> > the limits of my knowledge here. Does this change address your concerns?
> > I've attached an updated patch, and here's the diff between them:
> >
> > diff --git a/src/lb_chash.c b/src/lb_chash.c
> > index c77222854..d7a1b2539 100644
> > --- a/src/lb_chash.c
> > +++ b/src/lb_chash.c
> > @@ -66,9 +66,7 @@ static inline void chash_dequeue_srv(struct server *s)
> >   */
> >  static inline u32 chash_compute_node_key(struct server *s, unsigned 
> > node_index)
> >  {
> > -u32 server_key = s->lb_server_key;
> > -u32 index_key = full_hash(node_index);
> > -return server_key ^= index_key + 0x9e3779b9 + (server_key << 6) +
> > (server_key >> 2);
> > +return full_hash(s->lb_server_key + node_index);
> >  }
> >
> >  /* Compute the base server key that will be used to compute node keys. 
> > Servers
> > @@ -112,8 +110,7 @@ static inline u32 chash_compute_server_key(struct 
> > server *s)
> >  case SRV_HASH_KEY_ADDR:
> >  switch (srv_addr.family) {
> >  case AF_INET:
> > -u32 addr_key = full_hash(srv_addr.addr.v4.s_addr);
> > -key ^= addr_key + 0x9e3779b9 + (key << 6) + (key >> 2);
> > +key = full_hash(key + srv_addr.addr.v4.s_addr);
> >  break;
> >  case AF_INET6:
> >  key = XXH32(srv_addr.addr.v6.s6_addr, 16, key);
>
> Yeay I think it addresses everything. I'll re-test your updated patch
> tomorrow hoping that this time I'll merge it :-)
>
> Thanks for your patience!
> Willy



Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-13 Thread Willy Tarreau
Hi Anthony,

On Wed, Mar 13, 2024 at 12:33:54PM -0400, Anthony Deschamps wrote:
> Hi Willy,
> 
> My original concern was that if two servers had values of lb_server_key
> that are close to each other, then there could be some overlap in their
> values of lb_server_key + node_index;, which is why I was looking for a
> reasonable hash function to combine them.

I know but you can't really avoid it anyway, and that's why we place
many nodes. I tried an attempt in the past where nodes were evenly
spaced by dividing by the number of entries etc. It was a disaster.
As soon as a server went down, the next one would take its extra load
for example. So anything you try to do to avoid some corner cases create
other ones, while overall, a random distribution (like a hash does)
gives great results.

> I based it on boost::hash_combine,
> but since writing that I came across https://stackoverflow.com/a/50978188
> explaining why boost::hash_combine isn't necessarily a good hash function.

Hehe. We've all written our own hash functions for a given purpose at one
point in time, and they're usually fine inside their context and bad once
used outside. That's also where XXH and such stuff shines, it brings
generic functions that are good all over the space. In our case, the
full_hash() is just an avalanche non-linear 1:1 mapping between the input
and the output which suffices to avoid resonance effects that come from
repeated additions, it's perfectly suited here I think.

> I agree that it's not too critical to have a few collisions, but I'm reaching
> the limits of my knowledge here. Does this change address your concerns?
> I've attached an updated patch, and here's the diff between them:
> 
> diff --git a/src/lb_chash.c b/src/lb_chash.c
> index c77222854..d7a1b2539 100644
> --- a/src/lb_chash.c
> +++ b/src/lb_chash.c
> @@ -66,9 +66,7 @@ static inline void chash_dequeue_srv(struct server *s)
>   */
>  static inline u32 chash_compute_node_key(struct server *s, unsigned 
> node_index)
>  {
> -u32 server_key = s->lb_server_key;
> -u32 index_key = full_hash(node_index);
> -return server_key ^= index_key + 0x9e3779b9 + (server_key << 6) +
> (server_key >> 2);
> +return full_hash(s->lb_server_key + node_index);
>  }
> 
>  /* Compute the base server key that will be used to compute node keys. 
> Servers
> @@ -112,8 +110,7 @@ static inline u32 chash_compute_server_key(struct server 
> *s)
>  case SRV_HASH_KEY_ADDR:
>  switch (srv_addr.family) {
>  case AF_INET:
> -u32 addr_key = full_hash(srv_addr.addr.v4.s_addr);
> -key ^= addr_key + 0x9e3779b9 + (key << 6) + (key >> 2);
> +key = full_hash(key + srv_addr.addr.v4.s_addr);
>  break;
>  case AF_INET6:
>  key = XXH32(srv_addr.addr.v6.s6_addr, 16, key);

Yeay I think it addresses everything. I'll re-test your updated patch
tomorrow hoping that this time I'll merge it :-)

Thanks for your patience!
Willy



Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-13 Thread Anthony Deschamps
Hi Willy,

My original concern was that if two servers had values of lb_server_key
that are close to each other, then there could be some overlap in their
values of lb_server_key + node_index;, which is why I was looking for a
reasonable hash function to combine them. I based it on boost::hash_combine,
but since writing that I came across https://stackoverflow.com/a/50978188
explaining why boost::hash_combine isn't necessarily a good hash function.

I agree that it's not too critical to have a few collisions, but I'm reaching
the limits of my knowledge here. Does this change address your concerns?
I've attached an updated patch, and here's the diff between them:

diff --git a/src/lb_chash.c b/src/lb_chash.c
index c77222854..d7a1b2539 100644
--- a/src/lb_chash.c
+++ b/src/lb_chash.c
@@ -66,9 +66,7 @@ static inline void chash_dequeue_srv(struct server *s)
  */
 static inline u32 chash_compute_node_key(struct server *s, unsigned node_index)
 {
-u32 server_key = s->lb_server_key;
-u32 index_key = full_hash(node_index);
-return server_key ^= index_key + 0x9e3779b9 + (server_key << 6) +
(server_key >> 2);
+return full_hash(s->lb_server_key + node_index);
 }

 /* Compute the base server key that will be used to compute node keys. Servers
@@ -112,8 +110,7 @@ static inline u32 chash_compute_server_key(struct server *s)
 case SRV_HASH_KEY_ADDR:
 switch (srv_addr.family) {
 case AF_INET:
-u32 addr_key = full_hash(srv_addr.addr.v4.s_addr);
-key ^= addr_key + 0x9e3779b9 + (key << 6) + (key >> 2);
+key = full_hash(key + srv_addr.addr.v4.s_addr);
 break;
 case AF_INET6:
 key = XXH32(srv_addr.addr.v6.s6_addr, 16, key);


Thanks,
Anthony
From 3280a39664bf913807200f21d1b952856966d141 Mon Sep 17 00:00:00 2001
From: Anthony Deschamps 
Date: Fri, 16 Feb 2024 16:00:35 -0500
Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server
 address

Motivation: When services are discovered through DNS resolution, the order in
which DNS records get resolved and assigned to servers is arbitrary. Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they have
assigned different IPs and ports to the server in that slot.

This patch adds a server option, "hash-key " which can be set to "id" (the
existing behaviour, defaut), "addr", or "addr-port". By deriving the keys for
the chash tree nodes from a server's address and port we ensure that independent
HAProxy instances will agree on routing decisions. If an address is not known
then the key is derived from the server's puid as it was previously.

When adjusting a server's weight, we now unconditionally remove all nodes first,
since the server's address (and therefore the desired node keys) may have changed.
---
 doc/configuration.txt  | 24 ++
 include/haproxy/server-t.h |  9 
 src/lb_chash.c | 90 +++---
 src/server.c   | 31 +
 4 files changed, 148 insertions(+), 6 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 1065e6098..052d41f9f 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -4813,6 +4813,8 @@ error-log-format  X  X X -
 force-persist -  - X X
 filter-  X X X
 fullconn  X  - X X
+hash-balance-factor   X  - X X
+hash-key  X  - X X
 hash-type X  - X X
 http-after-response   X (!)  X X X
 http-check commentX  - X X
@@ -6606,6 +6608,28 @@ hash-balance-factor 
   See also : "balance" and "hash-type".
 
 
+hash-key 
+  Specify how "hash-type consistent" node keys are computed
+
+  Arguments :
+may be one of the following :
+
+  id The node keys will be derived from the server's position in
+ the server list.
+
+  addr   The node keys will be derived from the server's address, when
+ available, or else fall back on "id".
+
+  addr-port  The node keys will be derived from the server's address and
+ port, when available, or else fall back on "id".
+
+  The "addr" and "addr-port" options may be useful in scenarios where multiple
+  HAProxy processes are balancing traffic to the same set of servers. If the
+  ser

Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-13 Thread Willy Tarreau
Hi Anthony,

On Tue, Mar 12, 2024 at 07:10:42PM -0400, Anthony Deschamps wrote:
> Hi Willy,
> 
> Thanks for the feedback. I had been testing with smaller numbers of
> servers (usually between 4 and 32) so I hadn't noticed the performance
> impact.

Not surprised. I'm used to testing with some extreme setups for knowing
such exist in field (more than 1000 servers in a single backend were
already reported for example).

> Here's an updated patch (as an attachment, until I figure out how
> to make sure things are formatted correctly!) that should address that.

Thanks. It's not a problem attached this way because my mailer still
recognizes it as text and inlines it when responding, so that doesn't
alter my ability to comment.

> I added a "lb_server_key" to the server struct. I moved most of the hashing
> logic that I previously had in "chash_compute_node_key" into a new function,
> "chash_compute_server_key", and I store that value; nodes only need to be
> removed and reinserted if this value changes. To compute a node key, I just
> combine the server key with a hash of node_index. I considered using
> XXH32_round to do this, but that looks like it's intended to be an internal
> function, so I inlined something similar instead.

You could always decide to combine an existing hash with another value, or
combine one 32-bit key with a 32-bit index and apply a 64-bit hash, but I
think that's fine this way.

> Hopefully this all works. Thanks for taking the time to work through this, and
> please let me know if there's anything else to address.

I just got this:

  src/lb_chash.c: In function 'chash_compute_server_key':
  src/lb_chash.c:115:4: error: a label can only be part of a statement and a 
declaration is not a statement
  compilation terminated due to -Wfatal-errors.

That's because of this:

> + switch (hash_key) {
> + case SRV_HASH_KEY_ADDR_PORT:
> + case SRV_HASH_KEY_ADDR:
> + switch (srv_addr.family) {
> + case AF_INET:
> + u32 addr_key = full_hash(srv_addr.addr.v4.s_addr);

Here that's a declaration after a statement ("case" not being considered
as the beginning of a block since you can pretty well fall through from
a previous one).

I'll adjust it while merging.

I had a concern about the hashing function which seems to reduce the
input key space, and indeed it does, it reduces 4294967296 server keys
to only 2948667289 (68.65 %):

> + key ^= addr_key + 0x9e3779b9 + (key << 6) + (key >> 2);

Not critical but it would be better if we can avoid this loss. I tried
with the full_hash() function and it preserves all output keys. So I
think you should instead use:

key = full_hash(add_key);

and chash_compute_node_key() could then simply be achieved this way:

  static inline u32 chash_compute_node_key(struct server *s, unsigned 
node_index)
  {
   return full_hash(s->lb_server_key + node_index);
  }

Other than that, I'm OK with the rest.

Thanks!
Willy



Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-12 Thread Anthony Deschamps
Hi Willy,

Thanks for the feedback. I had been testing with smaller numbers of
servers (usually between 4 and 32) so I hadn't noticed the performance
impact. Here's an updated patch (as an attachment, until I figure out how
to make sure things are formatted correctly!) that should address that.

I added a "lb_server_key" to the server struct. I moved most of the hashing
logic that I previously had in "chash_compute_node_key" into a new function,
"chash_compute_server_key", and I store that value; nodes only need to be
removed and reinserted if this value changes. To compute a node key, I just
combine the server key with a hash of node_index. I considered using
XXH32_round to do this, but that looks like it's intended to be an internal
function, so I inlined something similar instead.

Hopefully this all works. Thanks for taking the time to work through this, and
please let me know if there's anything else to address.

Thanks,
Anthony
From 82acc8ff344abde640941d15fa5217b0dfe9701a Mon Sep 17 00:00:00 2001
From: Anthony Deschamps 
Date: Fri, 16 Feb 2024 16:00:35 -0500
Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server
 address

Motivation: When services are discovered through DNS resolution, the order in
which DNS records get resolved and assigned to servers is arbitrary. Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they have
assigned different IPs and ports to the server in that slot.

This patch adds a server option, "hash-key " which can be set to "id" (the
existing behaviour, defaut), "addr", or "addr-port". By deriving the keys for
the chash tree nodes from a server's address and port we ensure that independent
HAProxy instances will agree on routing decisions. If an address is not known
then the key is derived from the server's puid as it was previously.

When adjusting a server's weight, we now unconditionally remove all nodes first,
since the server's address (and therefore the desired node keys) may have changed.
---
 doc/configuration.txt  | 24 ++
 include/haproxy/server-t.h |  9 
 src/lb_chash.c | 93 +++---
 src/server.c   | 31 +
 4 files changed, 151 insertions(+), 6 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 1065e6098..052d41f9f 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -4813,6 +4813,8 @@ error-log-format  X  X X -
 force-persist -  - X X
 filter-  X X X
 fullconn  X  - X X
+hash-balance-factor   X  - X X
+hash-key  X  - X X
 hash-type X  - X X
 http-after-response   X (!)  X X X
 http-check commentX  - X X
@@ -6606,6 +6608,28 @@ hash-balance-factor 
   See also : "balance" and "hash-type".
 
 
+hash-key 
+  Specify how "hash-type consistent" node keys are computed
+
+  Arguments :
+may be one of the following :
+
+  id The node keys will be derived from the server's position in
+ the server list.
+
+  addr   The node keys will be derived from the server's address, when
+ available, or else fall back on "id".
+
+  addr-port  The node keys will be derived from the server's address and
+ port, when available, or else fall back on "id".
+
+  The "addr" and "addr-port" options may be useful in scenarios where multiple
+  HAProxy processes are balancing traffic to the same set of servers. If the
+  server order of each process is different (because, for example, DNS records
+  were resolved in different orders) then this will allow each independent
+  HAProxy processes to agree on routing decisions.
+
+
 hash-type   
   Specify a method to use for mapping hashes to servers
 
diff --git a/include/haproxy/server-t.h b/include/haproxy/server-t.h
index f077ff2f8..811eae712 100644
--- a/include/haproxy/server-t.h
+++ b/include/haproxy/server-t.h
@@ -223,6 +223,13 @@ struct pid_list {
 	int exited;
 };
 
+/* srv methods of computing chash keys */
+enum srv_hash_key {
+	SRV_HASH_KEY_ID = 0, /* derived from server puid */
+	SRV_HASH_KEY_ADDR,   /* derived from server address */
+	SRV_HASH_KEY_ADDR_PORT   /* derived from server address and port */
+};
+
 /* A tree occurrence is a descriptor of 

Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-12 Thread Willy Tarreau
Hi again Anthony,

First comment, thanks for the patch, it's of very good quality!
I'm having *one* problem with it, below:

>  /* Adjust the number of entries of a server in its tree. The server must 
> appear
>   * as many times as its weight indicates it. If it's there too often, we 
> remove
>   * the last occurrences. If it's not there enough, we add more occurrences. 
> To
> @@ -67,39 +126,32 @@ static inline void chash_dequeue_srv(struct server *s)
>   */
>  static inline void chash_queue_dequeue_srv(struct server *s)
>  {
> - while (s->lb_nodes_now > s->next_eweight) {
> - if (s->lb_nodes_now >= s->lb_nodes_tot) // should always be 
> false anyway
> - s->lb_nodes_now = s->lb_nodes_tot;
> - s->lb_nodes_now--;
> - if (s->proxy->lbprm.chash.last == 
> &s->lb_nodes[s->lb_nodes_now].node)
> - s->proxy->lbprm.chash.last = 
> chash_skip_node(s->lb_tree, s->proxy->lbprm.chash.last);
> - eb32_delete(&s->lb_nodes[s->lb_nodes_now].node);
> - }
> + unsigned int j;
> +
> + /* First we need to remove all of the server's entries from its tree
> +  * because a) the realloc will change all node pointers and b) the keys
> +  * may change.
> +  */
> + chash_dequeue_srv(s);

This part above. It completely removes the server from the tree before
re-adding it for any weight change. That's extremely costly. One case
that stresses this part is the slowstart mechanism. If you set 500
servers in a backend like this:

server-template s 0-499 127.0.0.1:8000 slowstart 20s weight 256

then you turn all of them down then up, in -master the process runs at
roughly 10% CPU for 20 seconds for the time it takes to progressively
add all nodes to the tree, while with the new version, the load grows
over 20 seconds till it reaches 100% and sometimes triggers the watchdog.

IMHO the full dequeue should only be done in the previous conditions,
or if the key changes, as you mentionned in the comment.

This means that you should store in the server along with the hash_key
the one which corresponds to the indexed nodes (the one that was last
used to insert the server in the tree). I'd just imagine a prev_hash_key
that's synchronized after inserting the server. Then when entering
chash_queue_dequeue_srv(), it would be sufficient to compare the two
keys and decide to call chash_dequeue_srv() in case they don't match.
This way the full dequeue would be done only when the key changes and
not when the weight slightly increases with a same key.

The only other comment I could find is this tiny one, in
chash_compute_node_key():

+   case SRV_HASH_KEY_ADDR_PORT:
+   server_get_inetaddr(s, &srv_addr);
+   if (srv_addr.family != AF_INET && srv_addr.family != AF_INET6) {
+ hash_key = SRV_HASH_KEY_ID;
+   }

As you can see there's an indent issue before hash_key. I said it was
tiny ;-)

That's essentially all I could find, so I'm pretty confident that with
that addressed, the patch can be merged quickly!

Thank you!
Willy



Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-11 Thread Willy Tarreau
On Mon, Mar 11, 2024 at 11:24:34PM -0400, Anthony Deschamps wrote:
> Sorry for the trouble! I'll have to sort out what's happening.

No problem, some mailers are well-known for mangling what looks like
text.

> Here it is as an attachment.

Looks good as-is, thank you! I'll review it (hopefully today) and merge
it if I'm fine with it.

Willy



Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-11 Thread Anthony Deschamps
Sorry for the trouble! I'll have to sort out what's happening. Here it is
as an attachment.

Thanks!
Anthony


On Mon, Mar 11, 2024 at 11:06 PM Willy Tarreau  wrote:

> Hi Anthony,
>
> On Mon, Mar 11, 2024 at 08:58:17PM -0400, Anthony Deschamps wrote:
> > > I'm not sure the scripts will help me (at least :-)). I was thinking
> that
> > > a test could just be "set server XXX addr YYY" on the CLI to change the
> > > server's address and verify that the hashing follows the address not
> the
> > > server number.
> >
> > Oh, good point -- I wasn't testing what happens when changing the address
> > via the CLI. It turns out I wasn't handling that case. I added a call in
> > _srv_set_inetaddr_port() to update the hashes, and tested that it works.
> >
> > I also moved the docs and added an entry to the table.
> >
> > Here's the updated patch.
>
> Thanks!
>
> However there's still a problem with your mailer mangling the patch
> see below:
>
> > + /* If hash-key is addr or addr-port then we need the address, but if we
> > + * can't determine the address then we fall back on hashing the puid.
> > + */
> > + switch (s->hash_key) {
> > + case SRV_HASH_KEY_ADDR:
> > + case SRV_HASH_KEY_ADDR_PORT:
> > + server_get_inetaddr(s, &srv_addr);
> > + if (srv_addr.family != AF_INET && srv_addr.family != AF_INET6) {
> > +  hash_key = SRV_HASH_KEY_ID;
> > + }
> > + break;
> (...)
>
> Can you please resend it as an attachment instead ?
>
> Thanks!
> Willy
>
From 464504a3c8560a952602d6f28009727f29e7c289 Mon Sep 17 00:00:00 2001
From: Anthony Deschamps 
Date: Fri, 16 Feb 2024 16:00:35 -0500
Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server
 address

Motivation: When services are discovered through DNS resolution, the order in
which DNS records get resolved and assigned to servers is arbitrary. Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they have
assigned different IPs and ports to the server in that slot.

This patch adds a server option, "hash-key " which can be set to "id" (the
existing behaviour, defaut), "addr", or "addr-port". By deriving the keys for
the chash tree nodes from a server's address and port we ensure that independent
HAProxy instances will agree on routing decisions. If an address is not known
then the key is derived from the server's puid as it was previously.

When adjusting a server's weight, we now unconditionally remove all nodes first,
since the server's address (and therefore the desired node keys) may have changed.
---
 doc/configuration.txt  |  24 +
 include/haproxy/server-t.h |   8 +++
 src/lb_chash.c | 101 +++--
 src/server.c   |  31 
 4 files changed, 138 insertions(+), 26 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 1065e6098..052d41f9f 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -4813,6 +4813,8 @@ error-log-format  X  X X -
 force-persist -  - X X
 filter-  X X X
 fullconn  X  - X X
+hash-balance-factor   X  - X X
+hash-key  X  - X X
 hash-type X  - X X
 http-after-response   X (!)  X X X
 http-check commentX  - X X
@@ -6606,6 +6608,28 @@ hash-balance-factor 
   See also : "balance" and "hash-type".
 
 
+hash-key 
+  Specify how "hash-type consistent" node keys are computed
+
+  Arguments :
+may be one of the following :
+
+  id The node keys will be derived from the server's position in
+ the server list.
+
+  addr   The node keys will be derived from the server's address, when
+ available, or else fall back on "id".
+
+  addr-port  The node keys will be derived from the server's address and
+ port, when available, or else fall back on "id".
+
+  The "addr" and "addr-port" options may be useful in scenarios where multiple
+  HAProxy processes are balancing traffic to the same set of servers. If the
+  server order of each process is different (because, for example, DNS records
+  wer

Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-11 Thread Willy Tarreau
Hi Anthony,

On Mon, Mar 11, 2024 at 08:58:17PM -0400, Anthony Deschamps wrote:
> > I'm not sure the scripts will help me (at least :-)). I was thinking that
> > a test could just be "set server XXX addr YYY" on the CLI to change the
> > server's address and verify that the hashing follows the address not the
> > server number.
> 
> Oh, good point -- I wasn't testing what happens when changing the address
> via the CLI. It turns out I wasn't handling that case. I added a call in
> _srv_set_inetaddr_port() to update the hashes, and tested that it works.
> 
> I also moved the docs and added an entry to the table.
> 
> Here's the updated patch.

Thanks!

However there's still a problem with your mailer mangling the patch
see below:

> + /* If hash-key is addr or addr-port then we need the address, but if we
> + * can't determine the address then we fall back on hashing the puid.
> + */
> + switch (s->hash_key) {
> + case SRV_HASH_KEY_ADDR:
> + case SRV_HASH_KEY_ADDR_PORT:
> + server_get_inetaddr(s, &srv_addr);
> + if (srv_addr.family != AF_INET && srv_addr.family != AF_INET6) {
> +  hash_key = SRV_HASH_KEY_ID;
> + }
> + break;
(...)

Can you please resend it as an attachment instead ?

Thanks!
Willy



Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-11 Thread Anthony Deschamps
> I'm not sure the scripts will help me (at least :-)). I was thinking that
> a test could just be "set server XXX addr YYY" on the CLI to change the
> server's address and verify that the hashing follows the address not the
> server number.

Oh, good point -- I wasn't testing what happens when changing the address
via the CLI. It turns out I wasn't handling that case. I added a call in
_srv_set_inetaddr_port() to update the hashes, and tested that it works.

I also moved the docs and added an entry to the table.

Here's the updated patch.

Thanks!
Anthony

>From 3412e86c222ad7b231d687f1402fbe30de312974 Mon Sep 17 00:00:00 2001
From: Anthony Deschamps 
Date: Fri, 16 Feb 2024 16:00:35 -0500
Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server
 address

Motivation: When services are discovered through DNS resolution, the order
in
which DNS records get resolved and assigned to servers is arbitrary.
Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they
have
assigned different IPs and ports to the server in that slot.

This patch adds a server option, "hash-key " which can be set to "id"
(the
existing behaviour, defaut), "addr", or "addr-port". By deriving the keys
for
the chash tree nodes from a server's address and port we ensure that
independent
HAProxy instances will agree on routing decisions. If an address is not
known
then the key is derived from the server's puid as it was previously.

When adjusting a server's weight, we now unconditionally remove all nodes
first,
since the server's address (and therefore the desired node keys) may have
changed.
---
 doc/configuration.txt  |  24 +
 include/haproxy/server-t.h |   8 +++
 src/lb_chash.c | 101 +++--
 src/server.c   |  31 
 4 files changed, 138 insertions(+), 26 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 1065e6098..052d41f9f 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -4813,6 +4813,8 @@ error-log-format  X
 X X -
 force-persist -  - X X
 filter-  X X X
 fullconn  X  - X X
+hash-balance-factor   X  - X X
+hash-key  X  - X X
 hash-type X  - X X
 http-after-response   X (!)  X X X
 http-check commentX  - X X
@@ -6606,6 +6608,28 @@ hash-balance-factor 
   See also : "balance" and "hash-type".


+hash-key 
+  Specify how "hash-type consistent" node keys are computed
+
+  Arguments :
+may be one of the following :
+
+  id The node keys will be derived from the server's position
in
+ the server list.
+
+  addr   The node keys will be derived from the server's address,
when
+ available, or else fall back on "id".
+
+  addr-port  The node keys will be derived from the server's address
and
+ port, when available, or else fall back on "id".
+
+  The "addr" and "addr-port" options may be useful in scenarios where
multiple
+  HAProxy processes are balancing traffic to the same set of servers. If
the
+  server order of each process is different (because, for example, DNS
records
+  were resolved in different orders) then this will allow each independent
+  HAProxy processes to agree on routing decisions.
+
+
 hash-type   
   Specify a method to use for mapping hashes to servers

diff --git a/include/haproxy/server-t.h b/include/haproxy/server-t.h
index f077ff2f8..1a2735c2a 100644
--- a/include/haproxy/server-t.h
+++ b/include/haproxy/server-t.h
@@ -223,6 +223,13 @@ struct pid_list {
  int exited;
 };

+/* srv methods of computing chash keys */
+enum srv_hash_key {
+ SRV_HASH_KEY_ID = 0, /* derived from server puid */
+ SRV_HASH_KEY_ADDR,   /* derived from server address */
+ SRV_HASH_KEY_ADDR_PORT   /* derived from server address and port */
+};
+
 /* A tree occurrence is a descriptor of a place in a tree, with a pointer
back
  * to the server itself.
  */
@@ -366,6 +373,7 @@ struct server {
  struct tree_occ *lb_nodes;  /* lb_nodes_tot * struct tree_occ
*/
  unsigned lb_nodes_tot;  /* number of allocated lb_nodes
(C-HASH) */
  unsigned lb_nodes_now;  /* number of lb_nodes placed in
the tree (C-HASH) */
+ enum srv_hash_key

Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-03-09 Thread Willy Tarreau
Hi Anthony,

On Wed, Feb 21, 2024 at 11:49:45AM -0500, Anthony Deschamps wrote:
> Hi Willy, thanks for the thoughtful feedback.
> 
> Here's a new patch that makes this configurable via a "hash-key" server
> argument, which defaults to "id" as you suggested.

Thanks.

> I'm struggling to test the last case you described. A quick description of
> my test setup: I have multiple instances of a simple echo server running
> on a range of ports, which I spawn/kill manually. I then have a local consul
> agent providing DNS, and multiple proxy instances running, and my actual
> test is a short Python script that makes a series of requests to each of the
> proxies and checks whether they were all routed to the same echo server
> (they identify themselves by setting a header in the response). I can share
> all the test scripts/config if it's helpful.

I'm not sure the scripts will help me (at least :-)). I was thinking that
a test could just be "set server XXX addr YYY" on the CLI to change the
server's address and verify that the hashing follows the address not the
server number.

> When I configure my proxy with long health checks -- by setting
>  "server-template ... check inter 6" -- I still find that when I
> kill/spawn an
> instance of the mock service, a server will first go down when an entry is
> removed from the SRV record, and then get rehashed when a new SRV
> entry is assigned to the server. Is there a different sequence of events I
> should be testing?

Well, if it does that and your server continues to get the traffic it
should get, then that's fine.

> Here is the updated patch.

Thank you. I'm seeing that your mailer mangled it (see below). Better
try again attaching it. Another detail, it looks like the doc entry is
located inside the "bind keywords" section, while it should be in section
4.1 "proxy keyword matrix", located between "hash-balance-factor" and
"hash-type". Please also make sure to add one entry in the keywords
table at the beginning of the section.

Oh and really do not hesitate to ping me again if I don't respond within
7 days, because it then becomes very hard for me to get back to a previous
discussion. I just remembered there was something to look for on the list,
but I could easily have lost it :-/

Thanks!
Willy

> >From 7d8a63803017c9b7630ec5cb3a154778fdff4d86 Mon Sep 17 00:00:00 2001
> From: Anthony Deschamps 
> Date: Fri, 16 Feb 2024 16:00:35 -0500
> Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server
>  address
> 
> Motivation: When services are discovered through DNS resolution, the order
> in
> which DNS records get resolved and assigned to servers is arbitrary.
> Therefore,
> even though two HAProxy instances using chash balancing might agree that a
> particular request should go to server3, it is likely the case that they
> have
> assigned different IPs and ports to the server in that slot.
> 
> This patch adds a server option, "hash-key " which can be set to "id"
> (the
> existing behaviour, defaut), "addr", or "addr-port". By deriving the keys
> for
> the chash tree nodes from a server's address and port we ensure that
> independent
> HAProxy instances will agree on routing decisions. If an address is not
> known
> then the key is derived from the server's puid as it was previously.
> 
> When adjusting a server's weight, we now unconditionally remove all nodes
> first,
> since the server's address (and therefore the desired node keys) may have
> changed.
> ---
>  doc/configuration.txt  | 21 
>  include/haproxy/server-t.h |  8 
>  src/lb_chash.c | 98 +-
>  src/server.c   | 28 +++
>  4 files changed, 132 insertions(+), 23 deletions(-)
> 
> diff --git a/doc/configuration.txt b/doc/configuration.txt
> index 1065e6098..a0710395a 100644
> --- a/doc/configuration.txt
> +++ b/doc/configuration.txt
> @@ -15763,6 +15763,27 @@ group 
>"gid" setting except that the group name is used instead of its gid. This
>setting is ignored by non UNIX sockets.
> 
> +hash-key 
> +  Specify how "hash-type consistent" node keys are computed
> +
> +  Arguments :
> +may be one of the following :
> +
> +  id The node keys will be derived from the server's position
> in
> + the server list.
> +
> +  addr   The node keys will be derived from the server's address,
> when
> + available, or else fall back on "id".
> +
> +  addr-port  The node keys will be derived from th

Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-02-21 Thread Anthony Deschamps
Hi Willy, thanks for the thoughtful feedback.

Here's a new patch that makes this configurable via a "hash-key" server
argument, which defaults to "id" as you suggested.

I'm struggling to test the last case you described. A quick description of
my test setup: I have multiple instances of a simple echo server running
on a range of ports, which I spawn/kill manually. I then have a local consul
agent providing DNS, and multiple proxy instances running, and my actual
test is a short Python script that makes a series of requests to each of the
proxies and checks whether they were all routed to the same echo server
(they identify themselves by setting a header in the response). I can share
all the test scripts/config if it's helpful.

When I configure my proxy with long health checks -- by setting
 "server-template ... check inter 6" -- I still find that when I
kill/spawn an
instance of the mock service, a server will first go down when an entry is
removed from the SRV record, and then get rehashed when a new SRV
entry is assigned to the server. Is there a different sequence of events I
should be testing?

Here is the updated patch.

Thanks,
Anthony

>From 7d8a63803017c9b7630ec5cb3a154778fdff4d86 Mon Sep 17 00:00:00 2001
From: Anthony Deschamps 
Date: Fri, 16 Feb 2024 16:00:35 -0500
Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server
 address

Motivation: When services are discovered through DNS resolution, the order
in
which DNS records get resolved and assigned to servers is arbitrary.
Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they
have
assigned different IPs and ports to the server in that slot.

This patch adds a server option, "hash-key " which can be set to "id"
(the
existing behaviour, defaut), "addr", or "addr-port". By deriving the keys
for
the chash tree nodes from a server's address and port we ensure that
independent
HAProxy instances will agree on routing decisions. If an address is not
known
then the key is derived from the server's puid as it was previously.

When adjusting a server's weight, we now unconditionally remove all nodes
first,
since the server's address (and therefore the desired node keys) may have
changed.
---
 doc/configuration.txt  | 21 
 include/haproxy/server-t.h |  8 
 src/lb_chash.c | 98 +-
 src/server.c   | 28 +++
 4 files changed, 132 insertions(+), 23 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 1065e6098..a0710395a 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -15763,6 +15763,27 @@ group 
   "gid" setting except that the group name is used instead of its gid. This
   setting is ignored by non UNIX sockets.

+hash-key 
+  Specify how "hash-type consistent" node keys are computed
+
+  Arguments :
+may be one of the following :
+
+  id The node keys will be derived from the server's position
in
+ the server list.
+
+  addr   The node keys will be derived from the server's address,
when
+ available, or else fall back on "id".
+
+  addr-port  The node keys will be derived from the server's address
and
+ port, when available, or else fall back on "id".
+
+  The "addr" and "addr-port" options may be useful in scenarios where
multiple
+  HAProxy processes are balancing traffic to the same set of servers. If
the
+  server order of each process is different (because, for example, DNS
records
+  were resolved in different orders) then this will allow each independent
+  HAProxy processes to agree on routing decisions.
+
 id 
   Fixes the socket ID. By default, socket IDs are automatically assigned,
but
   sometimes it is more convenient to fix them to ease monitoring. This
value
diff --git a/include/haproxy/server-t.h b/include/haproxy/server-t.h
index f077ff2f8..1a2735c2a 100644
--- a/include/haproxy/server-t.h
+++ b/include/haproxy/server-t.h
@@ -223,6 +223,13 @@ struct pid_list {
  int exited;
 };

+/* srv methods of computing chash keys */
+enum srv_hash_key {
+ SRV_HASH_KEY_ID = 0, /* derived from server puid */
+ SRV_HASH_KEY_ADDR,   /* derived from server address */
+ SRV_HASH_KEY_ADDR_PORT   /* derived from server address and port */
+};
+
 /* A tree occurrence is a descriptor of a place in a tree, with a pointer
back
  * to the server itself.
  */
@@ -366,6 +373,7 @@ struct server {
  struct tree_occ *lb_nodes;  /* lb_nodes_tot * struct tree_occ
*/
  unsigned lb_nodes_tot;  /* number of allocated lb_nodes
(C-HASH) */
  unsigned lb_nodes_now;  /* number of lb_nodes placed in
the

Re: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-02-20 Thread Willy Tarreau
On Fri, Feb 16, 2024 at 05:03:56PM -0500, Anthony Deschamps wrote:
> >From a031cf97da759eb2c2f9b6e191065ad503f821ed Mon Sep 17 00:00:00 2001
> From: Anthony Deschamps 
> Date: Fri, 16 Feb 2024 16:00:35 -0500
> Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server
>  address
> 
> Motivation: When services are discovered through DNS resolution, the order
> in
> which DNS records get resolved and assigned to servers is arbitrary.
> Therefore,
> even though two HAProxy instances using chash balancing might agree that a
> particular request should go to server3, it is likely the case that they
> have
> assigned different IPs and ports to the server in that slot.
> 
> By deriving the keys for the chash tree nodes from a server's address and
> port
> we ensure that independent HAProxy instances will agree on routing
> decisions. If
> an address is not known then the key is derived from the server's puid as
> it was
> previously.
> 
> When adjusting a server's weight, we now unconditionally remove all nodes
> first,
> since the server's address (and therefore the desired node keys) may have
> changed.

That's a good idea, it corresponds more or less to what is already
supported for stick-tables where we can decide to synchronize the servers'
addresses instead of IDs.

However here you have replaced the existing method instead of offering
an option to choose this one, and that's not correct as it will break
a number of setups, for example those with multiple DCs and replicated
servers, or those with dual-attached servers (each connected to one LB),
or those who want to have all ports going to the same server, etc.

I think that your use case would be desirable for a majority of users,
so that's definitely something we should do, but making it configurable.

Since here the hash key is a property of the server, it would make sense
to add a "hash-key" argument to the server and support "id" (the default),
"addr" (IP only, useful when learned over DNS), "addr-port" for even more
mixing, and maybe in the future even "fqdn" or "name" depending on what
users ask for.

Another point to take care of is what happens when a server's address is
changed (DNS, CLI). Normally here you'll need to unhash / rehash the node.
In the current version it would become incorrect if the server doesn't
first go down then up (this can happen for example if health checks are
too far apart and give enough time for the server's address to be
reassigned).

Thanks!
Willy



[PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server address

2024-02-16 Thread Anthony Deschamps
>From a031cf97da759eb2c2f9b6e191065ad503f821ed Mon Sep 17 00:00:00 2001
From: Anthony Deschamps 
Date: Fri, 16 Feb 2024 16:00:35 -0500
Subject: [PATCH] MEDIUM: lb-chash: Deterministic node hashes based on server
 address

Motivation: When services are discovered through DNS resolution, the order
in
which DNS records get resolved and assigned to servers is arbitrary.
Therefore,
even though two HAProxy instances using chash balancing might agree that a
particular request should go to server3, it is likely the case that they
have
assigned different IPs and ports to the server in that slot.

By deriving the keys for the chash tree nodes from a server's address and
port
we ensure that independent HAProxy instances will agree on routing
decisions. If
an address is not known then the key is derived from the server's puid as
it was
previously.

When adjusting a server's weight, we now unconditionally remove all nodes
first,
since the server's address (and therefore the desired node keys) may have
changed.
---
 src/lb_chash.c | 71 ++
 1 file changed, 48 insertions(+), 23 deletions(-)

diff --git a/src/lb_chash.c b/src/lb_chash.c
index 1c8034d89..bd39a07f8 100644
--- a/src/lb_chash.c
+++ b/src/lb_chash.c
@@ -21,8 +21,9 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
+#include 

 /* Return next tree node after  which must still be in the tree, or
be
  * NULL. Lookup wraps around the end to the beginning. If the next node is
the
@@ -58,6 +59,37 @@ static inline void chash_dequeue_srv(struct server *s)
  }
 }

+/* Compute a key that can be used to insert a node into the CHASH tree. If
the
+ * server has a known address then the key will be determined from its
address
+ * and port. This means that independent HAProxy instances will agree on
routing
+ * decisions. If an address is not known then the key will be determined
from
+ * the server's puid.
+ */
+static inline u32 chash_compute_node_key(struct server *s, unsigned
node_index)
+{
+ u32 key;
+ struct server_inetaddr srv_addr;
+
+ server_get_inetaddr(s, &srv_addr);
+
+ switch (srv_addr.family) {
+ case AF_INET:
+ key = full_hash(srv_addr.addr.v4.s_addr);
+ key ^= full_hash(srv_addr.port.svc);
+ key ^= full_hash(node_index);
+ break;
+ case AF_INET6:
+ unsigned seed = (srv_addr.port.svc << 16) + node_index;
+ key = XXH32(srv_addr.addr.v6.s6_addr, 16, seed);
+ break;
+ default:
+ key = full_hash(s->puid * SRV_EWGHT_RANGE + node_index);
+ break;
+ }
+
+ return key;
+}
+
 /* Adjust the number of entries of a server in its tree. The server must
appear
  * as many times as its weight indicates it. If it's there too often, we
remove
  * the last occurrences. If it's not there enough, we add more
occurrences. To
@@ -67,39 +99,32 @@ static inline void chash_dequeue_srv(struct server *s)
  */
 static inline void chash_queue_dequeue_srv(struct server *s)
 {
- while (s->lb_nodes_now > s->next_eweight) {
- if (s->lb_nodes_now >= s->lb_nodes_tot) // should always be false anyway
- s->lb_nodes_now = s->lb_nodes_tot;
- s->lb_nodes_now--;
- if (s->proxy->lbprm.chash.last == &s->lb_nodes[s->lb_nodes_now].node)
- s->proxy->lbprm.chash.last = chash_skip_node(s->lb_tree,
s->proxy->lbprm.chash.last);
- eb32_delete(&s->lb_nodes[s->lb_nodes_now].node);
- }
+ unsigned int j;
+
+ /* First we need to remove all of the server's entries from its tree
+ * because a) the realloc will change all node pointers and b) the keys
+ * may change.
+ */
+ chash_dequeue_srv(s);

  /* Attempt to increase the total number of nodes, if the user
  * increased the weight beyond the original weight
  */
  if (s->lb_nodes_tot < s->next_eweight) {
  struct tree_occ *new_nodes;
-
- /* First we need to remove all server's entries from its tree
- * because the realloc will change all nodes pointers */
- chash_dequeue_srv(s);
-
  new_nodes = realloc(s->lb_nodes, s->next_eweight * sizeof(*new_nodes));
  if (new_nodes) {
- unsigned int j;
-
  s->lb_nodes = new_nodes;
- memset(&s->lb_nodes[s->lb_nodes_tot], 0,
-(s->next_eweight - s->lb_nodes_tot) * sizeof(*s->lb_nodes));
- for (j = s->lb_nodes_tot; j < s->next_eweight; j++) {
- s->lb_nodes[j].server = s;
- s->lb_nodes[j].node.key = full_hash(s->puid * SRV_EWGHT_RANGE + j);
- }
  s->lb_nodes_tot = s->next_eweight;
  }
  }
+
+ memset(s->lb_nodes, 0, s->next_eweight * sizeof(*s->lb_nodes));
+ for (j = 0; j < s->next_eweight; j++) {
+ s->lb_nodes[j].server = s;
+ s->lb_nodes[j].node.key = chash_compute_node_key(s, j);
+ }
+
  while (s->lb_nodes_now < s->next_eweight) {
  if (s->lb_nodes_now >= s->lb_nodes_tot) // should always be false anyway
  break;
@@ -507,7 +532,7 @@ int chash_init_server_tree(struct proxy *p)
  }
  for (node = 0; node < srv->lb_nodes_tot; node++) {
  srv->lb_nodes[node].server = srv;
- srv->lb_nodes[node].node.key = full_hash(srv->puid * SRV_EWGHT_RANGE +
node);
+ srv->lb_nodes[node].node.key = chash_compute_node_key(srv, node);
  }

  if (srv_currently_usable(srv))
-- 
2.43.2