Hi again Anthony,

First of all, thanks for the patch, it's of very good quality!
I'm having *one* problem with it, below:

>  /* Adjust the number of entries of a server in its tree. The server must appear
>   * as many times as its weight indicates it. If it's there too often, we remove
>   * the last occurrences. If it's not there enough, we add more occurrences. To
> @@ -67,39 +126,32 @@ static inline void chash_dequeue_srv(struct server *s)
>   */
>  static inline void chash_queue_dequeue_srv(struct server *s)
>  {
> -     while (s->lb_nodes_now > s->next_eweight) {
> -             if (s->lb_nodes_now >= s->lb_nodes_tot) // should always be false anyway
> -                     s->lb_nodes_now = s->lb_nodes_tot;
> -             s->lb_nodes_now--;
> -             if (s->proxy->lbprm.chash.last == &s->lb_nodes[s->lb_nodes_now].node)
> -                     s->proxy->lbprm.chash.last = chash_skip_node(s->lb_tree, s->proxy->lbprm.chash.last);
> -             eb32_delete(&s->lb_nodes[s->lb_nodes_now].node);
> -     }
> +     unsigned int j;
> +
> +     /* First we need to remove all of the server's entries from its tree
> +      * because a) the realloc will change all node pointers and b) the keys
> +      * may change.
> +      */
> +     chash_dequeue_srv(s);

This part above. It completely removes the server from the tree before
re-adding it for any weight change. That's extremely costly. One case
that stresses this part is the slowstart mechanism. If you set 500
servers in a backend like this:

    server-template s 0-499 127.0.0.1:8000 slowstart 20s weight 256

then turn all of them down and back up: with -master, the process runs at
roughly 10% CPU for 20 seconds, the time it takes to progressively add
all nodes to the tree, while with the new version the load grows over
those 20 seconds until it reaches 100% and sometimes triggers the watchdog.

IMHO the full dequeue should only be done in the previous conditions,
or if the key changes, as you mentioned in the comment.

This means that you should store in the server along with the hash_key
the one which corresponds to the indexed nodes (the one that was last
used to insert the server in the tree). I'd just imagine a prev_hash_key
that's synchronized after inserting the server. Then when entering
chash_queue_dequeue_srv(), it would be sufficient to compare the two
keys and decide to call chash_dequeue_srv() in case they don't match.
This way the full dequeue would be done only when the key changes and
not when the weight slightly increases with a same key.

The only other comment I could find is this tiny one, in
chash_compute_node_key():

+       case SRV_HASH_KEY_ADDR_PORT:
+               server_get_inetaddr(s, &srv_addr);
+               if (srv_addr.family != AF_INET && srv_addr.family != AF_INET6) {
+                 hash_key = SRV_HASH_KEY_ID;
+               }

As you can see there's an indent issue before hash_key. I said it was
tiny ;-)

That's essentially all I could find, so I'm pretty confident that with
that addressed, the patch can be merged quickly!

Thank you!
Willy
