Hi Jeff,

I'm copying Dirk who has experienced exactly the same issue as yours
a few weeks ago, and we stopped investigating because of a supposedly
insufficient input data distribution. However, David (cc'd) has been
using consistent hashing with success. Also, I know that it was working
for me when I committed it.

Your finding below is extremely important. It may explain why it works
for some people and not for others. I know that some people force all
the server IDs in their configs. Probably that David does.

All the points you raised are perfectly valid, and your code analysis
is good. And guess what ? The code which touches the srv->puid entries
was committed 3 days after the consistent hash !

Indeed, I've done a test right now with URL hash on 3 servers S1/2/3,
and here's the distribution I'm observing for 1000 random requests :

    485 S1
    515 S3

If I manually assign IDs 1, 2, 3 to S1, S2, S3 :

    345 S1
    304 S2
    351 S3

With the attached patch which assigns the default IDs before the
table is initialized, it gives the same :

    323 S1
    308 S2
    369 S3

So that means that users who need to fix their setups right now,
it's enough to manually assign distinct IDs to each server. For
the ones that can wait a bit later, the attached patch fixes the
issue and will be released with 1.4.7.

Congratulations Jeff for your in-depth analysis, and thank you
very much for the time you might have spent reading the code !

Willy

On Tue, May 25, 2010 at 03:03:10PM -0400, Jeffrey J. Persch wrote:
> Greetings,
> 
> I have reproduced this same issue on haproxy-1.4.6.
> 
> Configure a backend with
>       balance url_param x
>       hash-type consistent
>       server s1 host1:80
>       server s2 host2:80
>       server s3 host3:80
>       server s4 host4:80
> 
> The server will only dispatch requests to two servers.
> 
> The problem is seen in chash_init_server_tree:
> 
>       /* queue active and backup servers in two distinct groups */
>       for (srv = p->srv; srv; srv = srv->next) {
>       
>               srv->lb_tree = (srv->state & SRV_BACKUP) ? &p->lbprm.chash.bck :
> &p->lbprm.chash.act;
>               srv->lb_nodes_tot = srv->uweight * BE_WEIGHT_SCALE;
>               srv->lb_nodes_now = 0;
>               srv->lb_nodes = (struct tree_occ *)calloc(srv->lb_nodes_tot,
> sizeof(struct tree_occ));
> 
>               for (node = 0; node < srv->lb_nodes_tot; node++) {
>                       srv->lb_nodes[node].server = srv;
>                       srv->lb_nodes[node].node.key = chash_hash(srv->puid *
> SRV_EWGHT_RANGE + node);
>               }
> 
>               if (srv_is_usable(srv->state, srv->eweight))
>                       chash_queue_dequeue_srv(srv);
>       }
> 
> The problem is that when this code runs, all of the servers have the
> same srv->puid == 0.
> With equal weights, each server generates the exact same set of node
> keys.
> 
> Regardless of the key you lookup using eb32_lookup_ge, you get a node
> for the last server.
> The chash_get_server_hash function is reduced to choosing between the
> closer of the last or first server.
> 
> Any insights as to why srv->puid is zero and not unique per server?
> 
> Jeff Persch
> 
> 
>From 87d28228de2a5c9806c1884a9c6b29142fd3aa7b Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w...@1wt.eu>
Date: Tue, 25 May 2010 23:03:02 +0200
Subject: [BUG] consistent hash: balance on all servers, not only 2 !

It was once reported at least by Dirk Taggesell that the consistent
hash had a very poor distribution, making use of only two servers.
Jeff Persch analysed the code and found the root cause. Consistent
hash makes use of the server IDs, which are completed after the chash
array initialization. This implies that each server which does not
have an explicit "id" parameter will be merged at the same place in
the chash tree and that in the end, only the first or last servers
may be used.

The now obvious fix (thanks to Jeff) is to assign the missing IDs
earlier. However, it should be clearly understood that changing a
hash algorithm on live systems will rebalance the whole system.
Anyway, the only affected users will be the ones for which the
system is quite unbalanced already. The ones who fix their IDs are
not affected at all.

Kudos to Jeff for spotting that bug which got merged 3 days after
the consistent hashing !
---
 src/cfgparse.c |   27 ++++++++++++++++-----------
 1 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/src/cfgparse.c b/src/cfgparse.c
index 0e36050..cc82544 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -5045,6 +5045,22 @@ out_uri_auth_compat:
                        curproxy->srv = next;
                }
 
+               /* assign automatic UIDs to servers which don't have one yet */
+               next_id = 1;
+               newsrv = curproxy->srv;
+               while (newsrv != NULL) {
+                       if (!newsrv->puid) {
+                               /* server ID not set, use automatic numbering 
with first
+                                * spare entry starting with next_svid.
+                                */
+                               next_id = 
get_next_id(&curproxy->conf.used_server_id, next_id);
+                               newsrv->conf.id.key = newsrv->puid = next_id;
+                               eb32_insert(&curproxy->conf.used_server_id, 
&newsrv->conf.id);
+                       }
+                       next_id++;
+                       newsrv = newsrv->next;
+               }
+
                curproxy->lbprm.wmult = 1; /* default weight multiplier */
                curproxy->lbprm.wdiv  = 1; /* default weight divider */
 
@@ -5155,19 +5171,8 @@ out_uri_auth_compat:
                /*
                 * ensure that we're not cross-dressing a TCP server into HTTP.
                 */
-               next_id = 1;
                newsrv = curproxy->srv;
                while (newsrv != NULL) {
-                       if (!newsrv->puid) {
-                               /* server ID not set, use automatic numbering 
with first
-                                * spare entry starting with next_svid.
-                                */
-                               next_id = 
get_next_id(&curproxy->conf.used_server_id, next_id);
-                               newsrv->conf.id.key = newsrv->puid = next_id;
-                               eb32_insert(&curproxy->conf.used_server_id, 
&newsrv->conf.id);
-                       }
-                       next_id++;
-
                        if ((curproxy->mode != PR_MODE_HTTP) && 
(newsrv->rdr_len || newsrv->cklen)) {
                                Alert("config : %s '%s' : server cannot have 
cookie or redirect prefix in non-HTTP mode.\n",
                                      proxy_type_str(curproxy), curproxy->id);
-- 
1.6.4.4

Reply via email to