Hi Jeff,

I'm copying Dirk, who experienced exactly the same issue a few weeks ago; we stopped investigating because we suspected an insufficient input data distribution. However, David (cc'd) has been using consistent hashing with success. Also, I know that it was working for me when I committed it.
Your finding below is extremely important. It may explain why it works for some people and not for others. I know that some people force all the server IDs in their configs; David probably does. All the points you raised are perfectly valid, and your code analysis is good. And guess what? The code which touches the srv->puid entries was committed 3 days after the consistent hash!

Indeed, I've done a test right now with URL hash on 3 servers S1/S2/S3, and here's the distribution I'm observing for 1000 random requests:

    485 S1
    515 S3

If I manually assign IDs 1, 2, 3 to S1, S2, S3:

    345 S1
    304 S2
    351 S3

With the attached patch, which assigns the default IDs before the table is initialized, the distribution is similar:

    323 S1
    308 S2
    369 S3

So for users who need to fix their setups right now, it's enough to manually assign distinct IDs to each server. For those who can wait a little, the attached patch fixes the issue and will be released with 1.4.7.

Congratulations Jeff for your in-depth analysis, and thank you very much for the time you must have spent reading the code!

Willy

On Tue, May 25, 2010 at 03:03:10PM -0400, Jeffrey J. Persch wrote:
> Greetings,
>
> I have reproduced this same issue on haproxy-1.4.6.
>
> Configure a backend with:
>
>     balance url_param x
>     hash-type consistent
>     server s1 host1:80
>     server s2 host2:80
>     server s3 host3:80
>     server s4 host4:80
>
> The server will only dispatch requests to two servers.
>
> The problem is seen in chash_init_server_tree:
>
>     /* queue active and backup servers in two distinct groups */
>     for (srv = p->srv; srv; srv = srv->next) {
>             srv->lb_tree = (srv->state & SRV_BACKUP) ? &p->lbprm.chash.bck : &p->lbprm.chash.act;
>             srv->lb_nodes_tot = srv->uweight * BE_WEIGHT_SCALE;
>             srv->lb_nodes_now = 0;
>             srv->lb_nodes = (struct tree_occ *)calloc(srv->lb_nodes_tot, sizeof(struct tree_occ));
>
>             for (node = 0; node < srv->lb_nodes_tot; node++) {
>                     srv->lb_nodes[node].server = srv;
>                     srv->lb_nodes[node].node.key = chash_hash(srv->puid * SRV_EWGHT_RANGE + node);
>             }
>
>             if (srv_is_usable(srv->state, srv->eweight))
>                     chash_queue_dequeue_srv(srv);
>     }
>
> The problem is that when this code runs, all of the servers have the
> same srv->puid == 0.
> With equal weights, each server generates the exact same set of node
> keys.
>
> Regardless of the key you look up using eb32_lookup_ge, you get a node
> for the last server.
> The chash_get_server_hash function is reduced to choosing between the
> closer of the last or first server.
>
> Any insights as to why srv->puid is zero and not unique per server?
>
> Jeff Persch
From 87d28228de2a5c9806c1884a9c6b29142fd3aa7b Mon Sep 17 00:00:00 2001
From: Willy Tarreau <w...@1wt.eu>
Date: Tue, 25 May 2010 23:03:02 +0200
Subject: [BUG] consistent hash: balance on all servers, not only 2 !

It was once reported, at least by Dirk Taggesell, that the consistent
hash had a very poor distribution, making use of only two servers.
Jeff Persch analysed the code and found the root cause. Consistent
hash makes use of the server IDs, which are completed after the chash
array initialization. This implies that each server which does not
have an explicit "id" parameter will be merged at the same place in
the chash tree and that in the end, only the first or last servers
may be used.

The now obvious fix (thanks to Jeff) is to assign the missing IDs
earlier. However, it should be clearly understood that changing a hash
algorithm on live systems will rebalance the whole system. Anyway, the
only affected users will be the ones for which the system is quite
unbalanced already. The ones who fix their IDs are not affected at
all.

Kudos to Jeff for spotting that bug which got merged 3 days after the
consistent hashing !
---
 src/cfgparse.c |   27 ++++++++++++++++-----------
 1 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/src/cfgparse.c b/src/cfgparse.c
index 0e36050..cc82544 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -5045,6 +5045,22 @@ out_uri_auth_compat:
 			curproxy->srv = next;
 		}

+		/* assign automatic UIDs to servers which don't have one yet */
+		next_id = 1;
+		newsrv = curproxy->srv;
+		while (newsrv != NULL) {
+			if (!newsrv->puid) {
+				/* server ID not set, use automatic numbering with first
+				 * spare entry starting with next_svid.
+				 */
+				next_id = get_next_id(&curproxy->conf.used_server_id, next_id);
+				newsrv->conf.id.key = newsrv->puid = next_id;
+				eb32_insert(&curproxy->conf.used_server_id, &newsrv->conf.id);
+			}
+			next_id++;
+			newsrv = newsrv->next;
+		}
+
 		curproxy->lbprm.wmult = 1; /* default weight multiplier */
 		curproxy->lbprm.wdiv  = 1; /* default weight divider */

@@ -5155,19 +5171,8 @@ out_uri_auth_compat:
 		/*
 		 * ensure that we're not cross-dressing a TCP server into HTTP.
 		 */
-		next_id = 1;
 		newsrv = curproxy->srv;
 		while (newsrv != NULL) {
-			if (!newsrv->puid) {
-				/* server ID not set, use automatic numbering with first
-				 * spare entry starting with next_svid.
-				 */
-				next_id = get_next_id(&curproxy->conf.used_server_id, next_id);
-				newsrv->conf.id.key = newsrv->puid = next_id;
-				eb32_insert(&curproxy->conf.used_server_id, &newsrv->conf.id);
-			}
-			next_id++;
-
 			if ((curproxy->mode != PR_MODE_HTTP) && (newsrv->rdr_len || newsrv->cklen)) {
 				Alert("config : %s '%s' : server cannot have cookie or redirect prefix in non-HTTP mode.\n",
 				      proxy_type_str(curproxy), curproxy->id);
--
1.6.4.4
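For anyone applying the interim workaround from the message above (explicit distinct server IDs) rather than the patch, Jeff's example backend would look like this; the backend name is invented for illustration, and "id" is HAProxy's explicit server-ID keyword:

```
backend bk_example
    balance url_param x
    hash-type consistent
    server s1 host1:80 id 1
    server s2 host2:80 id 2
    server s3 host3:80 id 3
    server s4 host4:80 id 4
```

With distinct IDs set in the config, srv->puid is already non-zero when chash_init_server_tree runs, so each server hashes to its own set of points on the ring.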