Thank you for your comments, Willy.

> 1) "hash algorithm" => I realized that this naming is confusing
>    because it's used in conjunction with the balance algorithm.
>    In practice, both the terms "hash algorithm" or "hash function"
>    are used, with the latter being much more common. So I changed
>    again the HALG_* flags to HFCN_*.
+1. My preference was hash function.

> 2) These HFCN_* flags are not contained anymore in the BE_LB_ALGO
>    mask, and this significantly simplifies the patches (no more mess
>    trying to exclude some masks)

Works for me. I considered a few alternatives before settling on what I
did; one of them was to use the bits assigned to hash types, though I
did not try the combination you did. On the other hand, I found the
handling of the balance parameter, which starts by clearing all the
flags set in hash-type and then rebuilds the fields, to be more
confusing.

> So now if "hash-type avalanche" is found, the error message
> explains how to replace it.

+1

> Worse, when applying consistent hashing on that, only two servers
> got the load for several seconds, then two other ones.

This is interesting. I did not do a time-series distribution in my
testing, and this is good to know. I will probably do this in the next
couple of days.

> So in the end, I changed my mind regarding the wt6 function and
> accepted to reintroduce it because it performed better for such
> workloads in my tests since I can't produce these nasty patterns
> with it.

> pick the best hashing function for the job depending on what
> is being hashed

I agree: the choice of function depends on the input more than one
would like it to. Hence my results were usually prefixed with "on my
dataset".

> I'm attaching the 3 resulting patches here. Please tell me if that's
> OK for you, in which case I'm going to merge them.

Please do. The big difference I see between what I did and what is here
is the bit masking around function and algorithm. As I mentioned, I
tried a few alternatives and settled on one that worked for me. Another
criterion I used was requiring the fewest changes to add an additional
hashing function.

I will also apply the patches and do a few sanity tests on my end later
today.

Sorry about the delay in mailing the test results CSV.
I did try that yesterday, but the download as CSV generated some fairly
poor results and I need to clean those up as well. I will do that today.

On Thu, Nov 14, 2013 at 9:24 AM, Willy Tarreau <w...@1wt.eu> wrote:
> Hi Bhaskar,
>
> OK, I'm finally done with this. Having reviewed the existing code
> allowed me to change my mind on a few points.
>
> 1) "hash algorithm" => I realized that this naming is confusing
>    because it's used in conjunction with the balance algorithm.
>    In practice, both the terms "hash algorithm" or "hash function"
>    are used, with the latter being much more common. So I changed
>    again the HALG_* flags to HFCN_*.
>
> 2) These HFCN_* flags are not contained anymore in the BE_LB_ALGO
>    mask, and this significantly simplifies the patches (no more mess
>    trying to exclude some masks)
>
> 3) In order to make the code more homogeneous, there is a mask to
>    detect the modifier as well, which right now may only be
>    "avalanche".
>
> 4) I finally thought that despite "hash-type avalanche" not being
>    used anywhere, it was a bit rude to remove it without any
>    information. This can happen if someone uses an obsolete doc like
>    those found on code.google.com/p/haproxy-docs. So now if
>    "hash-type avalanche" is found, the error message explains how to
>    replace it.
>
> 5) While running the tests at the end, I found that DJB2 was awful
>    with numbers on the input (my test tool simply put a visitor
>    number in a URL parameter which I used to balance on). I had 64
>    servers in the farm, and some of them took more than twice the
>    load of the others. Worse, when applying consistent hashing on
>    that, only two servers got the load for several seconds, then two
>    other ones, then two other ones, etc... So in practice the load
>    was spread over all the servers over the long term, but the
>    instant load was running only on two among 64. Running with 33
>    servers with map-based resulted in only 10 servers taking traffic,
>    as expected.
>    Using sdbm with 64 servers resulted in only half the servers
>    getting traffic. Adding avalanche fixed the issue in both cases,
>    but at the expense of a less smooth distribution. So in the end, I
>    changed my mind regarding the wt6 function and accepted to
>    reintroduce it because it performed better for such workloads in
>    my tests, since I can't produce these nasty patterns with it.
>
> Now I have tested all combinations. Everything looks OK and I really
> like this new ability to pick the best hashing function for the job
> depending on what is being hashed (eg: neither sdbm nor djb2 are good
> on IP addresses unless avalanche is used).
>
> I'm attaching the 3 resulting patches here. Please tell me if that's
> OK for you, in which case I'm going to merge them.
>
> Thanks!
> Willy