Re: Consistent hashing alternative to sdbm
Hi Bhaskar, On Tue, Oct 29, 2013 at 12:44:58AM -0400, Bhaskar Maddala wrote: Hello, Can you please take a look at [1]? Make sure it is what you had in mind, I read thru our conversation here again and I understood that the change we wanted to implement allowed selection of the hash function in addition to map-based/consistent and avalance. The change provide the ability to specify. indicates optional hash-type consistent sdbm/djb2/wt6 hash-type map-based sdbm/djb2/wt6 hash-type avalanche sdbm/djb2/wt6 Not all of it is implemented, i am in the middle of testing, but wanted any early feed back you might have before i spent a lot of time on it. I feel a bit bothered by having the if on the hash type done for every single character. I'd rather have 3 hash functions that work on (ptr, len) and call the right one with the string and length instead. It will also allow us to have clean hash functions resusable for anything else. Concerning the config, initially I thought that having a separate keyword (eg: hash-algo) to set the algorithm was better than mixing it with the hash-type keyword. But now I'm not completely sure about this because probably people who want to set the algo will also want to be sure about the type of hashing they're applying. I'd like to get other users' feedback on this, particularly those using the consistent hashing. Thanks, Willy
Re: Consistent hashing alternative to sdbm
Hello, I updated the diff [1], it uses function now instead of macros, and added hash function wt6. I did smoke testing stepping thru the code via the debugger for all hash functions and it looks good, however requires more rigorous testing which I will do later today. On mixing on hashing function I initially tried the alternative of a separate keyword and settled on using the same keyword when I found the unused nibble in the bit masks. Fwiw, using the separate keyword makes the code a little simpler, but from a end user standpoint (which includes me) I found not having another keyword to be better. It would be great if you can take a look at [1] once more and see if you want anything changed. I did not look hard enough, but can/should I add some configs to tests/ folder and how/do these get run when invoking make, or do you run these in some other manner. Are there any additional tests you would like written? Thanks Bhaskar [1] https://github.com/maddalab/haproxy/pull/1 On Tue, Oct 29, 2013 at 2:13 AM, Willy Tarreau w...@1wt.eu wrote: Hi Bhaskar, On Tue, Oct 29, 2013 at 12:44:58AM -0400, Bhaskar Maddala wrote: Hello, Can you please take a look at [1]? Make sure it is what you had in mind, I read thru our conversation here again and I understood that the change we wanted to implement allowed selection of the hash function in addition to map-based/consistent and avalance. The change provide the ability to specify. indicates optional hash-type consistent sdbm/djb2/wt6 hash-type map-based sdbm/djb2/wt6 hash-type avalanche sdbm/djb2/wt6 Not all of it is implemented, i am in the middle of testing, but wanted any early feed back you might have before i spent a lot of time on it. I feel a bit bothered by having the if on the hash type done for every single character. I'd rather have 3 hash functions that work on (ptr, len) and call the right one with the string and length instead. It will also allow us to have clean hash functions resusable for anything else. Concerning the config, initially I thought that having a separate keyword (eg: hash-algo) to set the algorithm was better than mixing it with the hash-type keyword. But now I'm not completely sure about this because probably people who want to set the algo will also want to be sure about the type of hashing they're applying. I'd like to get other users' feedback on this, particularly those using the consistent hashing. Thanks, Willy
RE: Consistent hashing alternative to sdbm
Unsubscribe From: Bhaskar Maddala [mailto:madda...@gmail.com] Sent: Tuesday, October 29, 2013 9:20 AM To: Willy Tarreau Cc: haproxy@formilux.org Subject: Re: Consistent hashing alternative to sdbm Hello, I updated the diff [1], it uses function now instead of macros, and added hash function wt6. I did smoke testing stepping thru the code via the debugger for all hash functions and it looks good, however requires more rigorous testing which I will do later today. On mixing on hashing function I initially tried the alternative of a separate keyword and settled on using the same keyword when I found the unused nibble in the bit masks. Fwiw, using the separate keyword makes the code a little simpler, but from a end user standpoint (which includes me) I found not having another keyword to be better. It would be great if you can take a look at [1] once more and see if you want anything changed. I did not look hard enough, but can/should I add some configs to tests/ folder and how/do these get run when invoking make, or do you run these in some other manner. Are there any additional tests you would like written? Thanks Bhaskar [1] https://github.com/maddalab/haproxy/pull/1 On Tue, Oct 29, 2013 at 2:13 AM, Willy Tarreau w...@1wt.eumailto:w...@1wt.eu wrote: Hi Bhaskar, On Tue, Oct 29, 2013 at 12:44:58AM -0400, Bhaskar Maddala wrote: Hello, Can you please take a look at [1]? Make sure it is what you had in mind, I read thru our conversation here again and I understood that the change we wanted to implement allowed selection of the hash function in addition to map-based/consistent and avalance. The change provide the ability to specify. indicates optional hash-type consistent sdbm/djb2/wt6 hash-type map-based sdbm/djb2/wt6 hash-type avalanche sdbm/djb2/wt6 Not all of it is implemented, i am in the middle of testing, but wanted any early feed back you might have before i spent a lot of time on it. I feel a bit bothered by having the if on the hash type done for every single character. I'd rather have 3 hash functions that work on (ptr, len) and call the right one with the string and length instead. It will also allow us to have clean hash functions resusable for anything else. Concerning the config, initially I thought that having a separate keyword (eg: hash-algo) to set the algorithm was better than mixing it with the hash-type keyword. But now I'm not completely sure about this because probably people who want to set the algo will also want to be sure about the type of hashing they're applying. I'd like to get other users' feedback on this, particularly those using the consistent hashing. Thanks, Willy
Re: Consistent hashing alternative to sdbm
Hello, Please hold off on reviewing the code if you have not yet spent any time. I have found at least 1 issue. However feel free to respond on the questions about automated testing. I will send an update once I have the issue resolved. Thank you -Bhaskar On Tue, Oct 29, 2013 at 9:19 AM, Bhaskar Maddala madda...@gmail.com wrote: Hello, I updated the diff [1], it uses function now instead of macros, and added hash function wt6. I did smoke testing stepping thru the code via the debugger for all hash functions and it looks good, however requires more rigorous testing which I will do later today. On mixing on hashing function I initially tried the alternative of a separate keyword and settled on using the same keyword when I found the unused nibble in the bit masks. Fwiw, using the separate keyword makes the code a little simpler, but from a end user standpoint (which includes me) I found not having another keyword to be better. It would be great if you can take a look at [1] once more and see if you want anything changed. I did not look hard enough, but can/should I add some configs to tests/ folder and how/do these get run when invoking make, or do you run these in some other manner. Are there any additional tests you would like written? Thanks Bhaskar [1] https://github.com/maddalab/haproxy/pull/1 On Tue, Oct 29, 2013 at 2:13 AM, Willy Tarreau w...@1wt.eu wrote: Hi Bhaskar, On Tue, Oct 29, 2013 at 12:44:58AM -0400, Bhaskar Maddala wrote: Hello, Can you please take a look at [1]? Make sure it is what you had in mind, I read thru our conversation here again and I understood that the change we wanted to implement allowed selection of the hash function in addition to map-based/consistent and avalance. The change provide the ability to specify. indicates optional hash-type consistent sdbm/djb2/wt6 hash-type map-based sdbm/djb2/wt6 hash-type avalanche sdbm/djb2/wt6 Not all of it is implemented, i am in the middle of testing, but wanted any early feed back you might have before i spent a lot of time on it. I feel a bit bothered by having the if on the hash type done for every single character. I'd rather have 3 hash functions that work on (ptr, len) and call the right one with the string and length instead. It will also allow us to have clean hash functions resusable for anything else. Concerning the config, initially I thought that having a separate keyword (eg: hash-algo) to set the algorithm was better than mixing it with the hash-type keyword. But now I'm not completely sure about this because probably people who want to set the algo will also want to be sure about the type of hashing they're applying. I'd like to get other users' feedback on this, particularly those using the consistent hashing. Thanks, Willy
Re: Consistent hashing alternative to sdbm
Hello, Can you please take a look at [1]? Make sure it is what you had in mind, I read thru our conversation here again and I understood that the change we wanted to implement allowed selection of the hash function in addition to map-based/consistent and avalance. The change provide the ability to specify. indicates optional hash-type consistent sdbm/djb2/wt6 hash-type map-based sdbm/djb2/wt6 hash-type avalanche sdbm/djb2/wt6 Not all of it is implemented, i am in the middle of testing, but wanted any early feed back you might have before i spent a lot of time on it. Thanks -Bhaskar [1] https://github.com/maddalab/haproxy/pull/1 On Fri, Oct 25, 2013 at 2:19 AM, Willy Tarreau w...@1wt.eu wrote: Hi Bhaskar, First, you did an amazing work, thanks for sharing your findings! On Thu, Oct 24, 2013 at 11:12:14PM -0400, Bhaskar Maddala wrote: Hello, This is a little long, please bear with me. I was able to run more tests the results are at [1]. As I mentioned before I was not able to reproduce the distribution we had seen when we implemented our patch in Haproxy using hashcount.c yesterday, the only remaining difference was actually having the requests be served via an instance of haproxy. I broke down the tests into the following 3 categories (a) ~10K request (b) All unique host names obtained from 1MM requests that were grepped from haproxy production logs, there were 48259 blogs (c) 1MM requests Case (a) used the data set we had based our decision on, case (b) and (c) used a data set I obtained at the beginning of this conversation. I then ran these 3 scenarios via both haproxy and hashcount.c for DJB2 and SDBM. I was able to reproduce* the distribution we had based our decision on when using haproxy. Interestingly the test harness (hashcount.c) produces wildly different distribution when the results are compared to an instance of Haproxy, more so for SDBM. I'm thinking about something. If you're running with consistent hashing, another element matters a lot : the average distance between all output values. Indeed, with consistent hashing, we're not dividing the hash by the number of servers, we're searching the point that's the closest to the hash and assigning the connection the server associated with this point. So if the hash produces clusters of points, there are more risks that some will get more traffic than others due to their proximity with server points. I think that the image below summarizes what I mean very well : http://offthelip.org/?p=149 Right now, each server appears 16*weight times on the consistent hash circle and these places are pseudo random and uniform (I spent a lot of time ensuring this in the past, so I'm pretty sure about this but I'd be happy if we can improve this). You can check in lb_chash.c how this is done in chash_init_server_tree(). I don't know how to reliably modelize that in hashcount.c. I can't find the test code I used when developing the consistent hash, so if we want to run new experimentations, I guess we'll have to copy-paste existing code into a new file. I'm thinking about something : since we want the points to all be the farthest from each other, we could measure the hash efficiency for consistent hashing this way : - print all hash points - sort them by value - replace each of them with the distance from the previous one - compute the stddev on these values If you're interested in running a test using this, that would be great. You can restart from the hashcount you already have since you have already implemented the stddev in it. As a sanity check, I verified that the results were reproducible under repeated executions. My test haproxy configuration is available at [2]. With the goal to eliminate as many variables as possible, I set up 17 back ends to all use a single nginx server that served static files from the file system, nginx config is also available at [2] A request flow would look like this curl with custom header - Haproxy on 80 - hash on custom header to select backend - (all backends point to the same nginx instance) - request forwarded to backend nginx - nginx responds Following/Similar [3] was executed from the machine housing haproxy nginx. Finally a note on the results [1]. Each sheet corresponds to each of the cases (a)/(b)/(c). On the first sheet you will see the std dev numbers under the heading executing via haproxy that we based our decision on. The raw results are also available at [2] Let me know if you would like to see more numbers with a changed methodology etc. or if you have an explanation for the difference is distribution when using haproxy vs using hashcount.c after taking a look at the config. Yes, please try to run a test modeling consistent hashing as explained above. If something is not clear, do not
Re: Consistent hashing alternative to sdbm
Hello, On Fri, Oct 18, 2013 at 06:54:40PM -0400, Bhaskar Maddala wrote: Hello, We have been using haproxy for load balancing requests across our backbends. We use consistent hashing, in experiments for the requests processed and specific to our architecture we found dbj2 providing a better distribution over sdbm when the remainder of the configuration was left unchanged. We have a patched version of haproxy based on 1.4.24 that adds a config option of hash-func with values sdbm and dbj2, sdbm as default, we have been using dbj2 in production for a while now. I am writing to determine interest if any for this change before submitting a patch. I don't know these sdbm nor dbj2 functions. For the later I suspect you meant djb2 instead which is the same as hash_djbx33 in tests_hashes.c. But sdbm I still don't know. In 1.5, an avalanche hash is always performed before applying the consistent hash, which should provide a smoother distribution than in 1.4. Could you please check if 1.5 still needs this ability to change the hash function ? I'm not opposed to a new parameter, but I'd prefer to avoid it if it's not necessary. Thanks, Willy