Re: Consistent hashing alternative to sdbm

2013-10-29 Thread Willy Tarreau
Hi Bhaskar,

On Tue, Oct 29, 2013 at 12:44:58AM -0400, Bhaskar Maddala wrote:
 Hello,
 
   Can you please take a look at [1]? Make sure it is what you had in mind,
 I read thru our conversation here again and I understood that the change we
 wanted to implement allowed selection of the hash function in addition to
 map-based/consistent and avalance.
 
The change provide the ability to specify.  indicates optional
 
   hash-type consistent sdbm/djb2/wt6
   hash-type map-based sdbm/djb2/wt6
   hash-type avalanche sdbm/djb2/wt6
 
   Not all of it is implemented, i am in the middle of testing, but wanted
 any early feed back you might have before i spent a lot of time on it.

I feel a bit bothered by having the if on the hash type done for
every single character. I'd rather have 3 hash functions that work
on (ptr, len) and call the right one with the string and length
instead.

It will also allow us to have clean hash functions resusable for
anything else.

Concerning the config, initially I thought that having a separate
keyword (eg: hash-algo) to set the algorithm was better than mixing
it with the hash-type keyword. But now I'm not completely sure about
this because probably people who want to set the algo will also want
to be sure about the type of hashing they're applying. I'd like to
get other users' feedback on this, particularly those using the
consistent hashing.

Thanks,
Willy




Re: Consistent hashing alternative to sdbm

2013-10-29 Thread Bhaskar Maddala
Hello,

   I updated the diff [1], it uses function now instead of macros, and
added hash function wt6. I did smoke testing stepping thru the code via
the debugger for all hash functions and it looks good, however requires
more rigorous testing which I will do later today.

   On mixing on hashing function I initially tried the alternative of a
separate keyword and settled on using the same keyword when I found the
unused nibble in the bit masks. Fwiw, using the separate keyword makes the
code a little simpler, but from a end user standpoint (which includes me) I
found not having another keyword to be better.

   It would be great if you can take a look at [1] once more and see if you
want anything changed. I did not look hard enough, but can/should I add
some configs to tests/ folder and how/do these get run when invoking make,
or do you run these in some other manner. Are there any additional tests
you would like written?

Thanks
Bhaskar

[1] https://github.com/maddalab/haproxy/pull/1


On Tue, Oct 29, 2013 at 2:13 AM, Willy Tarreau w...@1wt.eu wrote:

 Hi Bhaskar,

 On Tue, Oct 29, 2013 at 12:44:58AM -0400, Bhaskar Maddala wrote:
  Hello,
 
Can you please take a look at [1]? Make sure it is what you had in
 mind,
  I read thru our conversation here again and I understood that the change
 we
  wanted to implement allowed selection of the hash function in addition to
  map-based/consistent and avalance.
 
 The change provide the ability to specify.  indicates optional
 
hash-type consistent sdbm/djb2/wt6
hash-type map-based sdbm/djb2/wt6
hash-type avalanche sdbm/djb2/wt6
 
Not all of it is implemented, i am in the middle of testing, but wanted
  any early feed back you might have before i spent a lot of time on it.

 I feel a bit bothered by having the if on the hash type done for
 every single character. I'd rather have 3 hash functions that work
 on (ptr, len) and call the right one with the string and length
 instead.

 It will also allow us to have clean hash functions resusable for
 anything else.

 Concerning the config, initially I thought that having a separate
 keyword (eg: hash-algo) to set the algorithm was better than mixing
 it with the hash-type keyword. But now I'm not completely sure about
 this because probably people who want to set the algo will also want
 to be sure about the type of hashing they're applying. I'd like to
 get other users' feedback on this, particularly those using the
 consistent hashing.

 Thanks,
 Willy




RE: Consistent hashing alternative to sdbm

2013-10-29 Thread Richard Harris
Unsubscribe

From: Bhaskar Maddala [mailto:madda...@gmail.com]
Sent: Tuesday, October 29, 2013 9:20 AM
To: Willy Tarreau
Cc: haproxy@formilux.org
Subject: Re: Consistent hashing alternative to sdbm

Hello,

   I updated the diff [1], it uses function now instead of macros, and added 
hash function wt6. I did smoke testing stepping thru the code via the 
debugger for all hash functions and it looks good, however requires more 
rigorous testing which I will do later today.

   On mixing on hashing function I initially tried the alternative of a 
separate keyword and settled on using the same keyword when I found the unused 
nibble in the bit masks. Fwiw, using the separate keyword makes the code a 
little simpler, but from a end user standpoint (which includes me) I found not 
having another keyword to be better.

   It would be great if you can take a look at [1] once more and see if you 
want anything changed. I did not look hard enough, but can/should I add some 
configs to tests/ folder and how/do these get run when invoking make, or do you 
run these in some other manner. Are there any additional tests you would like 
written?

Thanks
Bhaskar

[1] https://github.com/maddalab/haproxy/pull/1

On Tue, Oct 29, 2013 at 2:13 AM, Willy Tarreau 
w...@1wt.eumailto:w...@1wt.eu wrote:
Hi Bhaskar,

On Tue, Oct 29, 2013 at 12:44:58AM -0400, Bhaskar Maddala wrote:
 Hello,

   Can you please take a look at [1]? Make sure it is what you had in mind,
 I read thru our conversation here again and I understood that the change we
 wanted to implement allowed selection of the hash function in addition to
 map-based/consistent and avalance.

The change provide the ability to specify.  indicates optional

   hash-type consistent sdbm/djb2/wt6
   hash-type map-based sdbm/djb2/wt6
   hash-type avalanche sdbm/djb2/wt6

   Not all of it is implemented, i am in the middle of testing, but wanted
 any early feed back you might have before i spent a lot of time on it.
I feel a bit bothered by having the if on the hash type done for
every single character. I'd rather have 3 hash functions that work
on (ptr, len) and call the right one with the string and length
instead.

It will also allow us to have clean hash functions resusable for
anything else.

Concerning the config, initially I thought that having a separate
keyword (eg: hash-algo) to set the algorithm was better than mixing
it with the hash-type keyword. But now I'm not completely sure about
this because probably people who want to set the algo will also want
to be sure about the type of hashing they're applying. I'd like to
get other users' feedback on this, particularly those using the
consistent hashing.

Thanks,
Willy



Re: Consistent hashing alternative to sdbm

2013-10-29 Thread Bhaskar Maddala
Hello,

   Please hold off on reviewing the code if you have not yet spent any
time. I have found at least 1 issue. However feel free to respond on the
questions about automated testing.

  I will send an update once I have the issue resolved. Thank you

-Bhaskar


On Tue, Oct 29, 2013 at 9:19 AM, Bhaskar Maddala madda...@gmail.com wrote:

 Hello,

I updated the diff [1], it uses function now instead of macros, and
 added hash function wt6. I did smoke testing stepping thru the code via
 the debugger for all hash functions and it looks good, however requires
 more rigorous testing which I will do later today.

On mixing on hashing function I initially tried the alternative of a
 separate keyword and settled on using the same keyword when I found the
 unused nibble in the bit masks. Fwiw, using the separate keyword makes the
 code a little simpler, but from a end user standpoint (which includes me) I
 found not having another keyword to be better.

It would be great if you can take a look at [1] once more and see if
 you want anything changed. I did not look hard enough, but can/should I add
 some configs to tests/ folder and how/do these get run when invoking make,
 or do you run these in some other manner. Are there any additional tests
 you would like written?

 Thanks
 Bhaskar

 [1] https://github.com/maddalab/haproxy/pull/1


 On Tue, Oct 29, 2013 at 2:13 AM, Willy Tarreau w...@1wt.eu wrote:

 Hi Bhaskar,

 On Tue, Oct 29, 2013 at 12:44:58AM -0400, Bhaskar Maddala wrote:
  Hello,
 
Can you please take a look at [1]? Make sure it is what you had in
 mind,
  I read thru our conversation here again and I understood that the
 change we
  wanted to implement allowed selection of the hash function in addition
 to
  map-based/consistent and avalance.
 
 The change provide the ability to specify.  indicates optional
 
hash-type consistent sdbm/djb2/wt6
hash-type map-based sdbm/djb2/wt6
hash-type avalanche sdbm/djb2/wt6
 
Not all of it is implemented, i am in the middle of testing, but
 wanted
  any early feed back you might have before i spent a lot of time on it.

 I feel a bit bothered by having the if on the hash type done for
 every single character. I'd rather have 3 hash functions that work
 on (ptr, len) and call the right one with the string and length
 instead.

 It will also allow us to have clean hash functions resusable for
 anything else.

 Concerning the config, initially I thought that having a separate
 keyword (eg: hash-algo) to set the algorithm was better than mixing
 it with the hash-type keyword. But now I'm not completely sure about
 this because probably people who want to set the algo will also want
 to be sure about the type of hashing they're applying. I'd like to
 get other users' feedback on this, particularly those using the
 consistent hashing.

 Thanks,
 Willy





Re: Consistent hashing alternative to sdbm

2013-10-28 Thread Bhaskar Maddala
Hello,

  Can you please take a look at [1]? Make sure it is what you had in mind,
I read thru our conversation here again and I understood that the change we
wanted to implement allowed selection of the hash function in addition to
map-based/consistent and avalance.

   The change provide the ability to specify.  indicates optional

  hash-type consistent sdbm/djb2/wt6
  hash-type map-based sdbm/djb2/wt6
  hash-type avalanche sdbm/djb2/wt6

  Not all of it is implemented, i am in the middle of testing, but wanted
any early feed back you might have before i spent a lot of time on it.

Thanks
-Bhaskar


[1] https://github.com/maddalab/haproxy/pull/1


On Fri, Oct 25, 2013 at 2:19 AM, Willy Tarreau w...@1wt.eu wrote:

 Hi Bhaskar,

 First, you did an amazing work, thanks for sharing your findings!

 On Thu, Oct 24, 2013 at 11:12:14PM -0400, Bhaskar Maddala wrote:
  Hello,
 
  This is a little long, please bear with me. I was able to run more
  tests the results are at [1]. As I mentioned before I was not able to
  reproduce the distribution we had seen when we implemented our patch in
  Haproxy using hashcount.c yesterday, the only remaining difference was
  actually having the requests be served via an instance of haproxy.
 
 I broke down the tests into the following 3 categories
 (a) ~10K request
 (b) All unique host names obtained from 1MM requests that were grepped
  from haproxy production logs, there were 48259 blogs
 (c) 1MM requests
 
 Case (a) used the data set we had based our decision on, case (b) and
  (c) used a data set I obtained at the beginning of this conversation.
 
  I then ran these 3 scenarios via both haproxy and hashcount.c for
 DJB2
  and SDBM. I was able to reproduce* the distribution we had based our
  decision on when using haproxy. Interestingly the test harness
  (hashcount.c) produces wildly different distribution when the results are
  compared to an instance of Haproxy, more so for SDBM.

 I'm thinking about something. If you're running with consistent hashing,
 another element matters a lot : the average distance between all output
 values. Indeed, with consistent hashing, we're not dividing the hash by
 the number of servers, we're searching the point that's the closest to
 the hash and assigning the connection the server associated with this
 point. So if the hash produces clusters of points, there are more risks
 that some will get more traffic than others due to their proximity with
 server points. I think that the image below summarizes what I mean very
 well :

 http://offthelip.org/?p=149

 Right now, each server appears 16*weight times on the consistent hash
 circle and these places are pseudo random and uniform (I spent a lot
 of time ensuring this in the past, so I'm pretty sure about this but
 I'd be happy if we can improve this). You can check in lb_chash.c how
 this is done in chash_init_server_tree().

 I don't know how to reliably modelize that in hashcount.c. I can't
 find the test code I used when developing the consistent hash, so if
 we want to run new experimentations, I guess we'll have to copy-paste
 existing code into a new file.

 I'm thinking about something : since we want the points to all be the
 farthest from each other, we could measure the hash efficiency for
 consistent hashing this way :

   - print all hash points
   - sort them by value
   - replace each of them with the distance from the previous one
   - compute the stddev on these values

 If you're interested in running a test using this, that would be great.
 You can restart from the hashcount you already have since you have already
 implemented the stddev in it.

 As a sanity check, I verified that the results were reproducible under
  repeated executions.
 
 My test haproxy configuration is available at [2]. With the goal to
  eliminate as many variables as possible, I set up 17 back ends to all
 use a
  single nginx server that served static files from the file system, nginx
  config is also available at [2]
 
 A request flow would look like this
 curl with custom header - Haproxy on 80 - hash on custom header  to
  select backend - (all backends point to the same nginx instance) -
  request forwarded to backend nginx - nginx responds
 
Following/Similar [3] was executed from the machine housing haproxy 
  nginx.
 
Finally a note on the results [1]. Each sheet corresponds to each of
 the
  cases (a)/(b)/(c). On the first sheet you will see the std dev numbers
  under the heading executing via haproxy that we based our decision on.
  The raw results are also available at [2]
 
Let me know if you would like to see more numbers with a changed
  methodology etc. or if you have an explanation for the difference is
  distribution when using haproxy vs using hashcount.c after taking a look
 at
  the config.

 Yes, please try to run a test modeling consistent hashing as explained
 above. If something is not clear, do not 

Re: Consistent hashing alternative to sdbm

2013-10-20 Thread Willy Tarreau
Hello,

On Fri, Oct 18, 2013 at 06:54:40PM -0400, Bhaskar Maddala wrote:
 Hello,
 
   We have been using haproxy for load balancing requests across our
 backbends. We use consistent hashing, in experiments for the requests
 processed and specific to our architecture we found dbj2 providing a better
 distribution over sdbm when the remainder of the configuration was left
 unchanged.
 
   We have a patched version of haproxy based on 1.4.24 that adds a config
 option of hash-func with values sdbm and dbj2, sdbm as default, we have
 been using dbj2 in production for a while now. I am writing to determine
 interest if any for this change before submitting a patch.

I don't know these sdbm nor dbj2 functions. For the later I suspect you
meant djb2 instead which is the same as hash_djbx33 in tests_hashes.c.
But sdbm I still don't know.

In 1.5, an avalanche hash is always performed before applying the
consistent hash, which should provide a smoother distribution than
in 1.4. Could you please check if 1.5 still needs this ability to
change the hash function ? I'm not opposed to a new parameter, but
I'd prefer to avoid it if it's not necessary.

Thanks,
Willy