Re: ketama consistent hashing update

a. Thu, 09 Aug 2007 03:31:09 -0700

Hi, check out this page http://bretm.home.comcast.net/hash/8.html forCRC-32

It seems that FNV hash or the modified version at http://bretm.home.comcast.net/hash/6.html is much better for therequirements. (This is what I use in my client too.)




.



On Aug 9, 2007, at 12:20 PM, Richard Jones wrote:

Multiple hash functions are required or you lose the benefit ofmultiple
points per server in the first place, for the reasons brad and russ
mentioned.
I'm going to give crc32 a go (for lookups and initial hashing) andsee how thedistribution looks, it's probably cheaper, and makes sense as it ismorewidely available. There was no particular reason why I used md5over any
other hash.
160 was chosen as the number of points per server because it showeda decentdistribution.. it's "big enough". Once the hashing and collisionhandling arewell defined, this could potentially be tuned per-install, andstill maintain
compatability between clients. However, i think we can agree on a sane
default with a bit more testing.
I recently spotted a consistent hashing implementation in a branchof php/peclmemcache cvs - http://pecl.php.net/package/memcache which iirc usescrc32
instead of md5. Need to investigate this some more.

RJ


On Wednesday 08 August 2007 13:03:37 Alex Stapleton wrote:
Brad Fitzpatrick wrote:
   -- which digest algorithm(s) for the points and/or array lookup?
There are a seemingly infinite array of hash functions out therebut wealready have code for CRC32 on a vast number of platforms, isthere anyparticular reason – be it speed or quality – that we shouldn'tcontinue
to use that?
   -- how many points are put on the continuum for each target?
      +
   -- what is the input to the digest algorithm for points 1-n?
         e.g.:
         "127.0.0.1:11211-1"
         "127.0.0.1:11211-2"
         "127.0.0.1:11211-3"
                 ....
         "127.0.0.1:11211-<n>"
It seems like running multiple hashes over the string seems like abitof a waste of cycles. Modifying the hash value for each pointwould bemuch more efficient and simpler to implement. Something like pointvalue
= <hash value> + (<point number> * 2654435761)
As Steve says in his follow-up we also need to handle collisionswhilecreating the continuum. Simply incrementing the point value by 1untilwe find an empty place is going to give us a rather biasedcontinuum. Ithink that incrementing the point number 1 and recalculating thepoint
value from that should give a better distribution while being fairly
easy to actually implement.
--
Richard Jones
Last.fm Ltd. | http://www.last.fm/
Office: +44 (0) 207 780 7080

Re: ketama consistent hashing update

Reply via email to