Brad Fitzpatrick wrote:
On Wed, 8 Aug 2007, Alex Stapleton wrote:
Brad Fitzpatrick wrote:
-- what is the input to the digest algorithm for points 1-n?
e.g.:
"127.0.0.1:11211-1"
"127.0.0.1:11211-2"
"127.0.0.1:11211-3"
....
"127.0.0.1:11211-<n>"
It seems like running multiple hashes over the string seems like a bit
of a waste of cycles. Modifying the hash value for each point would be
much more efficient and simpler to implement. Something like point value
= <hash value> + (<point number> * 2654435761)
Alex,
I don't think that works well. Consider:
item A with weight 1.
item B with weight 1.
When hashing item A onto the 32-bit continuum you end up with, say, 10.
When hashing item B onto the 32-bit continuum you end up with, say, 11.
Now, because of this one unlucky distribution where A only has 1 unit of
space between it and B, you're not saved by the multiple points.
Now item A's second point is at 2654435761+10, and item B is at
2654435761+11, etc...
Yes thats a good point. I have a method that doesn't have that issue
however I can't really think of an implementation that doesn't rely on
integer overflow semantics which could be a problem if trying to do
implementations in some languages, e.g. Perl, Python, PHP.
If we want to make life easy for people implementing clients in
languages which don't have a simple way to use C style integers then
perhaps doing what you initially suggested is the best approach after
all my whining? :)