On Tue, Mar 11, 2008 at 07:13:54 -0700, Brian Aker wrote:
> To me ketama is just another distribution type, one that can be
> selected by the end user or not.
>
> I think it is good to have a few "do this, it works across all
> libraries", but at the same time I do not want to go to a least common
> denominator solution.

I don't get your point. The key distribution implemented in the original Cache::Memcached seems to be the one common solution. But there also seems to be demand for _consistent_ hashing common to all clients. My point is that if we choose the Ketama algorithm for that, we should first improve the original implementation (the one using MD5), not blindly duplicate it. This won't limit you in providing further flexibility (though I think that every feature should be justified, and not merely thrown in for the sake of a "richer" choice).

The problems with the Ketama implementation in libketama as of 0.1.1 are (I know there's a newer release, so maybe some are solved already?):

- only addresses shorter than 23 bytes are accepted (including ':' and port number); longer addresses are silently ignored.
- parts of MD5 are used as hash values.
- sequence numbers of points are stringified (sprintf()ed) before hashing.
- qsort() is used, which means unstable order of collided (equal) points. Adding three servers, then removing one and restarting with the remaining two, may reorder equal points.

The suggestion is to use a more appropriate hash function (FNV-1a is the current candidate), support addresses of any length (of course), not stringify sequence numbers (hash the binary 4-byte little-endian word directly), and explicitly define the order of collided points (both on server insert and on server get; FIFO is suggested).

Earlier I mentioned that I use '\0' as the host-port delimiter, but using ':' would be better, as the "host:port" syntax seems to be common, and not every implementation splits the address early as I do.

So, are you for, against, or haven't made up your mind yet?

--
Tomash Brechko
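The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions, not libketama's code or any client's actual implementation: the names (fnv1a_32, point_hash, continuum_insert, continuum_get), the fixed MAX_POINTS capacity, and the choice to hash the "host:port" string first and the sequence number second are all made up for the example. It uses 32-bit FNV-1a, hashes the sequence number as a binary 4-byte little-endian word, and keeps equal points in insertion (FIFO) order instead of relying on qsort().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FNV1A_32_INIT 2166136261u  /* standard 32-bit FNV offset basis */
#define MAX_POINTS 1024            /* hypothetical fixed capacity */

struct point { uint32_t value; int server; };
struct continuum { struct point points[MAX_POINTS]; size_t count; };

/* 32-bit FNV-1a; pass the previous result as h to hash data in pieces. */
uint32_t fnv1a_32(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;            /* 32-bit FNV prime */
    }
    return h;
}

/* Hash "host:port" of any length, then the sequence number as a
   4-byte little-endian word (no sprintf). The order of the two
   parts is an assumption of this sketch. */
uint32_t point_hash(const char *addr, uint32_t seq)
{
    unsigned char le[4] = {
        (unsigned char)(seq & 0xff),
        (unsigned char)((seq >> 8) & 0xff),
        (unsigned char)((seq >> 16) & 0xff),
        (unsigned char)((seq >> 24) & 0xff)
    };
    uint32_t h = fnv1a_32(FNV1A_32_INIT, addr, strlen(addr));
    return fnv1a_32(h, le, sizeof le);
}

/* Insert after all points with value <= the new one, so equal
   (collided) points stay in insertion order. */
void continuum_insert(struct continuum *c, uint32_t value, int server)
{
    size_t lo = 0, hi = c->count;
    assert(c->count < MAX_POINTS);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c->points[mid].value <= value)
            lo = mid + 1;
        else
            hi = mid;
    }
    memmove(&c->points[lo + 1], &c->points[lo],
            (c->count - lo) * sizeof c->points[0]);
    c->points[lo].value = value;
    c->points[lo].server = server;
    c->count++;
}

/* Add npoints points for one server; npoints is the caller's choice. */
void continuum_add_server(struct continuum *c, const char *addr,
                          int server, uint32_t npoints)
{
    for (uint32_t seq = 0; seq < npoints; seq++)
        continuum_insert(c, point_hash(addr, seq), server);
}

/* Return the server of the first point with value >= hash(key),
   wrapping around to the first point; -1 if there are no points.
   Among equal points the earliest inserted one wins (FIFO). */
int continuum_get(const struct continuum *c, const char *key)
{
    uint32_t h = fnv1a_32(FNV1A_32_INIT, key, strlen(key));
    size_t lo = 0, hi = c->count;
    if (c->count == 0)
        return -1;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (c->points[mid].value < h)
            lo = mid + 1;
        else
            hi = mid;
    }
    return c->points[lo == c->count ? 0 : lo].server;
}
```

Because the position of each point depends only on its value and on the order in which servers were added, rebuilding the continuum with the same servers in the same order yields the same layout, which is the property an unstable qsort() does not guarantee for equal points.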
