On Jan 12, 2008, at 6:24 AM, Tomash Brechko wrote:
> I mean the server computes some hash value to decide where the key
> belongs in server's hash table.  I.e. it is how the server finds the
> key in its memory.

If the client passes in the same hash that it uses for server selection, won't that lead to nonuniform distribution of keys to hash buckets? Ideally you want a completely different hash function for server selection than for item bucketing within the server. A simple example:

Say you have two memcached hosts. The client computes a hash value and does a simple "modulo by number of hosts" to pick one of them. Then it sends its request along with the hash value.

The server does a simple "modulo by number of buckets" to figure out where to put the value. If the number of buckets is a power of two, half of them are guaranteed to go unused in this scheme: only keys whose hash values have the low bit set are ever sent to server #2, and only keys whose low bit is clear are sent to server #1. So you end up either wasting memory on empty buckets or, more likely, not even noticing that the linked lists in your buckets are twice as long as you'd like, because the "do I auto-expand the hashtable?" decision is based on the ratio of items to buckets.
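
To make the aliasing concrete, here's a minimal sketch (made-up numbers, not memcached code) that pushes synthetic hash values through the same modulo twice. With a power-of-two bucket count, each server only ever touches half of its buckets:

/*
 * Hedged illustration of the aliasing described above: keys are routed to
 * one of 2 hypothetical servers by "hash % num_servers", then bucketed on
 * that server by "hash % num_buckets" using the SAME hash value.  Because
 * every key landing on a given server shares the same low bit, only half
 * of that server's (power-of-two) buckets are ever used.
 */
#include <stdio.h>
#include <stdint.h>

#define NUM_SERVERS 2
#define NUM_BUCKETS 16          /* power of two, as in the example */

int main(void) {
    int used[NUM_SERVERS][NUM_BUCKETS] = {{0}};

    /* Stand-in for real key hashes: just walk through many hash values. */
    for (uint32_t hash = 0; hash < 100000; hash++) {
        uint32_t server = hash % NUM_SERVERS;   /* client-side selection */
        uint32_t bucket = hash % NUM_BUCKETS;   /* server-side bucketing  */
        used[server][bucket] = 1;
    }

    for (int s = 0; s < NUM_SERVERS; s++) {
        int count = 0;
        for (int b = 0; b < NUM_BUCKETS; b++)
            count += used[s][b];
        printf("server #%d uses %d of %d buckets\n", s + 1, count, NUM_BUCKETS);
    }
    return 0;
}

Running it prints "uses 8 of 16 buckets" for both servers, which is exactly the wasted-half effect described above.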

This is less of an issue if the number of servers is a prime number, or if you change memcached to always use a prime number of buckets, but I think using the same hash function for server selection and for the server's internal data structures is likely to lead to weird, hard-to-diagnose inefficiencies.

The bigger question: Are you actually seeing a server that's bottlenecked on the cost of computing the key hash? In our tests, hashing was so cheap that I didn't even bother moving the hash computation outside the mutex-protected part of the code. It is lightning fast because the entire key tends to fit in the CPU's L1 cache, so you pay basically nothing to scan it and do the simple hash computation memcached uses internally.
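
(I'm not reproducing memcached's actual internal hash here, but as a rough sketch of how little work is involved, a byte-at-a-time loop of the FNV-1a flavor looks like this; for a key of a few dozen bytes that sits entirely in L1, it's a handful of nanoseconds.)

/*
 * Hedged illustration only -- NOT memcached's hash function.  The point is
 * that hashing a short key is a tight loop over bytes already in cache.
 */
#include <stdint.h>
#include <stddef.h>

static uint32_t fnv1a_32(const char *key, size_t len) {
    uint32_t hash = 2166136261u;           /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        hash ^= (uint8_t)key[i];
        hash *= 16777619u;                 /* FNV prime */
    }
    return hash;
}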

If you're seeing lots of lock contention in threaded mode and hashing looks like a culprit, I'd try moving the hash computation up into the non-lock-protected code before I'd go messing with the protocol and breaking compatibility with all existing clients just to put that minuscule piece of work on the client's plate.
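
Something along these lines is what I mean; the names here (lookup_hash, assoc_find, cache_lock, item_get) are placeholders rather than the real memcached symbols. The point is just that the hash is computed before the mutex is taken, so only the shared-table access is serialized:

#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static pthread_mutex_t cache_lock = PTHREAD_MUTEX_INITIALIZER;

/* Placeholder for the server's internal key hash (e.g. a loop like the
 * FNV-1a sketch above). */
static uint32_t lookup_hash(const char *key, size_t nkey) {
    uint32_t h = 0;
    for (size_t i = 0; i < nkey; i++)
        h = h * 31 + (uint8_t)key[i];
    return h;
}

/* Placeholder for the hashtable lookup that needs the lock held. */
static void *assoc_find(const char *key, size_t nkey, uint32_t hv) {
    (void)key; (void)nkey; (void)hv;
    return NULL;
}

void *item_get(const char *key, size_t nkey) {
    uint32_t hv = lookup_hash(key, nkey);  /* no lock needed for this part */

    pthread_mutex_lock(&cache_lock);       /* lock only the shared lookup */
    void *it = assoc_find(key, nkey, hv);
    pthread_mutex_unlock(&cache_lock);

    return it;
}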

-Steve
