Greetings all, I'll begin by introductions: I am a computer engineering undergrad, and our data structures class final project required us to modify some open source project's data structures. I thought it seemed a bit imposing to walk into a project and declare I knew better than their design decisions, so I instead looked for some new feature to add, and I decided to add two-dimensional keys to memcached. I don't see a huge demand for this but a) it might be useful for someone and b) it was sort of fun.
Here is a patch: http://c1f.net/misc/memcached-r651-2dhash.patch . As the filename implies, it applies against r651, which is what I pulled down when I began work. I'll say this right out: I was in a rush to get this out the door before the semester ends, so there are a few rough edges. I don't expect this to be accepted into mainline without some additional work (and even then maybe only as an option for the end-user). However, it's a starting point for whomever may wish to run with it. Here's the user-visible portion: all keys containing a period are split at the first period into the first and second dimensions. Inserts, deletes, updates, etc work as before, except that they also update a secondary index (more on that below). A 'get' command on dim1.* or *.dim2 will return (respectively) a list of all second-dimensions of keys that have that first dimension, or a list of all first-dimensions of keys that contain that second dimension. The 'get' datablock returned consists of zero or more of the following: <key-length (ASCII)> <space> <key> <CR> <LF>. One flaw remains: I could not (due to lack of time) integrate into the LRU-expiration (lazy deletion?) mechanism. An explicit delete will update the index appropriately but (for example) a flush-all will not. Here's how my data structure works: I keep two index hashtables (separate from the main hashtable with the items themselves), keyed by the first and second dimensions respectively. These hashtables share nodes, so a node has two linked-list ptrs, two keys, and is always in both tables. The hash chains are doubly-linked so that a node can be removed from both hash chains when it's found by either one. Searches by either dimension simply iterate over the appropriate hash-chain, returning all keys of the other dimension for nodes that match the search key. Interestingly, deletes of 2d-keyed items must iterate along the hashchain for the primary key to find the item with the given secondary key, so a delete is O(N2) average-case (where N2 is the average size of the second dimension). I justify this by assuming that the secondary dimension will be small compared to the primary. I couldn't think of any easy mechanism to solve this, short of keeping a hashtable of secondary keys per primary key, which is very heavy on memory usage. (If anyone has any input on this, I'd be curious, even post-final report for class :-) I hope someone finds this useful! If not, thanks at least for obliging me (and thanks for a project that's pretty much the purest application of data structures I could imagine). Chris Fallin
