There's a tension here: if you look at profiling data, command parsing actually turns out to be the single most expensive part of the code. That is still true after several rounds of optimization of the parser. So we certainly want a nice verbose extensible human-readable protocol, but we also need to move in the opposite direction: a protocol that can be parsed just about for free. That is almost certainly going to be a binary protocol (or at the very least a less freeform text protocol than what we have now).
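
To make the contrast concrete, here is a rough sketch in Python (not memcached's actual C parser). The text path tokenizes a storage command line of the form quoted later in this thread. The binary path reads a fixed-size header whose layout (`HEADER`, `OPCODES`, the field order and widths) is entirely made up for illustration and is not a proposed or existing wire format. The point is only that the binary path needs no scanning for delimiters and no string-to-integer conversion.

```python
import struct

def parse_text_command(line: bytes) -> dict:
    """Parse '<command name> <key> <flags> <exptime> <bytes>\\r\\n'."""
    if not line.endswith(b"\r\n"):
        raise ValueError("incomplete command line")
    # Every byte has to be scanned for separators, then the numeric
    # fields have to be converted from ASCII digits.
    parts = line[:-2].split(b" ")
    if len(parts) != 5:
        raise ValueError("expected 5 tokens")
    command, key, flags, exptime, nbytes = parts
    return {
        "command": command.decode("ascii"),
        "key": key,
        "flags": int(flags),
        "exptime": int(exptime),
        "bytes": int(nbytes),
    }

# HYPOTHETICAL header layout, for illustration only: opcode (u8),
# key length (u16), flags (u32), exptime (u32), data length (u32),
# all in network byte order, followed by the key itself.
HEADER = struct.Struct("!BHIII")
OPCODES = {1: "set"}  # hypothetical opcode numbering

def parse_binary_command(buf: bytes) -> dict:
    """Parse the hypothetical fixed header plus the key that follows it."""
    # One fixed-offset unpack replaces tokenizing and digit conversion.
    opcode, key_len, flags, exptime, nbytes = HEADER.unpack_from(buf, 0)
    key = buf[HEADER.size:HEADER.size + key_len]
    if len(key) != key_len:
        raise ValueError("truncated key")
    return {
        "command": OPCODES[opcode],
        "key": key,
        "flags": flags,
        "exptime": exptime,
        "bytes": nbytes,
    }

text_request = b"set foobar 400 0 1000\r\n"
binary_request = HEADER.pack(1, len(b"foobar"), 400, 0, 1000) + b"foobar"

# Both encodings carry the same information.
assert parse_text_command(text_request) == parse_binary_command(binary_request)
```

Python's interpreter overhead hides most of the real difference, so this shows the shape of the work rather than measuring anything. In C the binary path comes down to a few loads and byte swaps.
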

Obviously both can be supported. But it's something to be aware of: sending the server a bunch of commands that take even more time than the current protocol's to parse will probably have a much bigger than expected negative impact on memcached's CPU efficiency.

That said, this is probably only an actual issue for a fraction of a percent of the sites that use memcached. On most small-scale sites with a moderate amount of traffic, it's rare to see memcached even show up in "top", and if you quadrupled the cost of command parsing, that would still likely be the case. So as long as there is a high-performance protocol, and there are clients available that speak it, I suspect none of us who run high-volume sites would object to there also being a more verbose human-readable one.

As an aside, if you have a robust HTTP server running inside memcached, the temptation will be to use it for benchmarking, e.g., by running "ab" against a memcached instance. Of course memcached isn't going to be *slow* in such a benchmark, but you will be tying memcached's hands behind its back in some sense: you will be using the lowest performance interface available (and the interface is the single most expensive part of the code, so that's significant) and you will, most likely, be giving up stuff like multi-key "get" requests, which actually turn out to be a huge efficiency win. I bet it won't take long for the first "I thought memcached was supposed to be fast, but look at these mediocre numbers from ab!" blog post.

One other thought: since, if you buy what I said above, HTTP is not going to be the choice of the performance-sensitive, maybe it makes sense to consider implementing the HTTP support as a completely separate frontend that acts as a proxy to a memcached instance that speaks the high-speed protocol. I am not sure that's actually a good idea but I thought I'd toss it out there.

-Steve


On 7/7/07 9:11 PM, "Paul Querna" <[EMAIL PROTECTED]> wrote:

> Paul Querna wrote:
>> Dustin Sallings wrote:
>>> There'd be indexing overhead, but you could have an O(1)
>>> invalidation if the tags themselves were versioned.
>>>
>>> Assuming the cache time is short or you're accessing these records,
>>> cleanup should pretty much take care of itself.
>>>
>>> Protocol-wise, would it make sense to have the tags be additional
>>> tokens on the mutation line? i.e.:
>>>
>>> <command name> <key> <flags> <exptime> <bytes> [<tag> [...]]\r\n
>>
>>
>> Well, it does bring up a wider issue of protocol versioning....
>>
>> I was thinking about a more generic structure, if we ever did a protocol
>> revamp, something like:
>>
>> <command>\n
>> <meta>=<string||int>\n
>> data=<data>\m
>> END
>>
>> So, for example a SET today would be like:
>> SET
>> key=foobar
>> flags=400
>> bytes=1000
>> data=...data...
>
> And, thinking about that syntax a little more, we are just reinventing
> HTTP, so, why not make memcached's protocol v2 just be HTTP :-) ?
>
> -Paul
