Curiosity killed the `stats cachedump`

dormando Sun, 31 Jul 2011 12:03:15 -0700

Yo,

We've threatened to kill the `stats cachedump` command for probably five
years. I've daydreamed about randomizing the command name on every minor
release, or every git push, so that nobody can build on it but it still
stays around as a last-ditch debugging tool.


A lot of you continue to build programs which rely on stats cachedump.
This both confuses and enrages us. Removing it outright sounds like a
failure, though: it amounts to your malevolent overlords deciding that this
thing you want and occasionally use should be taken away.

So instead I'd like to start a discussion which I'll seed with some
ideas; we want to shitcan this feature, but it should be a fair trade. If
we shitcan it, we first need to make you not want it anymore.

Here are some ideas I have for making you not want this feature anymore:

- Better documentation.

95% of the time when users want to use cachedump, they want to verify that
their application is working right. There are better ways to do this, but
they're clearly too hard to figure out.

- Better toolage.

That 95% of users overlaps with users who want a better view of what's
going on inside memcached. Our usual response is "restart in screen with
-vvv or point to a logfile or blah blah blah". This is unacceptable.
mk-query-digest helps, and I will hopefully be releasing a tool to do the
same for the binary protocol. This should allow you to watch or summarize
the flow of data, which is much more useful anyway.

- Streaming commands.

Instead of (or as well as) running tcpdump tools, we could add commands
(or simply use TAP? I'm not sure if it overlaps fully for this) which let
you either telnet in and start streaming some subset of information, or
run tools which act like varnishlog: tools that can show the command,
the return value, and also the hidden headers.

An off the cuff example:

    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    watch every=5000,request=full,response=headers

The above would stream back one out of every 5000 requests, with the full
request, and the headers of the response, but not the full binary data.
I'm not promising to implement this as-is, but I could see it helping to
solve the issue.
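
To make the idea a bit more concrete, here is a rough Python sketch of how
a server might interpret that line and pick one request out of every N.
Nothing here exists in memcached: the `watch` verb is only the off the cuff
example above, and `WatchOptions`, `parse_watch` and `Sampler` are made-up
names for illustration. It also assumes plain counter-based sampling, which
is only one way to read "one out of every 5000".

```python
from dataclasses import dataclass


@dataclass
class WatchOptions:
    """Options for the hypothetical 'watch' command (not a real command)."""
    every: int = 1             # stream one out of every N requests
    request: str = "full"      # how much of the request to show
    response: str = "headers"  # how much of the response to show


def parse_watch(line: str) -> WatchOptions:
    """Parse a 'watch key=value,key=value' line into WatchOptions."""
    verb, _, args = line.strip().partition(" ")
    if verb != "watch":
        raise ValueError("not a watch command")
    opts = WatchOptions()
    for pair in filter(None, args.split(",")):
        key, _, value = pair.partition("=")
        if key == "every":
            opts.every = int(value)
        elif key in ("request", "response"):
            setattr(opts, key, value)
        else:
            raise ValueError(f"unknown option: {key}")
    if opts.every < 1:
        raise ValueError("every must be >= 1")
    return opts


class Sampler:
    """Counter-based sampling: pass every Nth request, drop the rest."""

    def __init__(self, every: int):
        self.every = every
        self.seen = 0

    def should_emit(self) -> bool:
        self.seen += 1
        return self.seen % self.every == 0
```

A connection that sent the line above would get a `Sampler(5000)`, and the
request path would call `should_emit()` once per request before copying
anything to the stream.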

Astute readers will notice that this is my biased push on the TOPKEYS
feature; 1.6 already has a way to discover the most accessed keys, but I
feel strongly that its approach is too limited.

- Commands to poll the head or tail of the LRU.

Probably the most controversial. It is much more efficient to pretend that
the head or the tail are nebulous, nefarious, malicious things. As
instances grow into the tens of millions of items, polling at the head or
the tail doesn't give you a consistent view of very much. I imagine this
would be immediately abused by people implementing queues (or perhaps
that's a good thing?)

It also weighs heavily on my mind as we reserve the right to make the LRU
more loose or more strict as we evolve. It may not exist at all at some
point.

- Commands to stream the keys of evicted, reclaimed, or expired items.

People want cachedump so they can see what's still in there. This would be
an extension of (or a replacement for) the previous streaming commands.
You would register for events with a set of flags, and when items expire or are
evicted or whatever you decided to watch, it would copy a result to the
stream.
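
As a sketch of what "register with a set of flags" could look like, here is
a toy Python model. It is purely illustrative: `Event`, `EventStream`,
`register` and `publish` are hypothetical names, and a real implementation
would write to a client connection rather than append to a list.

```python
import enum


class Event(enum.Flag):
    """Hypothetical event types a client could ask to watch."""
    EVICTED = enum.auto()
    EXPIRED = enum.auto()
    RECLAIMED = enum.auto()


class EventStream:
    """Fan item events out to subscribers whose flag mask matches."""

    def __init__(self):
        self.subscribers = []  # list of (mask, sink) pairs

    def register(self, mask):
        """Subscribe with a set of flags; returns the list events land in."""
        sink = []
        self.subscribers.append((mask, sink))
        return sink

    def publish(self, event, key):
        """Copy (event name, key) to every subscriber watching this event."""
        for mask, sink in self.subscribers:
            if event & mask:
                sink.append((event.name, key))
```

A client interested only in evictions and expirations would call
`register(Event.EVICTED | Event.EXPIRED)` and never see reclaims.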

It is much, much more efficient to read out of the statistical counters to
get the information. But when people ask to see what's in there, often
they're really wondering about what's no longer in there.
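
For reference, reading the counters is cheap. Below is a minimal Python
sketch that sends the plain-text `stats` command and parses the
`STAT <name> <value>` lines up to `END`. The helper names, host, port and
timeout are assumptions for the example, and which counters show up (for
instance `evictions` or `reclaimed`) depends on the server version.

```python
import socket


def parse_stats(raw: bytes) -> dict:
    """Turn a 'STAT name value' response (ending in END) into a dict."""
    stats = {}
    for line in raw.decode("ascii").split("\r\n"):
        if line == "END":
            break
        parts = line.split(" ", 2)
        if len(parts) == 3 and parts[0] == "STAT":
            stats[parts[1]] = parts[2]
    return stats


def fetch_stats(host="127.0.0.1", port=11211, timeout=2.0) -> dict:
    """Send 'stats' over the text protocol and return the parsed counters."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(b"stats\r\n")
        buf = b""
        # Read until the END terminator or until the server closes the socket.
        while not buf.endswith(b"END\r\n"):
            chunk = conn.recv(4096)
            if not chunk:
                break
            buf += chunk
    return parse_stats(buf)
```

Polling `fetch_stats()` twice and subtracting the counters tells you how
many items went away in between, though not which ones, which is exactly
the gap the streaming idea is trying to fill.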

---

I'm not really sold on any of these, and they're not the only ideas we
should consider; if you have better ones, let's hear them. Please help
distribute this ML post around as much as possible so we can have a better
chance of having an intelligent discussion about it.

Thanks,
-Dormando
