The code is a bit hard to get at, so here is the link FYI:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/tugelacache/
Correct me if I'm wrong, but my understanding is that Tugela's use of BDB is blocking, killing the non-blocking event-based nature of the memcached core. If that's the case, how do you justify blocking in the event loop?
As I remember the memcached code, the libevent non-blocking stuff only covers the client side of communications, and it "magically" doesn't block because everything is in memory. With Tugela/BDB, when you ask to fetch a key it synchronously (single-threaded) fetches the data from its store (fast if in BDB's cache, slower if it has to hit disk) into a freshly malloc'd buffer, which is then handed off to the same event-based flushing code as in memcached. In my view, the blocking nature of the item access is directly comparable to accessing a memory page on a non-mlock'd or over-committed memcached instance that has to be paged in from swap.
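For the record, the fetch path is roughly the following (a sketch of the idea, not Tugela's verbatim code; fetch_item and dbp are illustrative names):

    #include <db.h>
    #include <stdlib.h>
    #include <string.h>

    /* 'dbp' is an already-opened BDB handle. */
    static void *fetch_item(DB *dbp, const char *keystr, size_t *lenp)
    {
        DBT key, data;

        memset(&key, 0, sizeof(key));
        memset(&data, 0, sizeof(data));
        key.data = (void *)keystr;
        key.size = strlen(keystr);
        data.flags = DB_DBT_MALLOC;   /* BDB mallocs a fresh result buffer */

        /* This call is synchronous: fast if the page is in BDB's cache,
         * a disk read -- and a stall for the whole event loop -- if not. */
        if (dbp->get(dbp, NULL, &key, &data, 0) != 0)
            return NULL;

        *lenp = data.size;
        return data.data;             /* handed off to the event-based flush */
    }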
I don't like making it a runtime option. A compile-time option, I could maybe live with. But I don't want to have to install BDB headers and libraries just to build memcached when I have no interest in BDB support, which is presumably what runtime switchability would imply.
How about a non-mandatory dependency, i.e. compile-time constrained run-time selection?
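Something along these lines, say (hypothetical names; HAVE_LIBDB and an "engine" argument are not real memcached options):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    enum engine { ENGINE_MEMORY, ENGINE_BDB };

    /* The BDB choice only exists if the headers/libs were found at build
     * time; everyone else builds and runs exactly what they do today. */
    static enum engine parse_engine(const char *arg)
    {
        if (strcmp(arg, "memory") == 0)
            return ENGINE_MEMORY;
    #ifdef HAVE_LIBDB
        if (strcmp(arg, "bdb") == 0)
            return ENGINE_BDB;
    #endif
        fprintf(stderr, "unknown or unsupported engine: %s\n", arg);
        exit(EXIT_FAILURE);
    }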
Also, dynamically selecting a backend at runtime seems to me to imply at least a little performance hit, given that the backend storage isn't particularly abstracted out at this point. Maybe not a big enough hit to be statistically significant, in which case fine. But I don't want to make my memcached instances run slower for the sake of a feature I'm not using.
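To be concrete about where the hit would come from, runtime selection presumably means something like an engine vtable consulted on every item access (a purely hypothetical sketch, not memcached's current internals):

    #include <stddef.h>

    typedef struct storage_engine {
        void *(*get)(const char *key, size_t nkey, size_t *nvalue);
        int   (*set)(const char *key, size_t nkey,
                     const void *value, size_t nvalue);
    } storage_engine;

    static const storage_engine *engine;   /* chosen once at startup */

    /* Every get/set now pays an extra pointer indirection and loses
     * whatever inlining the compiler could do with direct calls. */
    static void *item_get(const char *key, size_t nkey, size_t *nvalue)
    {
        return engine->get(key, nkey, nvalue);
    }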
Sure, nobody wants to pay for something they don't use. That's something to test on the actual code before any hypothetical commit, though.
I suspect, however, that making a BDB backend work really well would require a more radical restructuring of memcached, e.g., to allow the communication with the persistent storage to take place in another thread such that it wouldn't block requests for data that are in a local write-through cache. Or would you just let the BDB libraries handle all that? I don't know if they have options for "return this data if you have it in memory already, otherwise return rather than blocking for disk I/O."
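The threaded shape I have in mind would be roughly this (a very rough sketch; all names are hypothetical and the libevent wake-up plumbing is elided):

    #include <pthread.h>
    #include <unistd.h>

    struct disk_req {
        int              client_fd;   /* connection waiting on this key */
        char            *key;
        struct disk_req *next;
    };

    static struct disk_req *queue_head;
    static pthread_mutex_t  queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t   queue_cond = PTHREAD_COND_INITIALIZER;
    static int              notify_pipe[2];  /* made with pipe() at startup,
                                                 read end watched by libevent */

    /* Called from the event loop on a cache miss: queue it and return
     * immediately, so RAM-served requests are never stuck behind disk. */
    static void enqueue_miss(struct disk_req *req)
    {
        pthread_mutex_lock(&queue_lock);
        req->next = queue_head;
        queue_head = req;
        pthread_cond_signal(&queue_cond);
        pthread_mutex_unlock(&queue_lock);
    }

    static void *disk_worker(void *arg)
    {
        (void)arg;
        for (;;) {
            pthread_mutex_lock(&queue_lock);
            while (queue_head == NULL)
                pthread_cond_wait(&queue_cond, &queue_lock);
            struct disk_req *req = queue_head;
            queue_head = req->next;
            pthread_mutex_unlock(&queue_lock);

            /* the blocking BDB get happens here, off the event thread */

            /* wake the event loop so it can flush the response */
            (void)write(notify_pipe[1], &req, sizeof(req));
        }
        return NULL;
    }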
Tugela is nothing clever; it takes the simple, obvious approach. The command-line cache size option just gets passed straight through to BDB's caching layer.
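i.e., roughly this (a sketch of the idea, not Tugela's verbatim code):

    #include <db.h>

    /* Hand the configured cache size straight to BDB's cache; this must
     * be called before DB_ENV->open(). */
    static int set_bdb_cache(DB_ENV *env, unsigned long long cache_bytes)
    {
        u_int32_t gbytes = cache_bytes / (1024ULL * 1024 * 1024);
        u_int32_t bytes  = cache_bytes % (1024ULL * 1024 * 1024);

        return env->set_cachesize(env, gbytes, bytes, 1 /* one region */);
    }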
Under no circumstances, it seems to me, do you want to force clients whose requests can be serviced from RAM to wait around for some other request to be serviced from disk.
Isn't this what would happen today with memcached if the machine is swapping? (Though I expect most users choose to mlock the memory to keep Linux's VM system from doing silly things.)
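(Locking amounts to mlockall, which is roughly what memcached's -k option does:)

    #include <sys/mman.h>
    #include <stdio.h>

    static void lock_memory(void)
    {
        /* pin current and future pages so the kernel never swaps the
         * cache out; needs root or a suitable memlock rlimit */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            perror("mlockall");
    }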
Seems utterly terrible for performance, and would never be merged. Unless BDB has some async API that you're using? I haven't looked.
Ouch. In my view: a persistent cache has its uses; it is convenient for me to use the one API for both; obviously the performance characteristics differ; and making it an option should have little impact on the common memory-backed case. --Iain
