On Mon, Oct 19, 2009 at 6:00 AM, Henrik Schröder <skro...@gmail.com> wrote:

> How it works depends on which client you use. If you use the BeITMemcached
> client, when one instance goes down it will internally mark it as dead,
> start writing about it in the error log, and all requests that would end up
> at that server will be cache misses, which means that if you only have two
> servers, half of your data will not be cached. Your application should be
> able to handle this, and you should have something monitoring the servers
> and the error log so that you can see that one instance is down, and then
> you have the choice of either bringing the server back up again, or removing
> it from the configuration of your application and restarting it.
>
> If you have a client that supports automatic failover (BeITMemcached
> doesn't) then in the scenario above, all your data would be cached on the
> other instance instead, so you would still be fully cached while bringing
> the failing instance back up, or removing it from the configuration.
> However, you would still have to restart your application to reset the
> failover. This would be the best option; I'll try to add failover to the
> BeITMemcached client as soon as I have time for it. :-)
>
> The third case is a client that supports automatic failover, and automatic
> recovery from failover. It's similar to the scenario above, except it won't
> need an application restart when the failing memcached server comes back up.
> HOWEVER, this means that your cache will not be synchronized while your
> application gradually discovers that the memcached server is back up.
> Depending on your application, this can be catastrophic, or it can be
> inconsequential. I don't really want to add this to my client, and if I do,
> there will be lots of warnings about it.
>

An even better solution is to have the client automatically remove the
server on failure but to make server rediscovery a manual and/or
synchronous operation.  You don't want a server flapping between up and down
states because of something like a faulty NIC, and thus having your cache
become horribly inconsistent because none of the clients agree on its state.
So you only want a server to go down if it's really down, and you only want
it to come back up if it's really up and every client can agree (or all be
told at the same time) that it's back up.
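
To make that concrete, here's a minimal sketch (plain Java, ignoring the
actual memcached I/O and the monitoring that would call into it): failures
flip a node to DOWN automatically, but only an explicit operator action or
admin message ever flips it back to UP.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NodeStateRegistry {
    private enum State { UP, DOWN }

    private final Map<String, State> nodes = new ConcurrentHashMap<>();

    public NodeStateRegistry(Iterable<String> hostPorts) {
        for (String hp : hostPorts) {
            nodes.put(hp, State.UP);
        }
    }

    // Called automatically by the client whenever an operation against this
    // node fails -- going down requires no coordination.
    public void markDown(String hostPort) {
        nodes.put(hostPort, State.DOWN);
    }

    // Deliberately NOT called from the failure path: a node only comes back
    // when an operator (or an admin message) says so, so every client flips
    // its view at the same time instead of flapping independently.
    public void markUp(String hostPort) {
        nodes.put(hostPort, State.UP);
    }

    public boolean isUp(String hostPort) {
        return nodes.get(hostPort) == State.UP;
    }
}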

We use memcached in a few different ways in our system and, therefore, have
a few different ways of implementing this on our side.  In one instance, we
use memcached as a deterministic cache where we want to guarantee that all
data is always available-- we do this by using our own custom client that
does client-side replication to every server in the pool.  It's a fairly
small dataset (a few gigs) and fits in memory on every instance, so we can
easily do this.  We ensure consistency, though, by writing a marker entry
that indicates when the cache was last populated.  Our regular client code
never writes this entry, so a new server will only ever be _actually_ added to the
pool when a full-populate is run manually or the nightly crontab job
executes.  This way, we know that we can add new servers to the pool and not
have to worry about them missing or having inconsistent data-- they won't be
read from until they're shown to have the proper dataset.
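
The idea, very roughly, looks like the sketch below -- this uses the
spymemcached client just for illustration, and the marker key name and
populate hook are placeholders, not our actual code:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import net.spy.memcached.AddrUtil;
import net.spy.memcached.MemcachedClient;

public class ReplicatedCache {
    // Hypothetical marker key; only the full-populate job ever writes it.
    private static final String MARKER_KEY = "cache:last-populated";

    private final List<MemcachedClient> nodes = new ArrayList<>();

    public ReplicatedCache(List<String> hostPorts) throws IOException {
        // One connection per server so every write can go to every node.
        for (String hp : hostPorts) {
            nodes.add(new MemcachedClient(AddrUtil.getAddresses(hp)));
        }
    }

    // Normal application writes go to every node, but never touch the marker.
    public void set(String key, int ttlSeconds, Object value) {
        for (MemcachedClient node : nodes) {
            node.set(key, ttlSeconds, value);
        }
    }

    // Reads only trust a node that carries the marker, i.e. one that has been
    // through a full populate; a freshly added, empty node is skipped.
    public Object get(String key) {
        for (MemcachedClient node : nodes) {
            if (node.get(MARKER_KEY) != null) {
                return node.get(key);
            }
        }
        return null;  // no populated node reachable -> treat as a cache miss
    }

    // Only the manual or nightly full-populate job calls this.
    public void markPopulated(MemcachedClient node, String populatedAt) {
        node.set(MARKER_KEY, 0, populatedAt);
    }
}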

In the other instance, we use memcached in a more standard configuration.
 Here, we don't do any sort of client-side replication, though we do use
ketama hashing (and a few other things that I won't get into, like a hacked-up
NodeLocator and cache-miss fall-through to a middle-tier persistent
cache)...  When a machine dies, we automatically take it out of the config.
 To add a new machine, though, we have to push out the config we want and
fire off an admin message (just a special class of message on our standard
message queue) indicating that a new, valid memcached config is present and
should be loaded.  This way, we can somewhat guarantee that configs stay
consistent across all instances.
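
In sketch form, it's something like this (spymemcached's
KetamaConnectionFactory here; the message-queue plumbing and the config push
itself are left out, and the names are placeholders):

import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;
import net.spy.memcached.AddrUtil;
import net.spy.memcached.KetamaConnectionFactory;
import net.spy.memcached.MemcachedClient;

public class ReloadableCacheClient {
    private final AtomicReference<MemcachedClient> current = new AtomicReference<>();

    public ReloadableCacheClient(String initialServerList) throws IOException {
        current.set(buildClient(initialServerList));
    }

    private MemcachedClient buildClient(String serverList) throws IOException {
        // Ketama (consistent) hashing keeps most keys on the same node when
        // the server list changes.
        return new MemcachedClient(new KetamaConnectionFactory(),
                                   AddrUtil.getAddresses(serverList));
    }

    // Invoked by the admin-message handler once the new config has been pushed
    // everywhere; until then, dead nodes are only ever removed, never added.
    public void onConfigReloadMessage(String newServerList) throws IOException {
        MemcachedClient fresh = buildClient(newServerList);
        MemcachedClient old = current.getAndSet(fresh);
        old.shutdown();  // drop the old connections after the swap
    }

    public Object get(String key) {
        return current.get().get(key);
    }
}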

-- 
awl
