Kenner,

If any machine is too CPU-bound to return memcached responses at anything other than ethernet speed, you should probably step back and really evaluate your hardware plan.

You'll get a lot of mileage out of memcached, but in this case you'll be better off rethinking your approach to caching -- if you really need transactional integrity or absolutely assured consistency, you'll want to use an in-memory database like MySQL Cluster. If you rework your application's cache layer to "appreciate, not require" consistency, you'll be much happier. Lots of folks on this list successfully use memcached as a source of authority, but it's generally a) backed by a more expensive db query if necessary, or b) not actually critical data. Whatever you do, make the kind of consistency problem you describe cost at most an annoyance ("grr, have to hit the disk and recalculate x, y, z"), not "OMG, two users got the same UID".
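As an illustration of pattern (a), here's a minimal read-through sketch -- just the general shape, not anything in particular from this thread -- assuming a connected PECL Memcache object and a hypothetical load_from_db() helper:

    <?php
    // Pattern (a): memcached answers if it can; otherwise fall back to the
    // more expensive, authoritative database query and repopulate the cache.
    // load_from_db() is a placeholder for whatever the real query is.
    function get_cached(Memcache $mc, $key, $ttl = 300) {
        $value = $mc->get($key);
        if ($value === false) {
            $value = load_from_db($key);      // expensive but authoritative
            $mc->set($key, $value, 0, $ttl);  // best effort; a miss only costs a recalculation
        }
        return $value;
    }

A missed or flushed cache entry then costs you a database hit, never correctness.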

For very common data which must be _available_ we keep a separate pool of a couple of servers that all get the same data written to them -- we've written a MultiputMemcacheDriver class which handles that logic. If you write a timestamp as part of your payload data, you can resolve ambiguity in a pinch -- data with the later timestamp is 'more authoritative'. It's not terribly complex, but it makes for better sleep.
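For illustration, a bare-bones sketch of that multi-write / timestamp idea -- not the actual MultiputMemcacheDriver, just the concept -- assuming $pool is an array of connected Memcache objects, one per redundant server:

    <?php
    // Write the same payload (value plus timestamp) to every server in the pool.
    function multiput_set(array $pool, $key, $value, $expire = 0) {
        $payload = array('ts' => microtime(true), 'value' => $value);
        foreach ($pool as $mc) {
            $mc->set($key, $payload, 0, $expire);  // best effort on each server
        }
    }

    // Read from every server and keep the copy with the later timestamp --
    // that one is treated as 'more authoritative'.
    function multiput_get(array $pool, $key) {
        $best = false;
        foreach ($pool as $mc) {
            $payload = $mc->get($key);
            if ($payload !== false && ($best === false || $payload['ts'] > $best['ts'])) {
                $best = $payload;
            }
        }
        return $best === false ? false : $best['value'];
    }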

-Nathan / PBwiki


On May 18, 2007, at 9:59 AM, Kenner Stross wrote:

Hello,

I am using the PECL PHP extension for memcached access, and am confused/concerned about data integrity in the case of a failure. I have already found some discussions on this list regarding this issue, but I don't see how those solutions hold up in a multi-server environment.

What I've found so far is basically this: disable automatic failover, use a callback method to catch the failure, and in that callback routine set the server status to off and stop any further retrying (-1); lastly, implement an external service monitor that can detect the problem, flush the cache, and then mark the server as available again. That way, you can be sure all stale entries are flushed before the server rejoins the pool of active servers.
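(For reference, here's a minimal sketch of that setup, assuming the PECL Memcache extension's addServer / setServerParams API; the callback name and server address are placeholders.)

    <?php
    ini_set('memcache.allow_failover', '0');  // disable automatic failover

    $mc = new Memcache();

    // Called by the extension when a request to a server fails.
    function on_cache_failure($host, $port) {
        global $mc;
        // Mark the server offline and stop retrying (-1) so this client won't
        // silently read stale or rehashed data from it. An external monitor is
        // expected to flush the server and bring it back online later.
        $mc->setServerParams($host, $port, 1, -1, false, 'on_cache_failure');
    }

    $mc->addServer('m3.example.com', 11211, true, 1, 1, 15, true, 'on_cache_failure');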

Fine for one client accessing the cache server. But I don't see how that guarantees integrity in a multi-client environment. In particular, I don't see how it works when the failure is quite temporary, due to a heavy load that made the response too sluggish. Hopefully I'm just overlooking the obvious and one of you will straighten me out.

Let's imagine a simple 3 machine setup (m1 - m3), where each machine is acting as a web server and a memcached server.

m1 web --> attempts write to m3 cache, but it fails due to extreme load. Marks it as failed and offline (in the callback routine).
m2 web --> accesses m3 cache successfully (no load problem on m2, so no failure). Doesn't see that m1 took it offline.

m2 is using invalid cache data (it's missing m1's activity) but doesn't realize it. An external service monitor may or may not notice this brief, intermittent problem, but even if it does, that doesn't help m2 avoid the m3 cache once m1 has experienced an m3 cache failure.

I'm sure I must be missing something. Your help is greatly appreciated.

Thanks,
Kenner
