When we were replacing a shared memory cache with memcached, I had to come up with a way to keep the database from getting hammered when a frequently requested value expires from the cache. This is my story:
Our system caches the frequently requested key for an unreasonably long time (24 hours). It also caches a "semaphore" key for the interval after which I want the main key's data refreshed (5 minutes). When a process notices that the main key or the semaphore key no longer exists in the cache, it checks whether a third key exists -- this is the "updatelock" key. If the updatelock key does not exist, the process sets it, storing its hostname and process id in the value. This key is stored with a small timeout in case the process hangs or dies after staking a claim to the updatelock key.

After attempting to set the updatelock key, the process reads it back to make sure the value matches its own hostname and process id. If they do not match, someone else is responsible for performing the update. Once this check passes, the process performs the database lookup, sets the main key, sets a new semaphore key, and deletes the updatelock key. This scheme makes sure that only one process is trying to hit the database for this particularly heavily used key.

If the main key doesn't exist at all but the updatelock is set by another process, I just have the process sleep for a second and try again. That scenario should only happen on a dump of the memcached pool, or if that frequently requested key goes dormant for more than 24 hours. A rough sketch of the flow is below.

We don't use this for our lower-traffic keys, and only use this scheme if the key will be accessed across an entire site on most requests.

my $0.02,
--bradoaks
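Here's a rough sketch of that flow in Python, assuming a python-memcached style client. The key names, timeouts, and fetch_from_db() are placeholders, and serving the stale main key while another process holds the updatelock is my reading of the scheme rather than something spelled out above:

import os
import socket
import time

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

MAIN_KEY = 'hot:data'        # cached for an unreasonably long time
SEM_KEY  = 'hot:data:sem'    # expires when a refresh is due
LOCK_KEY = 'hot:data:lock'   # claims the right to hit the database
MAIN_TTL = 24 * 60 * 60      # 24 hours
SEM_TTL  = 5 * 60            # 5 minutes
LOCK_TTL = 30                # small, in case the claimant hangs or dies

def get_hot_value():
    me = '%s:%d' % (socket.gethostname(), os.getpid())
    while True:
        value = mc.get(MAIN_KEY)
        if value is not None and mc.get(SEM_KEY) is not None:
            return value                       # common case: nothing to do

        # Stake a claim if nobody has, then read it back to see if we won.
        if mc.get(LOCK_KEY) is None:
            mc.set(LOCK_KEY, me, time=LOCK_TTL)
        if mc.get(LOCK_KEY) != me:
            if value is not None:
                return value                   # someone else is refreshing; serve what we have
            time.sleep(1)                      # cold cache and not our job: wait and retry
            continue

        # We hold the lock: do the expensive lookup and repopulate.
        value = fetch_from_db()                # placeholder for the real query
        mc.set(MAIN_KEY, value, time=MAIN_TTL)
        mc.set(SEM_KEY, 1, time=SEM_TTL)
        mc.delete(LOCK_KEY)
        return value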
On 7/25/07, Chris Hondl <[EMAIL PROTECTED]> wrote:

We've hit this scenario several times over the past year on different queries - typically slow queries (5-10 seconds) on frequently accessed web pages. We usually cache these for 6 hours or longer.

We handle this with a probabilistic timeout implemented in our application. We store a timestamp and expiration time serialized as part of the value set to memcache. During the last minute or so before the key expires, we randomly run the query and set the value to memcache with a new expiry. The probability ramps up from 0% to 100% over that last minute or so.

In two places we use a different algorithm: we set no expiry on the value, but on every get we have a 1 in N chance of refreshing it. We tune N so that the value gets refreshed approximately once every 5 minutes. The annoying part about this is that as our traffic grows we have to change N to keep the expected interval constant. A sketch of both variants is below.

Chris

--
http://avatars.imvu.com/chris
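A sketch of both refresh variants in Python, again assuming a python-memcached style client; the key name, TTLs, N, and run_slow_query() are illustrative only:

import random
import time

import memcache

mc = memcache.Client(['127.0.0.1:11211'])

KEY  = 'slow:report'
TTL  = 6 * 60 * 60    # cached for 6 hours
RAMP = 60             # refresh probability ramps over the last minute
N    = 1000           # tuned so refreshes land roughly every 5 minutes

def cache_set(value):
    # Serialize the expiration time alongside the value so readers can
    # tell how close to expiry they are.
    mc.set(KEY, (value, time.time() + TTL), time=TTL)
    return value

def cache_get():
    cached = mc.get(KEY)
    if cached is None:
        return cache_set(run_slow_query())     # placeholder for the 5-10 second query
    value, expires_at = cached
    remaining = expires_at - time.time()
    # Probability of refreshing ramps from 0% up to 100% over the last
    # RAMP seconds before expiry.
    if remaining < RAMP and random.random() > remaining / RAMP:
        return cache_set(run_slow_query())
    return value

def cache_get_no_expiry():
    # Second variant: no expiry on the value, 1 in N chance of refreshing
    # on every get; N has to be retuned as traffic grows.
    value = mc.get(KEY)
    if value is None or random.randint(1, N) == 1:
        value = run_slow_query()
        mc.set(KEY, value)
    return value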
