*--- FROM THAT ARTICLE ---*

*Alexey describes a different two-key approach:*
Create two keys in memcached: a MAIN key with an expiration time a bit higher than normal, plus a STALE key which expires earlier. On a get, read the STALE key too. If the STALE key has expired, re-calculate and set the STALE key again.

------------------------------------------------------

Can anyone explain a bit more about how this combats the dogpile effect? It seems to me that the STALE key will just cause the dogpile instead, since all of the visitors are reading the STALE key as well, and the STALE key is, in effect, determining the actual re-calculation timing. It seems to me the description above is missing something.

-Stephen

On Wed, Jun 18, 2008 at 10:02 PM, Ryan LeCompte <[EMAIL PROTECTED]> wrote:

> For some approaches on how to avoid the "dog pile effect" on your
> database, take a look at:
>
> http://highscalability.com/strategy-break-memcache-dog-pile
>
> Ryan
>
> On Wed, Jun 18, 2008 at 8:43 PM, Dustin Sallings <[EMAIL PROTECTED]> wrote:
> >
> > On Jun 18, 2008, at 14:29, Joyesh Mishra wrote:
> >
> > 1   function get_foo(int userid) {
> > 2       result = memcached_fetch("userrow:" + userid);
> > 3       if (!result) {
> > 4           result = db_select("SELECT * FROM users WHERE userid = ?", userid);
> > 5           memcached_add("userrow:" + userid, result);
> > 6       }
> > 7       return result;
> > 8   }
> >
> > 9   function update_foo(int userid, string dbUpdateString) {
> > 10      result = db_execute(dbUpdateString);
> > 11      if (result) {
> > 12          data = createUserDataFromDBString(dbUpdateString);
> > 13          memcached_set("userrow:" + userid, data);
> > 14      }
> > 15  }
> >
> > *******
> >
> > Imagine a table that is now queried on 2 columns, say userid and username.
> >
> > Q1:
> > If we have 100 processes each executing the get_foo function, and let's
> > say memcached does not have the key: since there is a delay between
> > executing Line 2 and Line 5, there would be at least dozens of processes
> > querying the db and executing Line 5, creating more of a bottleneck on
> > the memcached server. How does it scale then (imagine a million
> > processes now getting triggered)? I understand it is the initial load
> > factor, but how do you take this into account while starting up the
> > memcached servers?
> >
> > The bottleneck isn't on the memcache server, it's on your DB. In that
> > case, it sounds like you've got a really popular user. :)
> >
> > You may have a bit of a thundering herd problem. If it's too intense,
> > you can create (or find) a locking mechanism to prevent the thundering
> > herd from thundering too hard.
> >
> > Q2:
> > Now imagine you have 100 processes again querying the key, out of which
> > 50 execute get_foo() and 50 execute update_foo(), and let's say the key
> > is not there on the memcached server. Imagine T1 doing a select
> > followed by T2 doing an update. T1 is at Line 4 doing the select and
> > *GOING* to add the key to the cache, while T2 goes ahead, updates the
> > DB, and executes Line 13 (i.e. updates the cache). Now if T1 executes
> > Line 5, it would have stale results (in such a case memcached_add
> > basically fails, but is that a sufficient guarantee that such a case
> > would never arise?)
> >
> > I don't know what API you're using, but memcached's add fails if a
> > value is already in the cache for the specified key.
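To make that concrete, here is a minimal sketch in Python. It assumes a hypothetical client `mc` whose get() returns None on a miss and whose add() returns True only when the value was actually stored (which is what memcached's add command guarantees at the protocol level), and a `db_select` helper standing in for the thread's pseudocode. It shows how add() both caps the thundering herd from Q1 and discards the stale write from Q2:

```python
import time

def get_foo(mc, userid, db_select):
    key = "userrow:%d" % userid
    result = mc.get(key)
    if result is not None:
        return result

    # Q1: take a short-lived lock key with add(); only the one process
    # that wins the add recomputes, while the rest of the herd polls.
    if mc.add("lock:" + key, "1", 30):
        try:
            result = db_select("SELECT * FROM users WHERE userid = %s", userid)
            # Q2: add(), not set(): if update_foo() wrote a fresher row
            # while we were querying, this add fails and our potentially
            # stale result never reaches the cache.
            mc.add(key, result, 3600)
        finally:
            mc.delete("lock:" + key)
        return result

    # Someone else holds the lock; wait briefly for them to fill the cache.
    for _ in range(10):
        time.sleep(0.1)
        result = mc.get(key)
        if result is not None:
            return result
    # Give up and query the DB ourselves.
    return db_select("SELECT * FROM users WHERE userid = %s", userid)
```

Note the lock is only an optimization: if it is lost or expires early, the worst case is the original dogpile, never incorrect data.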
> >
> > Q3:
> > Now we have 2 queries, say:
> >
> > select * from users where userid = abc;
> > select * from users where username = xyz;
> >
> > Users
> > |userid|username|userinfo|
> >
> > and I want memcached to improve the query performance.
> >
> > I had 2 approaches:
> >
> > 1. Cache1: Key=userid   Value=User_Object
> >    Cache2: Key=username Value=userid
> >
> > 2. Cache1: Key=userid   Value=User_Object
> >    Cache2: Key=username Value=User_Object
> >
> > Do you see potential flaws in either of these approaches? I tried to
> > trace the flaws in the first one using various db calls, but would
> > still ask if you guys have seen it before.
> >
> > If you're really concerned about stale objects here, you can use CAS.
> > For most of these issues, `get || add' combinations give you a
> > reasonable level of atomicity. Most of the time, however, it really
> > doesn't matter.
> >
> > I would like to know in detail how the memcached server handles
> > queueing of these requests and the atomicity of requests. If there are
> > any posts/info on it, please let me know.
> >
> > There's no real queue other than connection management threads huddled
> > around the storage mutex. At the point where memcached says you've
> > written, it's done.
> >
> > --
> > Dustin Sallings
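Coming back to Stephen's question at the top: as I read Alexey's scheme, the detail the summary glosses over is that the client which finds the STALE key expired re-sets it *before* doing the expensive recalculation. Here is a minimal sketch of that reading (key names, TTL values, and the `recompute` callback are illustrative assumptions, not from the thread; `mc` is the same hypothetical client as above):

```python
# Sketch of the MAIN/STALE two-key scheme.
MAIN_TTL = 600        # a bit higher than the data's normal lifetime
STALE_TTL = 540       # expires earlier, acting as a freshness flag
RECOMPUTE_GRACE = 60  # soft lock: how long one client gets to recompute

def get_value(mc, key, recompute):
    main_key = "main:" + key
    stale_key = "stale:" + key

    if mc.get(stale_key) is None:
        # The freshness flag expired. Re-set it immediately so the rest
        # of the herd sees it as present and keeps serving the MAIN key;
        # only this client pays the cost of recomputing.
        mc.set(stale_key, "1", RECOMPUTE_GRACE)
        value = recompute()
        mc.set(main_key, value, MAIN_TTL)
        mc.set(stale_key, "1", STALE_TTL)
        return value

    value = mc.get(main_key)
    if value is not None:
        return value

    # MAIN is gone too (e.g. evicted): nothing fresh to serve, recompute.
    value = recompute()
    mc.set(main_key, value, MAIN_TTL)
    mc.set(stale_key, "1", STALE_TTL)
    return value
```

So the STALE key's expiry does determine the re-calculation timing, exactly as Stephen observes, but its expiry is not a dogpile: every client that sees it missing still gets a hit on the still-valid MAIN key, and re-setting the flag right away narrows the recompute to roughly one client (plus whoever races that first set) per expiry.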
