For some approaches on how to avoid the "dog pile effect" on your database, take a look at:
http://highscalability.com/strategy-break-memcache-dog-pile

Ryan

On Wed, Jun 18, 2008 at 8:43 PM, Dustin Sallings <[EMAIL PROTECTED]> wrote:

> On Jun 18, 2008, at 14:29, Joyesh Mishra wrote:
>
>> 1  function get_foo(int userid) {
>> 2      result = memcached_fetch("userrow:" + userid);
>> 3      if (!result) {
>> 4          result = db_select("SELECT * FROM users WHERE userid = ?", userid);
>> 5          memcached_add("userrow:" + userid, result);
>> 6      }
>> 7      return result;
>> 8  }
>>
>> 9  function update_foo(int userid, string dbUpdateString) {
>> 10     result = db_execute(dbUpdateString);
>> 11     if (result) {
>> 12         data = createUserDataFromDBString(dbUpdateString);
>> 13         memcached_set("userrow:" + userid, data);
>> 14     }
>> 15 }
>>
>> *******
>>
>> Imagine a table now getting queried on 2 columns, say userid and
>> username.
>>
>> Q1:
>> If we have 100 processes each executing the get_foo function, and let's
>> say memcached does not have the key: as there would be a delay between
>> executing Line 2 and Line 5, there would be at least dozens of processes
>> querying the DB and executing Line 5, creating more of a bottleneck on
>> the memcached server. How does it scale then (imagine a million
>> processes now getting triggered)? I understand it is the initial load
>> factor, but how do you take this into account while starting up the
>> memcached servers?
>
> The bottleneck isn't on the memcache server, it's on your DB. In that
> case, sounds like you've got a really popular user. :)
>
> You may have a bit of a thundering herd problem. If it's too intense,
> you can create (or find) a locking mechanism to prevent the thundering
> herd from thundering too hard.
>
>> Q2:
>> Now imagine you have 100 processes again querying the key, out of which
>> 50 execute get_foo() and 50 execute update_foo(), and let's say the key
>> is not there on the memcached server. Imagine T1 doing a select
>> operation followed by T2 doing an update. T1 is in Line 4 doing the
>> select and *GOING* to add the key to the cache, while T2 goes ahead,
>> updates the DB, and executes Line 13 (i.e. updates the cache). Now if
>> T1 executes Line 5 it would have stale results (in such a case
>> memcached_add fails, basically - but is that a sufficient guarantee
>> that such a case would never arise?)
>
> I don't know what API you're using, but memcached's add fails if a
> value is already in the cache for the specified key.
>
>> Q3:
>> Now we have 2 queries, say:
>>
>> select * from users where userid = abc;
>> select * from users where username = xyz;
>>
>> Users
>> |userid|username|userinfo|
>>
>> and I want memcached to improve the query performance. I had 2
>> approaches:
>>
>> 1. Cache1: Key=userid   Value=User_Object
>>    Cache2: Key=username Value=userid
>>
>> 2. Cache1: Key=userid   Value=User_Object
>>    Cache2: Key=username Value=User_Object
>>
>> Do you see potential flaws in either of these approaches? I tried to
>> trace the flaws in the first one using various DB calls, but would
>> still ask if you guys have seen it before.
>
> If you're really concerned about stale objects here, you can use CAS.
> For most of these issues, `get || add' combinations give you a
> reasonable level of atomicity. Most of the time, however, it really
> doesn't matter.
>
>> I would like to know in detail how the memcached server handles
>> queueing of these requests and atomicity of requests. If there are any
>> posts/info on it, please let me know.
>
> There's no real queue other than connection management threads huddled
> around the storage mutex. At the point where memcached says you've
> written, it's done.
>
> --
> Dustin Sallings
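
A few sketches to make the answers above concrete. All of them are in
Python and use the pymemcache client purely as one possible choice; any
client exposing memcached's get/set/add/gets/cas commands would do, and
every DB helper named below (fetch_user_from_db, fetch_user_by_name_from_db)
is a hypothetical placeholder, not something from the thread.

First, the "locking mechanism" Dustin suggests for Q1's thundering herd
(the same idea as the highscalability.com link above): use memcached's
atomic add as a short-lived lock so only one process rebuilds a missing
key. The 30-second lock TTL and the retry loop are arbitrary choices.

    import time
    from pymemcache.client.base import Client

    mc = Client(('localhost', 11211))

    def get_foo(userid):
        key = 'userrow:%d' % userid
        result = mc.get(key)
        if result is not None:
            return result

        # add is atomic: exactly one process wins the lock; the rest
        # fail and wait instead of stampeding the database.
        lock_key = key + ':lock'
        if mc.add(lock_key, '1', expire=30, noreply=False):
            try:
                result = fetch_user_from_db(userid)  # hypothetical DB helper
                mc.set(key, result)
            finally:
                mc.delete(lock_key)
            return result

        # Lost the race: poll briefly for the winner's value, then fall
        # back to the database rather than wait forever.
        for _ in range(10):
            time.sleep(0.1)
            result = mc.get(key)
            if result is not None:
                return result
        return fetch_user_from_db(userid)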
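Dustin's answer to Q2 - add fails if a value is already in the cache -
is exactly what makes the T1/T2 interleaving benign. A tiny
demonstration (pymemcache returns stored values as bytes):

    # T2 has already run update_foo and stored the fresh row (Line 13).
    mc.set('userrow:42', 'fresh-row-from-update', noreply=False)

    # T1, still holding the stale row it selected at Line 4, now
    # reaches Line 5. The add is refused because a value already exists.
    stored = mc.add('userrow:42', 'stale-row-from-select', noreply=False)
    print(stored)                # False: the stale write was rejected
    print(mc.get('userrow:42'))  # b'fresh-row-from-update'

Note the guarantee only covers this interleaving: if the fresh value
were evicted between T2's set and T1's add, the stale add would
succeed, which is where CAS comes in.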
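The CAS pattern Dustin points at for stale-object worries looks like
this: read with gets, write back with cas, and retry if another writer
got there first. apply_change (old cached value -> new value) is a
hypothetical function; with pymemcache, cas returns False on a token
mismatch and None when the key has disappeared.

    def update_cached_user(userid, apply_change):
        key = 'userrow:%d' % userid
        while True:
            value, token = mc.gets(key)
            if value is None:
                return False   # nothing cached; let a reader repopulate it
            ok = mc.cas(key, apply_change(value), token, noreply=False)
            if ok:
                return True    # stored; no concurrent writer interfered
            if ok is None:
                return False   # key evicted mid-flight; give up
            # ok is False: another writer won the race; re-read and retry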
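Finally, on Joyesh's two Q3 layouts: since the update_foo above only
rewrites the userrow key, approach 1 (username -> userid indirection)
leaves a single authoritative copy of the user object, while approach 2
keeps a second full copy under the username key that updates never
touch, so it can go stale. A sketch of approach 1, assuming rows travel
as JSON and fetch_user_by_name_from_db is another hypothetical helper
returning the row as a dict:

    import json

    def get_user_by_name(username):
        cached = mc.get('username:' + username)     # Cache2: username -> userid
        if cached is not None:
            return get_foo(int(cached.decode()))    # reuse the userid path
        row = fetch_user_by_name_from_db(username)  # hypothetical DB helper
        mc.set('username:' + username, str(row['userid']), noreply=False)
        mc.set('userrow:%d' % row['userid'], json.dumps(row), noreply=False)
        return row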
