On Jun 18, 2008, at 14:29, Joyesh Mishra wrote:
1 function get_foo(int userid) {
2     result = memcached_fetch("userrow:" + userid);
3     if (!result) {
4         result = db_select("SELECT * FROM users WHERE userid = ?", userid);
5         memcached_add("userrow:" + userid, result);
6     }
7     return result;
8 }

9  function update_foo(int userid, string dbUpdateString) {
10     result = db_execute(dbUpdateString);
11     if (result) {
12         data = createUserDataFromDBString(dbUpdateString);
13         memcached_set("userrow:" + userid, data);
14     }
15 }

*******

Imagine a table now getting queried on 2 columns, say userid and username.

Q1:
If we have 100 processes each executing the get_foo function, and let's say memcached does not have the key. As there would be a delay between executing Line 2 and Line 5, there would be at least dozens of processes querying the db and executing Line 5, creating more bottleneck on the memcached server - How does it scale then (Imagine a million processes now getting triggered)? I understand it is the initial load factor, but how do you take this into account while starting up the memcached servers?
The bottleneck isn't on the memcache server, it's on your DB. In that case, sounds like you've got a really popular user. :)

You may have a bit of a thundering herd problem. If it's too intense, you can create (or find) a locking mechanism to prevent the thundering herd from thundering too hard.
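One common shape for such a locking mechanism (a sketch, not something from this thread) is to use memcached's own add as a short-lived lock: only the process that wins the add queries the DB, and the rest wait briefly and re-check the cache. The FakeCache class below is a hypothetical in-memory stand-in for a memcached client, used only so the example runs without a server; with a real client you'd also put a short expiry on the lock key so a crashed lock holder can't wedge everyone.

```python
import time

class FakeCache:
    """In-memory stand-in for a memcached client. add() mirrors memcached
    semantics: it stores the key only if it is absent."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def add(self, key, value):
        if key in self.store:
            return False
        self.store[key] = value
        return True
    def set(self, key, value):
        self.store[key] = value
    def delete(self, key):
        self.store.pop(key, None)

cache = FakeCache()
db_queries = []  # records every process that actually hit the DB

def db_select(userid):
    db_queries.append(userid)
    return {"userid": userid, "username": "abc"}

def get_foo(userid, retries=50):
    key = "userrow:%d" % userid
    lock = key + ":lock"
    for _ in range(retries):
        result = cache.get(key)
        if result is not None:
            return result
        if cache.add(lock, 1):          # we won the lock: we query the DB
            try:
                result = db_select(userid)
                cache.set(key, result)
                return result
            finally:
                cache.delete(lock)
        time.sleep(0.01)                # lost the lock: wait, re-check cache
    return db_select(userid)            # give up on the lock holder
```

Run sequentially, a second call for the same userid never reaches db_select; under concurrency, only the add winner does, so the herd stops thundering at the DB.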
Q2:
Now imagine you have 100 processes again querying the key, of which 50 execute get_foo() and 50 execute update_foo(). And let's say the key is not there on the memcached server. Imagine T1 doing a select operation followed by T2 doing an update. T1 is at Line 4 doing the select and *GOING* to add the key to cache, while T2 goes ahead, updates the DB, and executes Line 13 (i.e. updates the cache). Now if T1 executes Line 5, it would have stale results (in such a case memcached_add fails basically - but is it a sufficient guarantee that such a case would never arise?)
I don't know what API you're using, but memcached's add fails if a value is already in the cache for the specified key.
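To make that concrete, here is a tiny sketch of the add semantics using a plain dict as a stand-in for memcached (the key and values are made up for illustration):

```python
store = {}

def memcached_add(key, value):
    """Mirrors memcached's add: store only if the key is absent."""
    if key in store:
        return False
    store[key] = value
    return True

def memcached_set(key, value):
    """Mirrors memcached's set: unconditional store."""
    store[key] = value

# T2 updates the DB and refreshes the cache first (Line 13)...
memcached_set("userrow:42", {"username": "fresh"})
# ...then T1, still holding the stale row from its earlier select,
# reaches Line 5 and tries to add it:
added = memcached_add("userrow:42", {"username": "stale"})
assert added is False                             # add refuses to overwrite
assert store["userrow:42"]["username"] == "fresh" # T2's value survives
```

This is exactly why get_foo uses add rather than set on Line 5: the stale writer loses the race silently instead of clobbering the fresh value.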
Q3:
Now we have 2 queries say:
select * from users where userid = abc;
select * from users where username = xyz;

Users
|userid|username|userinfo|

and I want memcached to improve the query performance

I had 2 approaches:
1. Cache1: Key=userid Value=User_Object
   Cache2: Key=username Value=userid

2. Cache1: Key=userid Value=User_Object
   Cache2: Key=username Value=User_Object

Do you see potential flaws in either of these approaches? I tried to trace the flaws in the first one using various db calls, but would still ask if you guys have seen it before.
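For what it's worth, approach 1 can be sketched like this (dicts standing in for the two caches; the row contents and function names are illustrative). Its appeal is that there is one canonical copy of the user object, keyed by userid, and Cache2 is only a secondary index into it, so an update only has to rewrite a single entry; approach 2 keeps two full copies that can drift apart between the two set calls.

```python
cache1 = {}  # userid   -> user object (the canonical copy)
cache2 = {}  # username -> userid      (secondary index)
db_hits = []  # records each trip to the database

def db_select_by_username(username):
    db_hits.append(username)
    return {"userid": 7, "username": username, "userinfo": "..."}

def get_by_username(username):
    userid = cache2.get(username)
    if userid is not None and userid in cache1:
        return cache1[userid]          # both hops hit: no DB query
    row = db_select_by_username(username)
    cache1[row["userid"]] = row        # store the one canonical object
    cache2[username] = row["userid"]   # index points at it, not a copy
    return row
```

The cost is the extra cache round trip on the username path, and the case where Cache2 hits but Cache1 has evicted the object, which the code above handles by falling back to the DB.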
If you're really concerned about stale objects here, you can use CAS. For most of these issues, `get || add' combinations give you a reasonable level of atomicity. Most of the time, however, it really doesn't matter.
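As a sketch of the CAS pattern Dustin mentions (again an in-memory stand-in; real memcached clients expose this as gets/cas, where cas succeeds only if the item is unchanged since the matching gets):

```python
import itertools

store = {}                   # key -> (cas_token, value)
_tokens = itertools.count(1)

def gets(key):
    """Return (value, cas_token), or (None, None) on a miss."""
    if key not in store:
        return None, None
    token, value = store[key]
    return value, token

def cas(key, value, token):
    """Store only if the item hasn't changed since gets() issued token."""
    if key in store and store[key][0] == token:
        store[key] = (next(_tokens), value)
        return True
    return False

def set_(key, value):
    store[key] = (next(_tokens), value)

set_("userrow:1", {"visits": 1})
value, token = gets("userrow:1")
value["visits"] += 1
assert cas("userrow:1", value, token) is True       # nobody raced us
stale_value, stale_token = value, token
set_("userrow:1", {"visits": 99})                   # a concurrent update lands
assert cas("userrow:1", stale_value, stale_token) is False  # token is stale
```

A failed cas is the signal to re-read with gets and retry, which is the optimistic-concurrency loop the `get || add' combinations approximate for the simpler miss-then-fill case.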
I would like to know in detail how memcached server handles queueing of these requests and atomicity of requests. If there are any posts/ info on it, please let me know.
There's no real queue other than connection management threads huddled around the storage mutex. At the point where memcached says you've written, it's done.

--
Dustin Sallings
