On Jun 18, 2008, at 14:29, Joyesh Mishra wrote:
1  function get_foo(int userid) {
2      result = memcached_fetch("userrow:" + userid);
3      if (!result) {
4          result = db_select("SELECT * FROM users WHERE userid = ?", userid);
5          memcached_add("userrow:" + userid, result);
6      }
7      return result;
8  }
9  function update_foo(int userid, string dbUpdateString) {
10     result = db_execute(dbUpdateString);
11     if (result) {
12         data = createUserDataFromDBString(dbUpdateString);
13         memcached_set("userrow:" + userid, data);
14     }
15 }
*******
Imagine a table now getting queried on 2 columns, say userid and username.
Q1:
Suppose we have 100 processes each executing the get_foo function, and let's say memcached does not have the key. As there would be a delay between executing Line 2 and Line 5, there would be at least dozens of processes querying the DB and executing Line 5, creating more bottleneck on the memcached server - how does it scale then (imagine a million processes now getting triggered)?
I understand it is the initial load factor, but how do you take this into account while starting up the memcached servers?
The bottleneck isn't on the memcache server, it's on your DB. In
that case, sounds like you've got a really popular user. :)
You may have a bit of a thundering herd problem. If it's too
intense, you can create (or find) a locking mechanism to prevent the
thundering herd from thundering too hard.
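One way to sketch such a lock is to use add() itself as a mutex: whichever caller successfully add()s a sentinel lock key recomputes the value, while everyone else waits for the cache to be populated. The FakeCache class below is an in-memory stand-in so the sketch runs without a memcached server; its method names and the key names are illustrative, not a real client API.

```python
import threading
import time

# Minimal in-memory stand-in for a memcached client; add() fails if the
# key already exists, mirroring memcached's add semantics.
class FakeCache:
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def add(self, key, value):
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def set(self, key, value):
        with self._lock:
            self._data[key] = value

    def delete(self, key):
        with self._lock:
            self._data.pop(key, None)

cache = FakeCache()
db_queries = []  # records every caller that actually hits the "database"

def db_select(userid):
    db_queries.append(userid)
    time.sleep(0.05)  # simulate a slow query
    return {"userid": userid}

def get_foo(userid):
    key = "userrow:%d" % userid
    row = cache.get(key)
    if row is not None:
        return row
    # add() on a sentinel key acts as the lock: only the caller whose
    # add succeeds recomputes; the rest poll until the cache is filled.
    lock_key = key + ":lock"
    if cache.add(lock_key, 1):
        try:
            row = db_select(userid)
            cache.set(key, row)
        finally:
            cache.delete(lock_key)
        return row
    for _ in range(100):          # lost the race: wait for the winner
        time.sleep(0.01)
        row = cache.get(key)
        if row is not None:
            return row
    return db_select(userid)      # fallback if the winner died

threads = [threading.Thread(target=get_foo, args=(42,)) for _ in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(db_queries))
```

With 20 concurrent misses, only the lock winner queries the database; the other 19 get the cached row. With a real memcached client you would put an expiry on the lock key so a crashed winner cannot wedge everyone else.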
Q2:
Now imagine you have 100 processes again querying the key, out of which 50 execute get_foo() and 50 execute update_foo(). And let's say the key is not there on the memcached server. Imagine T1 doing a select operation followed by T2 doing an update. T1 is in Line 4 doing the select and *GOING* to add the key to the cache, while T2 goes ahead, updates the DB, and executes Line 13 (i.e. updates the cache). Now if T1 executes Line 5 it would have stale results (in such a case memcached_add fails, basically - but is that a sufficient guarantee that such a case would never arise?)
I don't know what API you're using, but memcached's add fails if a
value is already in the cache for the specified key.
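To replay the Q2 ordering concretely: because T2's set lands first, T1's late add is rejected and the stale row never overwrites the fresh one. The snippet below uses a plain dict and toy re-implementations of the add/set semantics, just for illustration, not a real client.

```python
# Dict-backed stand-in for memcached, to illustrate add vs. set semantics.
cache = {}

def memcached_add(key, value):
    # add: store only if the key is absent; returns False otherwise.
    if key in cache:
        return False
    cache[key] = value
    return True

def memcached_set(key, value):
    # set: unconditional store.
    cache[key] = value
    return True

# T2 updates the cache first (Line 13), then T1's stale add (Line 5) arrives:
memcached_set("userrow:42", {"name": "fresh"})
ok = memcached_add("userrow:42", {"name": "stale"})
print(ok, cache["userrow:42"]["name"])  # False fresh
```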
Q3:
Now we have 2 queries, say:
select * from users where userid = abc;
select * from users where username = xyz;
Users
|userid|username|userinfo|
and I want memcached to improve the query performance.
I had 2 approaches:
1. Cache1: Key=userid Value=User_Object
   Cache2: Key=username Value=userid
2. Cache1: Key=userid Value=User_Object
   Cache2: Key=username Value=User_Object
Do you see potential flaws in either of these approaches? I tried to trace the flaws in the first one using various db calls; still, I would ask if you guys have seen it before.
If you're really concerned about stale objects here, you can use
CAS. For most of these issues, `get || add' combinations give you a
reasonable level of atomicity. Most of the time, however, it really
doesn't matter.
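The CAS round-trip looks roughly like this: gets() returns the value together with a CAS token, and cas() stores only if nobody has written the key since that token was issued. CasCache below is an in-memory illustration of that contract (real clients expose gets/cas against the server; the class itself is made up for the sketch).

```python
import itertools

# Toy CAS-capable store sketching memcached's gets/cas contract.
class CasCache:
    def __init__(self):
        self._data = {}                  # key -> (value, cas_token)
        self._tokens = itertools.count(1)

    def gets(self, key):
        # Return (value, token); (None, None) on a miss.
        entry = self._data.get(key)
        return entry if entry else (None, None)

    def cas(self, key, value, token):
        # Store only if nobody wrote the key since our gets().
        _, current = self.gets(key)
        if current != token:
            return False
        self._data[key] = (value, next(self._tokens))
        return True

    def set(self, key, value):
        self._data[key] = (value, next(self._tokens))

cache = CasCache()
cache.set("userrow:1", {"info": "old"})

value, token = cache.gets("userrow:1")   # reader snapshots value + token
cache.set("userrow:1", {"info": "new"})  # concurrent writer sneaks in
ok = cache.cas("userrow:1", {"info": "reader-update"}, token)
print(ok)  # False: the token is stale, so the overwrite is rejected
```

The losing caller then re-reads (gets again) and retries, which is what gives you the read-modify-write atomicity without a lock.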
I would like to know in detail how memcached server handles queueing
of these requests and atomicity of requests. If there are any posts/
info on it, please let me know.
There's no real queue other than connection management threads
huddled around the storage mutex. At the point where memcached says
you've written, it's done.
--
Dustin Sallings