Well, this all depends on the cost of creating the data and how volatile it
is.  If it doesn't cost much to create, there's little reason to cache it.
If it's expensive, then there's good reason.  If the expensive parts are
non-volatile, then there's a good reason to split them out.  If it's all
fairly static and all aspects are needed, then you can get away with a
single entry.  (A problem with highly split data sets is determining the
keys needed to do the multi-get, which can be resolved by also caching the
key sets.)
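
For example, here's a minimal sketch of the cached key-set idea in Java.
The CacheClient interface and the "account" key scheme are made up; they
stand in for whichever memcached client and key naming you actually use:

  import java.util.*;

  // Hypothetical minimal client interface; substitute your real client.
  interface CacheClient {
      Object get(String key);
      Map<String, Object> getMulti(Collection<String> keys);
      void set(String key, Object value);
  }

  class AccountCache {
      private final CacheClient cache;

      AccountCache(CacheClient cache) { this.cache = cache; }

      // The key set for an account is itself a cached entry, so one get
      // tells us which fine-grained keys to fetch in a single multi-get.
      @SuppressWarnings("unchecked")
      Map<String, Object> loadAccount(String accountId) {
          Set<String> keys =
              (Set<String>) cache.get("account:" + accountId + ":keys");
          if (keys == null) {
              return null;  // miss: fall through to the database and rewarm
          }
          return cache.getMulti(keys);
      }
  }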

For serialization, you could maintain the serialVersionUID to resolve
conflicts if you want to share data between versions.  If the cache is
large enough, you could just prefix the build # to your keys, as Brian
mentioned.  The problem with that approach arises if you have a central
server that rewarms the remote cache on major change events and then sends
a refresh message so your application servers reload their caches (thus,
only the notifier hits the database).  In that case, you'll have a lot of
misses unless you transition to a distributed rewarming approach (e.g.
distributed locks per cache+build#, so the first to lock rewarms while the
rest wait and then only refresh).  We're probably one of the few places
with that type of scenario, since application behavior changes dramatically
based on company policies and no one system has the entire code base but
rather runs fine-grained services.
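
To illustrate build-prefixed keys plus lock-based rewarming, here's a rough
Java sketch.  It assumes a client exposing memcached's add (store only if
the key is absent) as the lock primitive; the interface, TTLs, and key
names are all made up:

  import java.util.concurrent.Callable;
  import java.util.concurrent.TimeUnit;

  // Hypothetical client interface; add() maps to memcached's add command.
  interface VersionedCache {
      Object get(String key);
      void set(String key, int ttlSeconds, Object value);
      boolean add(String key, int ttlSeconds, Object value);
  }

  class BuildAwareRewarmer {
      private final VersionedCache cache;
      private final String build;  // e.g. the build # baked in at release

      BuildAwareRewarmer(VersionedCache cache, String build) {
          this.cache = cache;
          this.build = build;
      }

      // Prefixing every key with the build # keeps two live versions from
      // deserializing each other's entries.
      String key(String name) {
          return build + ":" + name;
      }

      // The first server to grab the lock rewarms from the database; the
      // rest poll the cache instead of all hitting the database at once.
      Object getOrRewarm(String name, Callable<Object> loader)
              throws Exception {
          Object value = cache.get(key(name));
          if (value != null) {
              return value;
          }
          if (cache.add("lock:" + key(name), 30, "1")) {  // 30s lock, arbitrary
              value = loader.call();              // only the lock holder hits the DB
              cache.set(key(name), 3600, value);  // 1h TTL, arbitrary
              return value;
          }
          while ((value = cache.get(key(name))) == null) {
              TimeUnit.MILLISECONDS.sleep(100);   // someone else is rewarming
          }
          return value;
      }
  }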

You should probably first look at how well your local caching is doing (if
you have any), and then build out the remote layer as needed.  Often, the
application code is just dumb, and when it's properly written with local
caches you get a sizable performance boost.  Unless you have a special
case, it's usually best not to rely on the remote cache for application
performance but rather to use it to resolve database performance issues
(e.g. high CPU utilization).
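
If you don't have a local layer yet, even a small bounded LRU in front of
the remote cache goes a long way.  A sketch (the RemoteCache interface and
the size are made up):

  import java.util.Collections;
  import java.util.LinkedHashMap;
  import java.util.Map;

  class LocalFirstCache {
      // One-method stand-in for whatever remote client you use.
      interface RemoteCache { Object get(String key); }

      private static final int MAX_ENTRIES = 10000;  // arbitrary; size to your heap

      // An access-ordered LinkedHashMap gives a basic LRU eviction policy.
      private final Map<String, Object> local = Collections.synchronizedMap(
          new LinkedHashMap<String, Object>(16, 0.75f, true) {
              @Override
              protected boolean removeEldestEntry(Map.Entry<String, Object> e) {
                  return size() > MAX_ENTRIES;
              }
          });

      private final RemoteCache remote;

      LocalFirstCache(RemoteCache remote) { this.remote = remote; }

      Object get(String key) {
          Object value = local.get(key);
          if (value == null) {
              value = remote.get(key);  // local miss: fall back to memcached
              if (value != null) {
                  local.put(key, value);
              }
          }
          return value;
      }
  }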

----- Original Message ----
From: marc2112 <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, December 13, 2007 8:43:33 AM
Subject: coarse-grained or fine-grained?


Hi All,

I'm working on designing a caching layer for my website and wanted to get
some opinions from memcached users.  There are two issues I'm hashing
through:
1) Level of granularity to cache data at

2) Version compatibility across software releases


The primary applications that would be using the cache are developed in
Java and utilize a smallish (~20 classes) domain object model.  In a few
use cases, as you could imagine, we only need a few attributes from 2 or 3
different domain objects to service a request.


How granular is the data that folks are typically putting into memcached?
Since there is support for batched gets, one option at the far end of the
spectrum would be to cache each attribute separately.  I could see there
being a lot of overhead on puts in this case, and it's probably not very
efficient overall.  The other end of the spectrum would be to cache one
object that references all of the other related data, often reading more
data than we need from the cache.


The last consideration I'm thinking through in all of this is how to
manage serializable class versioning.  Do people generally take an
optimistic approach here and, if there is a serialization exception on
read, just replace what's in the cache?  Or do you include a class version
indicator as part of the key?  If it's part of the key, how do you make
sure that there aren't two live versions with potentially different
attribute values in the cache?


Thanks for your thoughts,
---Marc