What to cache? (was Re: Caching needs to be implemented in Rails application)

Shanti Subramanyam Fri, 23 Jan 2009 08:57:20 -0800

Thanks Will. At this point, the PHP app is only caching the home page. We are wondering whether to even do the Event Detail page, as the load on the database has been drastically cut down just from the home page caching. It's a dilemma: if we cache too much, there is no load on the db; if we cache too little, there is not much in memcached. Of course, if we run at a much larger scale (say, tens of systems in the web tier), I'm sure we'll see increasing load on both tiers. But practically speaking, we need to be able to run a reasonable configuration.

Akara has another idea to use memcached more heavily while at the same time not reducing the db load: cache the thumbnails in it. This will also reduce the load on the filestore (which is currently quite heavily stressed for the PHP app). But this strategy won't work for the Rails app, will it? I believe you're serving all static files out of the proxy server?
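A minimal read-through sketch of the thumbnail idea, in Python for illustration only (the apps themselves are PHP and Rails). It assumes a memcached-like client with `get` and `set(key, value, ttl)`; `DictCache`, `get_thumbnail`, `read_from_filestore` and the one-hour TTL are hypothetical. The key is a SHA-1 hash of the path because memcached keys are limited to 250 bytes and may not contain whitespace, and thumbnails larger than memcached's default 1 MB item limit are served straight from the filestore.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple

MAX_ITEM_BYTES = 1024 * 1024  # memcached's default max item size (approximate: overhead counts too)
THUMB_TTL = 3600              # hypothetical: thumbnails rarely change


class DictCache:
    """Hypothetical in-process stand-in for a memcached client."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, float]] = {}

    def get(self, key: str) -> Optional[bytes]:
        item = self._data.get(key)
        if item is None or time.monotonic() >= item[1]:
            return None
        return item[0]

    def set(self, key: str, value: bytes, ttl: int) -> None:
        self._data[key] = (value, time.monotonic() + ttl)


def thumb_key(path: str) -> str:
    # Hash the path so the key stays short and free of spaces.
    return "thumb:" + hashlib.sha1(path.encode("utf-8")).hexdigest()


def get_thumbnail(cache: DictCache, path: str,
                  read_from_filestore: Callable[[str], bytes]) -> bytes:
    """Serve a thumbnail from the cache, falling back to the filestore on a miss."""
    key = thumb_key(path)
    data = cache.get(key)
    if data is not None:
        return data                      # hit: no filestore access
    data = read_from_filestore(path)     # miss: one filestore read
    if len(data) <= MAX_ITEM_BYTES:      # memcached would reject larger items
        cache.set(key, data, THUMB_TTL)
    return data
```

This only offloads the filestore when the application server reads the thumbnails itself; if a proxy serves them directly from disk, those requests never reach the cache.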


Would love to hear what others think as well.

Shanti

William Sobel wrote:

On Jan 22, 2009, at 5:11 PM, Shanti Subramanyam wrote:
Can you please elaborate on what exactly is cached? How is the cache managed (in terms of timeouts etc.)?
From the original writeup:

Cache Strategy for Web20Kit

Home Page

The home page will be cached in two forms, and its first pagination pages will be cached as well:
1. As a whole page, served to users arriving at the site and to users who are not logged on.
2. As a page fragment containing just the content part. The page is then assembled from the dynamic header, which contains the name of the current user, and the cached content fragment.
3. Paginations: up to 5 pages will be cached. Users are unlikely to look for events beyond the fifth page.
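A minimal sketch of how a request might choose between the two cached forms, in Python for illustration only. A plain dict stands in for memcached, expiry is left out, and the key scheme (`home:full:pN`, `home:fragment:pN`) and function names are hypothetical.

```python
from typing import Callable, Dict, Optional

MAX_CACHED_PAGES = 5  # pagination: only pages 1..5 are cached


def home_cache_key(logged_in: bool, page: int) -> Optional[str]:
    """Pick the cached form for this request, or None if the page is not cached."""
    if not 1 <= page <= MAX_CACHED_PAGES:
        return None
    form = "fragment" if logged_in else "full"
    return f"home:{form}:p{page}"


def serve_home(cache: Dict[str, str], page: int, user: Optional[str],
               render_content: Callable[[int], str]) -> str:
    key = home_cache_key(user is not None, page)
    if user is None:
        # Form 1: the whole page is shared by every anonymous visitor.
        html = cache.get(key) if key else None
        if html is None:
            html = "<header>Welcome</header>" + render_content(page)
            if key:
                cache[key] = html
        return html
    # Form 2: only the content fragment is shared; the header is per user.
    content = cache.get(key) if key else None
    if content is None:
        content = render_content(page)
        if key:
            cache[key] = content
    return f"<header>Hello, {user}</header>" + content
```

Logged-in users all share one fragment per page, so the per-user work is limited to rendering the header.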
Expiration and re-generation
The home page will expire every 120 seconds. The page will then be re-generated by one of the first requests arriving after the expiration. To prevent every request that arrives after the expiration from re-generating the page, which would cause a stampede, we will use a lock/semaphore control mechanism as follows:
1. The home page and/or home page fragment is cached in memcached with no timeout or a very large timeout (on the order of days).
2. For each cached page, a small semaphore object is placed into memcached with a timeout of 120 seconds, the regeneration cycle.
3. After reading the page/fragment from the cache and sending the response to the user, the cache client (web server) checks whether the semaphore is still there or has timed out. If it is not there (timed out), the client will attempt to re-generate the page or fragment.
4. To prevent a stampede, the client ‘adds’ a lock entry into the cache. If the add succeeds, this thread holds the lock. The lock times out after 20 seconds via the memcached timeout mechanism, which prevents a thread from holding a lock indefinitely.
5. After obtaining the lock, the thread generates the page or fragment and replaces the copy in memcached.
6. The generating thread then places a new semaphore object with the same timeout period and removes the lock object.
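A minimal Python sketch of steps 1-6, assuming a memcached-style client whose `add` only succeeds when the key is absent. `FakeMemcache` is a hypothetical in-memory stand-in with an injectable clock so the timeouts can be exercised; the `:fresh` and `:lock` key suffixes and the cold-cache handling (which the writeup does not describe) are assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FakeMemcache:
    """Hypothetical stand-in for a memcached client.

    Models only what the strategy relies on: entries vanish after their TTL
    (0 = no expiry), and add() succeeds only if the key is absent.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[object, Optional[float]]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[object]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and self._clock() >= expires:
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: object, ttl: float = 0) -> None:
        self._data[key] = (value, self._clock() + ttl if ttl else None)

    def add(self, key: str, value: object, ttl: float = 0) -> bool:
        if self.get(key) is not None:
            return False
        self.set(key, value, ttl)
        return True

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


PAGE_TTL = 3 * 24 * 3600  # step 1: very large timeout (days)
REGEN_PERIOD = 120        # step 2: semaphore timeout = regeneration cycle
LOCK_TTL = 20             # step 4: lock expires on its own after 20 s


def maybe_regenerate(mc: FakeMemcache, key: str, render: Callable[[], str],
                     force: bool = False) -> bool:
    """Steps 3-6: regenerate only if the semaphore is gone and we win the lock."""
    if not force and mc.get(key + ":fresh") is not None:
        return False                             # semaphore still present
    if not mc.add(key + ":lock", 1, LOCK_TTL):
        return False                             # another thread is regenerating
    try:
        mc.set(key, render(), PAGE_TTL)          # step 5: replace the cached copy
        mc.set(key + ":fresh", 1, REGEN_PERIOD)  # step 6: new semaphore
    finally:
        mc.delete(key + ":lock")                 # step 6: release the lock
    return True


def serve_home_page(mc: FakeMemcache, key: str, render: Callable[[], str]) -> str:
    page = mc.get(key)
    if page is None:
        # Cold cache (assumption, not covered by the writeup): generate now.
        maybe_regenerate(mc, key, render, force=True)
        page = mc.get(key)
        return page if page is not None else render()
    # A real server would run this check after flushing the response;
    # here the already-read (possibly stale) copy is what gets returned.
    maybe_regenerate(mc, key, render)
    return page
```

Because the page itself barely expires and only the semaphore does, users keep receiving the previous copy while a single thread regenerates it; once the cache is warm, no home page request waits on the database.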
Event Detail Page
The event detail page is cached as a content fragment and, for users who are not logged on, as a whole page as well.
Expiration and re-generation
Event detail page cache entries have a timeout of 30 seconds, using the memcached timeout mechanism. Thus only frequently accessed events will remain in the cache, so the load generator will also need to access event detail pages in a non-uniform manner. We will use a locking mechanism for the event detail page similar to the one for the home page. However, we will not use an expiry semaphore; instead, the page expires from the cache as a whole. Accessing the entry should, however, renew its expiry time so that frequently accessed events stay in the cache. The mechanism will work as follows:
1. The event detail page and fragment are cached with a timeout of 30 seconds.
2. When a cache client needs the entry, it first tries to read it from the cache. If the entry is available, the client extends its cache timeout. Otherwise, the event detail page is generated from the database.
3. To regenerate the page without causing a stampede, the client ‘adds’ a lock entry into the cache. If the add succeeds, this thread holds the lock. The lock times out after 20 seconds via the memcached timeout mechanism, which prevents a thread from holding a lock indefinitely.
4. After obtaining the lock, the thread generates the page. On completion, the page is placed into the cache and the lock is removed from memcached.
5. If the client does not get the lock (the add fails), it stays in a loop: it sleeps for 200 ms and re-checks whether the page is now in the cache. It keeps checking until a timeout of 5 seconds (25 iterations).
6. The attendee list and the comments/rating fragments of this page are cached in the same manner and re-generated while holding a lock object. They are regenerated if the fragment is not in the cache, and on or after updates to those fragments (i.e. when somebody posts a comment or signs up to attend the event).
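A minimal Python sketch of steps 1-5 for a single event page, using a hypothetical in-memory `FakeMemcache` stand-in (redefined here so the block runs on its own). The renewal in step 2 is done by writing the entry back with a fresh 30-second `set`; the key names and the fallback after the 5-second wait (render without caching) are assumptions, since the writeup does not say what happens then. `sleep` is a parameter so the polling can be exercised without real waiting.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class FakeMemcache:
    """Hypothetical stand-in for memcached: TTL expiry and add-if-absent."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[object, float]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[object]:
        item = self._data.get(key)
        if item is None or self._clock() >= item[1]:
            self._data.pop(key, None)
            return None
        return item[0]

    def set(self, key: str, value: object, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)

    def add(self, key: str, value: object, ttl: float) -> bool:
        if self.get(key) is not None:
            return False
        self.set(key, value, ttl)
        return True

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


EVENT_TTL = 30       # step 1: entries live 30 s unless accessed
LOCK_TTL = 20        # step 3: lock expires on its own after 20 s
POLL_INTERVAL = 0.2  # step 5: 200 ms between checks
MAX_POLLS = 25       # step 5: 25 x 200 ms = 5 s


def get_event_page(mc: FakeMemcache, event_id: int,
                   render: Callable[[int], str],
                   sleep: Callable[[float], None] = time.sleep) -> str:
    key = f"event:{event_id}"
    page = mc.get(key)
    if page is not None:
        mc.set(key, page, EVENT_TTL)   # step 2: renew expiry on access
        return page
    lock = key + ":lock"
    if mc.add(lock, 1, LOCK_TTL):      # step 3: this thread regenerates
        try:
            page = render(event_id)    # step 4: build from the database
            mc.set(key, page, EVENT_TTL)
        finally:
            mc.delete(lock)
        return page
    for _ in range(MAX_POLLS):         # step 5: wait for the lock holder
        sleep(POLL_INTERVAL)
        page = mc.get(key)
        if page is not None:
            return page
    # Assumption: after 5 s without a result, render without touching the cache.
    return render(event_id)
```

Renewing on every hit gives a sliding window: an event read at least once every 30 seconds never expires, while rarely viewed events drop out on their own.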
Other Pages
At this point, none of the other pages or their fragments are cached. Most of the other pages are accessed at low frequency, with the exception of the tag search page. The tag search page is the next candidate for caching and pre-generation; its caching strategy is still to be determined.
Page Caches with Ruby on Rails
Ruby on Rails does not natively use memcached for whole-page caches, although it can use memcached for page fragments. Instead, for whole pages it generates static files, and each request is routed to the file that holds the corresponding fully rendered page.
The Ruby on Rails implementation of Web20Kit will use the native Rails mechanism for full-page caches. Expiration results in a call to remove the file, following the same expiry policy defined for each page above. The file must be removed when the page cache expires, either by a request arriving after expiry or by a background job.
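A minimal sketch of the background-job option, in Python for illustration (a Rails deployment would more likely run this as a cron task or a Ruby script). It assumes the cached pages are `.html` files under a page-cache directory and treats each file's modification time as the moment it was cached; the layout in the hypothetical `web20_ttl` (`index.html` for the home page, `events/<id>.html` for event pages) is an assumption. An mtime-based sweep gives the fixed 120 s and 30 s expiries but not the access-based renewal described for event detail pages.

```python
import time
from pathlib import Path
from typing import Callable, List, Optional


def web20_ttl(rel: Path) -> Optional[float]:
    """Hypothetical mapping from a cached file (relative path) to its TTL in seconds."""
    if rel == Path("index.html"):
        return 120   # home page regeneration cycle
    if rel.parts[0] == "events":
        return 30    # event detail pages
    return None      # not a page this job manages


def sweep_page_cache(root: Path, ttl_for: Callable[[Path], Optional[float]],
                     now: Optional[float] = None) -> List[Path]:
    """Delete cached .html files whose age (by mtime) has reached their TTL."""
    now = time.time() if now is None else now
    removed: List[Path] = []
    for path in list(root.rglob("*.html")):  # materialize before deleting
        ttl = ttl_for(path.relative_to(root))
        if ttl is None:
            continue
        try:
            if now - path.stat().st_mtime >= ttl:
                path.unlink()
                removed.append(path)
        except FileNotFoundError:
            pass  # already removed, e.g. by a request arriving after expiry
    return removed
```

For the 30-second event pages to expire on time, a scheduler would need to run the sweep every few seconds.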
Cheers,
- Will Sobel
