On Jan 22, 2009, at 5:11 PM, Shanti Subramanyam wrote:

Can you please elaborate on what exactly is cached? How is the cache managed (in terms of timeouts etc.)?

From the original writeup:

Cache Strategy for Web20Kit

Home Page

The home page will be cached in the following forms:

1. As a whole page, served to users arriving at the site and to users who are not logged on.
2. As a page fragment containing just the content part. The page is then constructed from the dynamic header, which contains the user name of the current user, and the cached content fragment.
3. Paginated pages: these will be cached up to 5 pages. Users are less likely to look for events beyond the fifth page.

Expiration and re-generation

The home page will expire every 120 seconds. The page will then be regenerated by one of the first requests arriving after the expiration. To prevent all requests arriving after the expiration from regenerating the page, and thus causing a stampede, we will use a lock/semaphore control mechanism as follows:

1. The home page and/or home page fragment is cached in memcached with no timeout or a very large timeout (on the order of days).
2. For each cached page, a small semaphore object is placed into memcached with a timeout of 120 seconds, the regeneration cycle.
3. After accessing the page/fragment in the cache and sending the response to the user, the cache client (web server) checks whether the semaphore is still there or has timed out. If it is not there (timed out), the client will attempt to regenerate the page or fragment.
4. To prevent a stampede, the client ‘adds’ a lock entry into the cache. If the add succeeds, this thread has the lock. The lock times out after 20 seconds using the memcached timeout mechanism. This prevents a thread from holding a lock indefinitely.
5. After obtaining the lock, the thread generates the page or fragment and replaces the copy in memcached.
6. The generating thread then places a new semaphore object with the same timeout period and removes the lock object.
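
The home page steps above can be sketched roughly as follows. This is only an illustration, not the Web20Kit code: FakeMemcache is a hypothetical in-process stand-in for a memcached client (only the get/set/add/delete semantics matter here), and the key names "home", "home:fresh" and "home:lock" are made up for the example. The cold-cache branch is an assumption, since the writeup does not describe it.

```python
import time


class FakeMemcache:
    """Hypothetical in-process stand-in for a memcached client."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at or None)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # entry has timed out
            return None
        return value

    def set(self, key, value, ttl=0):
        # ttl=0 means "no timeout", as in memcached
        expires_at = self._clock() + ttl if ttl else None
        self._data[key] = (value, expires_at)

    def add(self, key, value, ttl=0):
        """Store only if the key is absent; True means we won the race."""
        if self.get(key) is not None:
            return False
        self.set(key, value, ttl)
        return True

    def delete(self, key):
        self._data.pop(key, None)


REGEN_SECONDS = 120  # semaphore timeout, i.e. the regeneration cycle
LOCK_SECONDS = 20    # lock timeout, so a dead thread cannot block forever


def serve_home_page(cache, render):
    """Return the page to send; regenerate afterwards if the semaphore is gone."""
    page = cache.get("home")
    if page is None:
        # Assumption: on a cold cache we simply render and store the page.
        page = render()
        cache.set("home", page)                    # step 1: no timeout
        cache.set("home:fresh", 1, REGEN_SECONDS)  # step 2: semaphore
        return page
    # Step 3: the (possibly stale) page has been served; now check the semaphore.
    if cache.get("home:fresh") is None:
        # Step 4: only the thread whose add succeeds regenerates.
        if cache.add("home:lock", 1, LOCK_SECONDS):
            try:
                cache.set("home", render())                # step 5
                cache.set("home:fresh", 1, REGEN_SECONDS)  # step 6
            finally:
                cache.delete("home:lock")
    return page
```

Note that the request that triggers the regeneration still gets the old copy; only later requests see the new page.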

Event Detail Page

The event detail page is cached as a content fragment and, for users who are not logged on, as a whole page as well.

Expiration and re-generation

Event detail page cache entries have a timeout of 30 seconds, using the cache timeout mechanism of memcached. Thus only frequently accessed events will remain in the cache. The load generator will therefore also need to be designed to access event detail pages in a non-uniform manner. We will use a locking mechanism for the event detail page similar to the one for the home page. However, we will not use an expiry semaphore; instead we let the page expire from the cache as a whole. Access to the entry should, however, renew the expiry time so that frequently accessed events stay in the cache. The mechanism will work as follows:

1. The event detail page and fragment are cached with a timeout of 30 seconds.
2. When a cache client needs the entry, it will try to read the entry from the cache. If the entry is available, it will extend the cache timeout. Otherwise, the event detail page is generated from the database.
3. To regenerate the page and prevent a stampede, the client ‘adds’ a lock entry into the cache. If the add succeeds, this thread has the lock. The lock times out after 20 seconds using the memcached timeout mechanism. This prevents a thread from holding a lock indefinitely.
4. After obtaining the lock, the thread proceeds with generating the page. After completion, the page is placed into the cache and the lock is removed from memcached.
5. If we do not get the lock (the add fails), we stay in a loop, sleep for 200ms, and re-check the cache for the page. We keep checking until a timeout of 5 seconds (25 iterations).
6. The attendee list and comments/rating fragments of this page are cached in the same manner. Those sections will be regenerated while holding a lock object in the same manner. They will be regenerated if the fragment is not in the cache, and on or after an update of those fragments (i.e. somebody makes a comment or signs up to attend this event).
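
A rough sketch of the event detail flow above, again using a hypothetical in-process FakeMemcache in place of a real memcached client. The key names and the render callback are invented for the example. Two points are assumptions: the expiry is renewed by storing the entry again, and after the 5-second wait the sketch falls back to rendering without caching, which the writeup leaves open.

```python
import time


class FakeMemcache:
    """Hypothetical in-process stand-in for a memcached client."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._data[key]  # entry has timed out
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)

    def add(self, key, value, ttl):
        """Store only if the key is absent; True means we hold the lock."""
        if self.get(key) is not None:
            return False
        self.set(key, value, ttl)
        return True

    def delete(self, key):
        self._data.pop(key, None)


PAGE_SECONDS = 30    # event detail page timeout
LOCK_SECONDS = 20    # lock timeout
POLL_SECONDS = 0.2   # sleep between re-checks
MAX_POLLS = 25       # 25 * 200ms = 5 seconds


def get_event_page(cache, event_id, render, sleep=time.sleep):
    key = "event:%s" % event_id
    lock_key = key + ":lock"

    page = cache.get(key)
    if page is not None:
        # Step 2: a hit renews the expiry (here by storing the entry again).
        cache.set(key, page, PAGE_SECONDS)
        return page

    # Step 3: try to become the one thread that regenerates the page.
    if cache.add(lock_key, 1, LOCK_SECONDS):
        try:
            page = render(event_id)             # step 4: build from the database
            cache.set(key, page, PAGE_SECONDS)
        finally:
            cache.delete(lock_key)
        return page

    # Step 5: somebody else holds the lock; poll until the page shows up.
    for _ in range(MAX_POLLS):
        sleep(POLL_SECONDS)
        page = cache.get(key)
        if page is not None:
            return page

    # Assumption: after 5 seconds give up waiting and render without caching.
    return render(event_id)
```

Passing sleep as a parameter is only there so the polling loop can be exercised without real delays.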

Other Pages

At this point, none of the other pages and/or their fragments are cached. Most of the other pages are accessed at low frequency with the exception of the tag search page. The tag search page is the next candidate for caching and pre-generation. The caching strategy is still to be determined.

Page Caches with Ruby on Rails

Ruby on Rails does not natively use memcached for whole page caches, although it can do so for page fragments. For whole pages, it instead generates static pages as files, and the request is routed to the corresponding file that represents a fully rendered page.

The Ruby on Rails implementation of Web20Kit will use the native Rails mechanism for full page caches. Expirations result in a call to remove the file and follow the same expiry policy defined for each page above. The file must be removed when the page cache expires, either by a request arriving after expiry or by a background job.
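
For illustration only, the background-job variant of the file expiry could look something like the sketch below. It is written in Python rather than Ruby and does not use any Rails API; the function name, the file names and the per-file timeout table are all hypothetical.

```python
import os
import time


def expire_page_files(cache_dir, ttl_by_name, now=None):
    """Remove cached page files older than their timeout; return removed names.

    ttl_by_name maps a file name (hypothetical, e.g. "index.html") to its
    timeout in seconds, following the per-page expiry policy.
    """
    now = time.time() if now is None else now
    removed = []
    for name, ttl in ttl_by_name.items():
        path = os.path.join(cache_dir, name)
        try:
            age = now - os.path.getmtime(path)
        except FileNotFoundError:
            continue  # never generated, or already expired by a request
        if age >= ttl:
            try:
                os.remove(path)
                removed.append(name)
            except FileNotFoundError:
                pass  # another worker removed it first
    return removed
```

Once the file is gone, the next request for that page falls through to the application, which renders it and writes the file again.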


Cheers,
- Will Sobel
