Re: Help with a Caching Strategy
On 7/6/07, Clint Ecker <[EMAIL PROTECTED]> wrote:
...
> When I set up caching I then just cache this_month_detail for 15
> minutes and previous_month_detail for 10 years or so.

What cache backend are you using? Note that memcached has an LRU
expiry policy, so if there is pressure on the cache, it'll toss out an
unaccessed item to make room for a new key. That 10 years might turn
out to be significantly shorter.

...
> Do you mean by "baking the requests", caching the output to a file on
> disk/db or by pickling the request object (pretty sure you don't mean
> this). If the first is what you mean, how does this differ from using
> file-based caching in Django?

The difference is that Django only supports one backend at a time, and
file-based caching breaks down if you're doing lots of writes (that
is, cache misses) or other disk IO that flushes the OS buffers. It
sounds like your data changes very rarely, and unless the page is
quite large, that number of items sounds reasonable (to me) to bake to
disk on your own. (I'm assuming, here, that you're using memcached for
your write-heavy stuff.)

...
> In another vein, how does everyone deal with invalidating the cache

It's a hard problem, in that your needs are your needs, and the common
ground is generally pretty fine-grained. There's a SoC project right
now to do queryset-based caching, and David Cramer hacked up a
CachedManager to do something similar. With those approaches, the
cache key is basically the queryset parameters themselves, and the
cache miss/regen branch I outlined is hidden in the QuerySet
implementation. Cache invalidation isn't done, AFAIK, in David's hack,
but it is a goal of the SoC project to void any querysets that contain
an updated model. (I think the SoC project is using signals for the
invalidation; I have no idea how they're going to deal with model
updates outside the cached process.)
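The "cache key is basically the queryset parameters" idea can be sketched like this. This is a minimal illustration, not the SoC project's or David Cramer's actual code: a plain dict stands in for the cache backend, and `make_key` / `fetch` are hypothetical names.

```python
# Sketch: derive a stable cache key from the query parameters, and
# hide the cache-miss/regen branch behind a single fetch() call.
# A plain dict stands in for memcached / Django's cache backend.
import hashlib

_cache = {}

def make_key(model_name, **filters):
    """Build a stable key from the queryset parameters (sorted so
    argument order doesn't change the key)."""
    raw = model_name + "|" + "|".join(
        "%s=%r" % item for item in sorted(filters.items()))
    return hashlib.md5(raw.encode("utf-8")).hexdigest()

def fetch(model_name, loader, **filters):
    """Return the cached resultset, running `loader` only on a miss."""
    key = make_key(model_name, **filters)
    if key not in _cache:
        _cache[key] = loader()  # the expensive DB hit happens here
    return _cache[key]
```

The point is that callers never see the miss/regen branch; it lives behind one function, just as the queryset-caching approaches hide it inside the QuerySet implementation.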
If you're looking for something coarser -- because, for example, you
need to combine multiple querysets or outside data in a complex way --
then that's something that is unlikely to be a common need for Django.
The basic idea is: keep a key for the expensive thing itself, and then
keep a key for each unit-of-invalidation that holds a dict of all
final keys which depend on the invalidated thing. When the component
becomes invalid, you look up all the keys that depend on it and go
delete those. I suppose you could recurse there -- checking to see
whether the things depending on the invalid component are themselves
components on which other things depend. I personally haven't needed
this strategy yet. :)

> can I trigger what I might call an
> automated rebuild of that template?

I just make HTTP requests for all the expensive views to keep the
cache warm. Just pace the rebuild -- it's easy to affect a site's
performance when deliberately calling a bunch of expensive views.

HTH,
Jeremy

--
You received this message because you are subscribed to the Google
Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at
http://groups.google.com/group/django-users?hl=en
Re: Help with a Caching Strategy
Thanks Jeremy. I'll try to expand a bit, and give some insight into a
solution I crafted this morning after reading your message (it flipped
a switch in my brain).

> How's the data updated? Need to know how to get the update info to
> the cache. :)

The data is updated by a crontab job that runs a Python script every
15 minutes. This script pulls down RSS feeds, scrapes web sites, etc.,
mines the data into my format, and adds rows to the database using the
Django ORM.

> As a hack, you could have a stub view that just decides if it's the
> current month or not, then dispatch either of 2 real views, each with
> its own cache_page.

This is how I implemented it this morning. I think there might be a
few edge cases that aren't being handled, but what I did is send
everyone to:

month_detail(...)

which then decides whether they are requesting this month or a
previous month, and forwards them to the following views,
respectively:

this_month_detail(...) and previous_month_detail(...)

which both call:

real_month_detail(...)

which has all the shared code. When I set up caching I then just cache
this_month_detail for 15 minutes and previous_month_detail for 10
years or so.

> For an actual solution, more detail's needed.
>
> How many other parameters from the request come into play?

Pretty much every parameter that's taken into account comes through my
function arguments: (year, month, entity). That is, the output of the
view depends wholly on the unique combination of those 3 items.

> If a small number of permutations, you could bake all the data (that
> is, pull requests into a flat file to be served cheaply later, in
> which "invalidating cache" is deleting a file).

I'd say that there are (3 years) * (12 months) * (65 entities)
permutations (~2,300), and that obviously grows every month and as we
add more entities, which happens at the rate of approximately 1-2 a
month.
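The stub-view dispatch Clint describes above can be sketched as below. To keep the sketch runnable without Django, the views are plain functions and the `cache_page` decorators are shown as comments; the return string is a placeholder for the real response.

```python
# Sketch of the stub-view dispatch: month_detail only decides which
# of two separately cached views handles the request.
import datetime

def month_detail(request, year, month, entity):
    """Stub view: route to the short-lived or long-lived cached view."""
    today = datetime.date.today()
    if (int(year), int(month)) == (today.year, today.month):
        # would be wrapped with @cache_page(60 * 15) in Django
        return this_month_detail(request, year, month, entity)
    # would be wrapped with @cache_page(60 * 60 * 24 * 365 * 10)
    return previous_month_detail(request, year, month, entity)

def this_month_detail(request, year, month, entity):
    return real_month_detail(request, year, month, entity)

def previous_month_detail(request, year, month, entity):
    return real_month_detail(request, year, month, entity)

def real_month_detail(request, year, month, entity):
    """All the shared code lives here; placeholder response."""
    return "detail for %s-%s / %s" % (year, month, entity)
```

Because the two real views are distinct callables, `cache_page` can give each its own timeout even though they share all their logic.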
Do you mean by "baking the requests" caching the output to a file on
disk/db, or pickling the request object (pretty sure you don't mean
this)? If the first is what you mean, how does this differ from using
file-based caching in Django?

> If you decide to use the low-level cache due to too many permutations,
> this is the general approach:
>
> expensive_thing = cache.get(some_key)
> if not expensive_thing:
>     expensive_thing = expensive_process()
>     cache.set(some_key, expensive_thing, cache_timeout)
>
> You can, of course, do that as much as you want.
> I have some views that do two or 3 phases, in which I cache a whole
> resultset, then munge or whittle it depending on parameters and cache
> that bit with a more fine-grained key.

I may look into this a bit more to target the intensive bits of that
particular view.

In another vein, how does everyone deal with invalidating the cache
and the resulting penalty the next client to request that view
receives? These old views can take nearly 30 seconds to regenerate
from a non-cached state. Say, for whatever reason, a month from 5
months ago receives a new bit of data and I kill the cache for it. How
do you regenerate that view? Is there a way to do it programmatically
from within Django? I.e., when I add a new bit of data to an old month
in my cron script and invalidate that month's cache, can I trigger
what I might call an automated rebuild of that template? I would
prefer that the penalty take place during my cron'd script's execution
rather than the user have a perceived delay the next time that
particular view is requested.

Also, thanks a bunch, Jeremy!

Clint
--
Clint Ecker
Sr. Web Developer - Stone Ward Chicago
p: 312.464.1443
c: 312.863.9323
---
twitter: clint
skype: clintology
AIM: idiosyncrasyFG
Gtalk: [EMAIL PROTECTED]
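Jeremy's answer elsewhere in the thread -- make HTTP requests for the expensive views from the cron script, paced so the rebuild doesn't hurt the live site -- can be sketched like this. `BASE`, the paths, and the pacing value are assumptions; the fetch function is injectable so the logic can be exercised without a running server.

```python
# Sketch: after invalidating a stale month's cache, re-request its URL
# from the cron script so the regeneration penalty is paid there, not
# by the next visitor.
import time
import urllib.request

BASE = "http://localhost:8000"  # assumption: where the site is served

def default_fetch(url):
    return urllib.request.urlopen(url).read()

def warm(paths, fetch=default_fetch, pause=2.0):
    """Request each expensive view, pausing between hits so the
    rebuild doesn't degrade the live site's performance."""
    warmed = []
    for path in paths:
        fetch(BASE + path)
        warmed.append(BASE + path)
        if pause:
            time.sleep(pause)
    return warmed
```

In Clint's setup the cron script would call something like `warm(["/2007/02/acme/"])` right after invalidating February 2007's cache entry (the URL layout here is hypothetical).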
Re: Help with a Caching Strategy
On 7/5/07, Clint Ecker <[EMAIL PROTECTED]> wrote:
> It would be super cool to invalidate the cache (or not) at the
> moment I update the data, but it's not mission critical.

How's the data updated? Need to know how to get the update info to the
cache. :)

> Long story short, my current approaches haven't yielded any fruit. I'm
> not sure that I can cache one view two different ways by using the
> cache_page function. Perhaps I need to dig a little deeper into the
> caching mechanisms?

As a hack, you could have a stub view that just decides if it's the
current month or not, then dispatches to either of 2 real views, each
with its own cache_page.

For an actual solution, more detail's needed.

How many other parameters from the request come into play?

If there's a small number of permutations, you could bake all the data
(that is, pull requests into a flat file to be served cheaply later,
in which case "invalidating cache" is deleting a file).

If you decide to use the low-level cache due to too many permutations,
this is the general approach:

    expensive_thing = cache.get(some_key)
    if not expensive_thing:
        expensive_thing = expensive_process()
        cache.set(some_key, expensive_thing, cache_timeout)

You can, of course, do that as much as you want. I have some views
that do 2 or 3 phases, in which I cache a whole resultset, then munge
or whittle it depending on parameters and cache that bit with a more
fine-grained key.

Cheers,
Jeremy
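The multi-phase caching Jeremy describes -- cache the whole resultset under a coarse key, then cache each parameter-specific slice under a finer key -- can be sketched like this. A plain dict stands in for the cache backend, and `entity_rows` is an illustrative name, not Jeremy's actual code.

```python
# Sketch of two-phase caching: phase 1 caches the full expensive
# resultset once; phase 2 caches each whittled-down, per-parameter
# slice under its own fine-grained key.
_cache = {}  # stand-in for Django's cache backend

def get_or_set(key, build):
    """Run `build` only on a cache miss (the general low-level idiom)."""
    if key not in _cache:
        _cache[key] = build()
    return _cache[key]

def entity_rows(entity, load_all_rows):
    # Phase 1: the whole (expensive) resultset, shared by all entities.
    rows = get_or_set("rows:all", load_all_rows)
    # Phase 2: the munged/whittled slice, keyed per entity.
    return get_or_set("rows:" + entity,
                      lambda: [r for r in rows if r["entity"] == entity])
```

The payoff is that the expensive load runs at most once, while each parameter combination still gets its own cheap, directly cacheable result.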