Re: Help with a Caching Strategy

2007-07-06 Thread Jeremy Dunck

On 7/6/07, Clint Ecker <[EMAIL PROTECTED]> wrote:
...
> When I set up caching I then just cache this_month_detail for 15
> minutes and previous_month_detail for 10 years or so.

What cache backend are you using?

Note that memcached has an LRU eviction policy: if there is pressure
on the cache, it'll toss out a least-recently-used item to make room
for a new key.

So that 10 years might be significantly shorter.

...
> Do you mean by "baking the requests", caching the output to a file on
> disk/db or by pickling the request object (pretty sure you don't mean
> this).  If the first is what you mean, how does this differ from using
> file-based caching in Django?

The difference is that Django only supports one backend at a time, and
file-based caching breaks down if you're doing lots of writes (that
is, cache misses) or other disk IO that flushes the OS buffers.

It sounds like your data changes very rarely, and unless the pages are
quite large, that number of items sounds reasonable (to me) to handle
with disk-based caching on your own.

(I'm assuming, here, that you're using memcache for your write-heavy stuff.)
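A minimal sketch of what I mean by baking (the directory, file naming,
and render function are all made up for illustration):

```python
# Bake: render once, write the HTML to a flat file, serve the file
# cheaply later.  "Invalidating the cache" is just deleting the file.
import os
import tempfile

BAKE_DIR = tempfile.mkdtemp()  # in practice, a dir your web server can serve

def bake_path(year, month, entity):
    return os.path.join(BAKE_DIR, "%s-%02d-%s.html" % (year, month, entity))

def render_month(year, month, entity):
    # hypothetical stand-in for the expensive view logic
    return "<h1>%s %s/%s</h1>" % (entity, year, month)

def get_baked(year, month, entity):
    path = bake_path(year, month, entity)
    if not os.path.exists(path):            # miss: bake the page
        with open(path, "w") as f:
            f.write(render_month(year, month, entity))
    with open(path) as f:                   # hit: serve the flat file
        return f.read()

def invalidate(year, month, entity):
    path = bake_path(year, month, entity)
    if os.path.exists(path):
        os.remove(path)
```

In practice you'd point the web server straight at the baked files so
most hits never touch Django at all.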

...
> In another vein,  how does everyone deal with invalidating the cache

It's a hard problem, in that your needs are your needs, and the common
ground is generally pretty fine-grained.

There's a SoC project right now to do queryset-based caching, and
David Cramer hacked up a CachedManager to do similar.

With those approaches, the cache key is basically the queryset
parameters themselves, and the cache miss/regen branch I outlined is
hidden in the QuerySet implementation.  Cache invalidation isn't done,
AFAIK, in David's hack, but it is a goal of the SoC project to
invalidate any querysets that contain an updated model.

(I think the SoC project is using signals for the invalidation; I have
no idea how they're going to deal with model updates out of the cached
process.)
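This isn't David's actual code, but the keying idea can be sketched
like this (a dict stands in for memcached, and the query function is
hypothetical):

```python
# The cache key is derived from the filter parameters themselves, so
# two identical queries share one cached resultset.
import hashlib

cache = {}  # stand-in for memcached / django.core.cache

def cached_filter(model_name, run_query, **params):
    # Build a stable key from the model name plus sorted filter params.
    raw = model_name + ":" + ",".join("%s=%r" % kv for kv in sorted(params.items()))
    key = hashlib.md5(raw.encode("utf-8")).hexdigest()
    if key not in cache:          # miss: run the real query once
        cache[key] = run_query(**params)
    return cache[key]
```

Sorting the params means `filter(year=2007, month=6)` and
`filter(month=6, year=2007)` land on the same key.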

If you're looking for something coarser, because, for example, you
need to combine multiple querysets or outside data in a complex way,
then that's something that is unlikely to be a common need for Django.

The basic idea is, keep a key for the expensive thing itself, and then
keep a key for each unit-of-invalidation that is a dict of all final
keys which depend on the invalidated thing.

When the component becomes invalid, you look up all the keys that
depend on it, and go delete those.

I suppose you could recurse there-- checking to see if the things
depending on the invalid component are themselves components on which
other things depend.
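In rough code, with a dict standing in for the cache backend and made-up
key names, the idea is:

```python
# Alongside each expensive key, keep a "dependents" entry mapping each
# unit-of-invalidation to the set of final keys built from it.
cache = {}
dependents = {}   # component key -> set of final keys depending on it

def cache_with_deps(final_key, value, component_keys):
    cache[final_key] = value
    for ck in component_keys:
        dependents.setdefault(ck, set()).add(final_key)

def invalidate_component(component_key):
    # When the component becomes invalid, delete every key depending on it.
    for fk in dependents.pop(component_key, set()):
        cache.pop(fk, None)
        # To recurse, you'd check here whether fk is itself a component
        # in dependents and invalidate its dependents too.
```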

I personally haven't needed this strategy yet.  :)

> can I trigger what I might call an
> automated rebuild of that template?

I just make HTTP requests for all the expensive views to keep the cache warm.

Just pace the rebuild-- it's easy to affect a site's performance when
specifically calling a bunch of expensive views.
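Something as simple as this (URLs made up; the fetch hook is just so
you can swap in whatever client you like):

```python
# Paced cache warming: request each expensive URL in turn, sleeping
# between hits so the rebuild doesn't hammer the site.
import time
import urllib.request

def warm_cache(urls, pause=2.0, fetch=None):
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u).read()
    for url in urls:
        try:
            fetch(url)   # hitting the view repopulates its cache entry
        except Exception:
            pass         # a failed warm-up shouldn't abort the whole run
        time.sleep(pause)
```

Run from cron right after the data update, with a pause tuned so the
expensive views don't all regenerate at once.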

HTH,
  Jeremy

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~--~~~~--~~--~--~---



Re: Help with a Caching Strategy

2007-07-06 Thread Clint Ecker

Thanks Jeremy,
   I'll try to expand a bit, and give some insight into a solution I
crafted this morning after reading your message (it flipped a switch
in my brain).

> How's the data updated?  Need to know how to get the update info to
> the cache.  :)

The data is updated by a crontab job that runs a python script.  This
script pulls down RSS feeds, scrapes web sites, etc., mines out data
into my format, adds rows to the database using the Django ORM, etc.
every 15 minutes.

> As a hack, you could have a stub view that just decides if it's the
> current month or not, then dispatch either of 2 real views, each with
> its own cache_page.

This is how I implemented it this morning.  I think there might be a
few edge cases that aren't being handled, but what I did is send
everyone to:

month_detail(...)

which then decides whether or not they are requesting this month or a
previous month, and they're forwarded to the following views,
respectively:

this_month_detail(...) and previous_month_detail(...) which both do a
call to real_month_detail(...) which has all the shared code.

When I set up caching I then just cache this_month_detail for 15
minutes and previous_month_detail for 10 years or so.
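In rough code, the dispatch looks something like this (simplified; the
comments note where Django's cache_page decorators go, and the date
check stands in for my real logic):

```python
import datetime

def real_month_detail(year, month):
    # all the shared (expensive) code lives here
    return "report for %d-%02d" % (year, month)

def this_month_detail(year, month):
    # in Django: wrapped with cache_page(60 * 15), i.e. 15 minutes
    return real_month_detail(year, month)

def previous_month_detail(year, month):
    # in Django: cache_page(10 * 365 * 24 * 3600), the "10 years or so"
    return real_month_detail(year, month)

def month_detail(year, month, today=None):
    # stub view: pick the short- or long-cached view by date
    today = today or datetime.date.today()
    if (year, month) == (today.year, today.month):
        return this_month_detail(year, month)
    return previous_month_detail(year, month)
```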

> For an actual solution, more detail's needed.
>
> How many other parameters from the request come into play?

Pretty much every parameter that's taken into account comes through my
function arguments: (year, month, entity).  That is, the output of the
view depends wholly on the unique combination of those 3 items.

> If a small number of permutations, you could bake all the data (that
> is, pull requests into a flat file to be served cheaply later, in
> which "invalidating cache" is deleting a file).

I'd say that there are (3 years)*(12 months)*(65 entities)
permutations (~2300) and that obviously grows every month and as we
add more entities, which happens at the rate of approximately 1-2 a
month.

By "baking the requests", do you mean caching the output to a file on
disk/db, or pickling the request object (pretty sure you don't mean
this)?  If the first is what you mean, how does this differ from using
file-based caching in Django?

> If you decide to use the low-level cache due to too many permutations,
> this is the general approach:
>
> expensive_thing = cache.get(some_key)
> if not expensive_thing:
>     expensive_thing = expensive_process()
>     cache.set(some_key, expensive_thing, cache_timeout)
>
> You can, of course, do that as much as you want.

> I have some views that do two or 3 phases, in which I cache a whole
> resultset, then munge or whittle it depending on parameters and cache
> that bit with a more fine-grained key.

I may look into this a bit more to target my intensive bits of that
particular view.


In another vein, how does everyone deal with invalidating the cache
and the resulting penalty the next client to request that view
receives?  These old views can take nearly 30 seconds to regenerate
from a non-cached state.  Say, for whatever reason, a month from 5
months ago receives a new bit of data and I kill the cache for it.
How do you regenerate that view?

Is there a way to do it programmatically from within Django?

i.e. when I add a new bit of data to an old month in my cron script
and invalidate that month's cache, can I trigger what I might call an
automated rebuild of that template?

I would prefer that the penalty take place during my cron'd script's
execution rather than the user have a perceived delay the next time
that particular view is requested.

Also, thanks a bunch, Jeremy!
Clint

-- 
Clint Ecker
Sr. Web Developer - Stone Ward Chicago
p: 312.464.1443
c: 312.863.9323
---
twitter: clint
skype: clintology
AIM: idiosyncrasyFG
Gtalk: [EMAIL PROTECTED]




Re: Help with a Caching Strategy

2007-07-05 Thread Jeremy Dunck

On 7/5/07, Clint Ecker <[EMAIL PROTECTED]> wrote:
> It would be super cool to invalidate the cache (or not) at the
> moment I update the data, but it's not mission critical.

How's the data updated?  Need to know how to get the update info to
the cache.  :)

> Long story short, my current approaches haven't yielded any fruit. I'm
> not sure that I can cache one view two different ways by using the
> cache_page function.  Perhaps I need to dig a little deeper into the
> caching mechanisms?

As a hack, you could have a stub view that just decides if it's the
current month or not, then dispatch either of 2 real views, each with
its own cache_page.

For an actual solution, more detail's needed.

How many other parameters from the request come into play?

If a small number of permutations, you could bake all the data (that
is, pull requests into a flat file to be served cheaply later, in
which "invalidating cache" is deleting a file).

If you decide to use the low-level cache due to too many permutations,
this is the general approach:

expensive_thing = cache.get(some_key)
if not expensive_thing:
    expensive_thing = expensive_process()
    cache.set(some_key, expensive_thing, cache_timeout)

You can, of course, do that as much as you want.

I have some views that do two or 3 phases, in which I cache a whole
resultset, then munge or whittle it depending on parameters and cache
that bit with a more fine-grained key.
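A sketch of that phased approach, with a dict standing in for the
cache backend and a made-up query:

```python
# Phase 1: cache the whole expensive resultset under a coarse key.
# Phase 2: cache each munged/whittled slice under a finer-grained key,
# built from the cached resultset instead of re-querying.
cache = {}

def full_resultset():
    return list(range(100))  # stand-in for the expensive query

def get_resultset():
    if "all_rows" not in cache:
        cache["all_rows"] = full_resultset()
    return cache["all_rows"]

def get_slice(limit):
    key = "rows:limit=%d" % limit
    if key not in cache:
        cache[key] = get_resultset()[:limit]  # whittle, don't re-query
    return cache[key]
```

The coarse key takes the one expensive hit; every parameter variation
after that is cheap.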

Cheers,
  Jeremy
