Should I store offline calculation results in the cache?

Antonis Christofides Sat, 27 May 2017 02:25:49 -0700

Hello all,

I have an application that calculates and tells you whether a specific crop at a
specific piece of land needs to be irrigated, and how much. The calculation
lasts for a few seconds, so I'm doing it offline with Celery. Every two hours
new meteorological data comes in and all the pieces of land are recalculated.

The question is where to store the results of the calculation. I thought that
since they are re-creatable, the cache would be the appropriate place. However,
there is a difference from the more common use of the cache: they are
re-creatable, but they are also necessary. You can't just go and delete any item
in the cache. Doing so would cripple the website, which expects to find the
calculation results in the cache. Viewing something on the site will never
trigger a recalculation (and if I make it trigger, it will be a safety procedure
for edge cases and not the normal way of doing things). The results must also
survive reboots, so I chose the file-based cache.
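
For illustration, a file-based cache that keeps entries until they are
overwritten might be configured roughly as follows. This is only a sketch: the
directory path is hypothetical, and setting TIMEOUT to None (no expiry) is an
assumption about what such a setup would want, not necessarily the
configuration used here.

```python
# settings.py fragment (sketch; LOCATION is a made-up path)
CACHES = {
    "default": {
        # Django's built-in file-based backend; entries survive a reboot
        # as long as the directory itself is on persistent storage.
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/cache/myproject/results",  # hypothetical directory
        # None means "never expire"; Django's default is 300 seconds.
        "TIMEOUT": None,
    }
}
```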

I didn't know about culling, so when the pieces of land grew to 100, and the
items in the cache to 400 (4 items need to be stored for each piece of land), I
spent a few hours trying to find out what the heck was going on. I solved the
problem by tweaking the culling parameters. However, all this has raised a few
issues:

1. The filesystem cache can't grow too much because of issue 11260
<https://code.djangoproject.com/ticket/11260>, which is marked wontfix.
According to Russell Keith-Magee
<https://code.djangoproject.com/ticket/11260#comment:7>,

"the filesystem cache is intended as an easy way to test caching, not as
a serious caching strategy. The default cache size and the cull strategy
implemented by the file cache should make that obvious. If you need a
cache capable of holding 100000 items, I strongly recommend you look at
memcache. If you insist on using the filesystem as a cache, it isn't
hard to subclass and extend the existing cache."

If these comments are correct, then the documentation needs some fixing,
because not only does it not say that the filesystem cache is not for
serious use, but it implies the opposite:

"Without a really compelling reason, ... you should stick to the cache
backends included with Django. They’ve been well-tested and are easy to
use."

Is Russell not entirely correct perhaps, or is the documentation? Or am I
missing something?

2. In the end, is it a bad idea to use the cache for this particular case? I
also have a similar use case in an unrelated app: a page that needs about a
minute to render. Although I've implemented a quick-and-dirty solution of
increasing the web server's timeout and caching the page, I guess the
correct way would be to produce that page offline with Celery or something
similar. Where would I store such a page if not in the cache?
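
To make the culling surprise described earlier more concrete, here is a toy
model of a size-limited cache that throws away a random fraction of its entries
once it is full. It is a simplified sketch, not Django's actual code; the
function name and the key format are made up for illustration. The default
values of 300 and 3 mirror Django's documented defaults for the MAX_ENTRIES and
CULL_FREQUENCY cache options.

```python
import random


def simulate_file_cache(num_items, max_entries=300, cull_frequency=3, seed=0):
    """Toy model: before each write, cull random entries if the cache is full.

    Hypothetical helper for illustration only, not part of Django.
    """
    rng = random.Random(seed)
    store = {}
    for i in range(num_items):
        if len(store) >= max_entries:
            if cull_frequency == 0:
                # A cull frequency of 0 means the whole cache is dumped.
                store.clear()
            else:
                # Drop 1/cull_frequency of the entries, chosen at random.
                doomed = rng.sample(sorted(store), len(store) // cull_frequency)
                for key in doomed:
                    del store[key]
        store[f"result-{i}"] = i
    return store


survivors = simulate_file_cache(400)
print(len(survivors), "of 400 results still present with the default limits")

roomy = simulate_file_cache(400, max_entries=1000)
print(len(roomy), "of 400 results still present with a larger limit")
```

In this model, writing 400 distinct results with the default limits leaves only
300 of them, and which ones vanish is random, which matches the kind of
confusing behaviour described above. Raising the entry limit well above the
expected number of items avoids the cull; in Django the corresponding settings
go in the cache's OPTIONS dictionary.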

--
Antonis Christofides
http://djangodeployment.com
