Hi all

(This is all running Django 1.1.1.)

On our site, we have a lot of pages that take a long time to generate,
mainly because they make a lot of expensive SOAP-like calls to other
servers. Some of the pages take exceptionally long periods of time (>
30 seconds) to generate a full web page. In order to output these
responses reliably, we use iterator objects to output the content of
the page bit by bit, e.g.:

from django.http import HttpResponse

def iterator_resp(request):
    def worker():
        yield "Hello"
        yield " "
        yield "world"
    return HttpResponse(worker())

Since these pages take such a long time to generate, ideally we would
like to cache the generated content for the next time it is requested;
however, any attempt to cache such a view with the @cache_page
decorator seems doomed to fail.

The first problem occurs when UpdateCacheMiddleware runs its
process_response() phase: it calls
django.utils.cache.patch_response_headers(), which consumes the
generator while calculating an MD5 sum of the content for an ETag.
The generator is therefore already exhausted by the time we come to
output the response, and we get an empty response.
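
Just to illustrate with plain Python (nothing Django-specific here):
once a generator has been iterated over, a second pass yields nothing,
which is exactly why the response comes out empty.

def worker():
    yield "Hello"
    yield " "
    yield "world"

gen = worker()
first = "".join(gen)   # "Hello world" - the first pass consumes the generator
second = "".join(gen)  # "" - the generator is already exhausted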

This can be avoided by setting a manual ETag on the response, but then
Python refuses to pickle the response object for the cache, since
generator objects are not picklable.
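
Again this is just the generic pickle limitation, in plain Python:

import pickle

def worker():
    yield "Hello"

pickle.dumps(worker())   # raises TypeError - generator objects can't be pickled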

I figure that this sort of caching should definitely be possible, so I
wrote a small patch to UpdateCacheMiddleware that notices when we are
dealing with an iterable response and creates a duplicate HttpResponse
wrapping a proxy generator. The proxy reads from the original response,
writing each chunk to a buffer and also yielding it onwards. Once the
original response is drained, I create another response from the
buffered content, and that response can be put into the cache.
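
Stripped of the middleware plumbing, the proxy generator boils down to
something like this (the names here are purely illustrative; the real
patch is at the bottom of this mail):

def tee_response(rsp, cache_body):
    # Stream each chunk onwards while keeping a copy; once the original
    # response is drained, hand the complete body to cache_body().
    from cStringIO import StringIO
    buf = StringIO()
    for chunk in rsp:
        buf.write(chunk)
        yield chunk
    cache_body(buf.getvalue())
    buf.close()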

Obviously, there are trade-offs here - if you want to cache a 50MB
page, you've got to be prepared to buffer 50MB - but is this the right
sort of approach to take? Is there a better way of achieving similar
results? I feel quite uncomfortable poking at the insides of
HttpResponse - looking at HttpResponse._is_string in particular!

Cheers

Tom

Index: cache.py
===================================================================
--- cache.py    (revision 147298)
+++ cache.py    (working copy)
@@ -91,7 +91,21 @@
         patch_response_headers(response, timeout)
         if timeout:
             cache_key = learn_cache_key(request, response, timeout, self.key_prefix)
-            cache.set(cache_key, response, timeout)
+            if response._is_string:
+                cache.set(cache_key, response, timeout)
+            else:
+                from django.http import HttpResponse
+                def worker(rsp):
+                    from cStringIO import StringIO
+                    buf = StringIO()
+                    for chunk in rsp:
+                        buf.write(chunk)
+                        yield chunk
+                    buffered_response = HttpResponse(buf.getvalue())
+                    buf.close()
+                    rsp.close()
+                    cache.set(cache_key, buffered_response, timeout)
+                return HttpResponse(worker(response))
         return response
 
 class FetchFromCacheMiddleware(object):
