Hey Gary,

On Tue, 2008-01-08 at 00:35 -0600, Gary Wilson Jr. wrote:
[...]
> So, looking at a couple places in Django trunk where response.content is used,
> these look like bugs:
>
>
> django.contrib.csrf.middleware.CsrfMiddleware.process_response:
>
>     def process_response(self, request, response):
>         ...
>         response.content = _POST_FORM_RE.sub(add_csrf_field, response.content)
>         ...

This isn't a bug, but it's subtle. There is only a problem if you are
trying to substitute a Python unicode object into a bytestring. That's
because Python tries to coerce the two elements into the same type
(unicode in this case) and it uses the "ascii" codec by default. If you
try to substitute a bytestring into a bytestring, no problems.

The example you started the thread with was the former case: you were
using a u'...' string as the first argument and a bytestring
(request.content) as the second argument. The CSRF middleware is using
bytestrings throughout, so it's safe.

> django.test.testcases.TestCase.assertContains:
>
>     def assertContains(self, response, text, count=None, status_code=200):
>         ...
>         real_count = response.content.count(text)
>         ...

Yes, this is a semi-bug. The "correct" way to use it is non-obvious: you
need to make sure 'text' is a bytestring. So it's possible to use it
correctly, but the obvious way is sometimes wrong, which makes it a bad
API. When this popped up previously, it was because "text" was a unicode
object and response.content wasn't, so Python tried to coerce the latter
to a Python unicode object and failed dismally.

This is an argument in favour of adding a unicode_content attribute to
HttpResponse.

If you want to be a really good maintainer here and really give encoding
in responses a workover, these are the things I would think about:

      - if somebody specifies a mimetype with a content encoding, we
        should use that for the encoding (not re-encode to UTF-8).

      - if the mimetype isn't something that can be sensibly re-encoded,
        don't. For example, image/jpeg shouldn't go through the
        re-encoding washing machine.

The problem is that this is all very difficult to get correct without
dozens and dozens of special cases. I suspect the right solution is that
if a mimetype is specified and a bytestring is passed in, we should
*never* re-encode the information.
If a unicode object is passed in, we can encode it according to the
charset specified (or self._charset otherwise).

Think about it a bit and see if that makes sense to you. This is fairly
brain-twisting stuff, but there should be a simple solution where we
don't try to second-guess the user. Basically, I'd like images and other
binary opaque data not to be accidentally munged by middleware (there's
a ticket open about respecting the content-transfer header, for example,
that's related to this, too).

Cheers,
Malcolm

-- 
The early bird may get the worm, but the second mouse gets the cheese.
http://www.pointy-stick.com/blog/
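
P.S. To make the bytestring/unicode rule above a bit more concrete, here
is a minimal sketch of the idea. It is not Django code: encode_content
is a hypothetical helper name, the mimetype check is left out for
brevity, and it is written against Python 3 types (bytes standing in for
a bytestring, str for a unicode object) so that it runs as-is.

```python
def encode_content(content, charset="utf-8"):
    """Hypothetical helper: return the bytes to send for `content`.

    Assumption: bytes are always passed through, whether or not a
    mimetype was given; the thread only discusses the mimetype case.
    """
    if isinstance(content, bytes):
        # Bytestring in: the caller already picked an encoding, or this
        # is opaque binary data (image/jpeg, say). Never re-encode it.
        return content
    # Text in: encode exactly once, using the requested charset.
    return content.encode(charset)


# Binary data that is neither valid ASCII nor valid UTF-8 survives intact.
jpeg_header = b"\xff\xd8\xff\xe0"
print(encode_content(jpeg_header) == jpeg_header)  # True

# Text is encoded with whatever charset the caller asked for.
print(encode_content("caf\u00e9", "latin-1"))      # b'caf\xe9'
```

The point of the sketch is that the function never decodes anything, so
there is no place for an implicit "ascii" coercion to sneak in.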