Malcolm Tredinnick wrote:
> On Mon, 2008-01-07 at 18:28 -0600, Gary Wilson Jr. wrote:
>> Malcolm Tredinnick wrote:
>>> On Sun, 2008-01-06 at 15:25 -0600, Gary Wilson Jr. wrote:
>>>> It appears that at this point, response.content is a utf8-encoded
>>>> bytestring.
>>>> I'm playing with a response middleware doing something like:
>>>>
>>>> MY_RE.sub(u'%s</body>' % text, response.content)
>>>>
>>>> which raises a UnicodeDecodeError if response.content contains non-ascii.
>>>>
>>>> I understand that the strings need to be of the same type, but was
>>>> wondering if response.content needs to be returned as a utf8-encoded
>>>> bytestring or if it's ok to convert it to unicode and return that.
>>>> Does it matter?
>>> It must be UTF-8 (or, at least, a bytestring). Some encoding has to be in
>>> force, since "unicode" isn't a character encoding and response.content
>>> is the last station before we send stuff back to the web server.
>> So to make sure I've got this right, would either of the two examples below
>> be sufficient?
>>
>> content = MY_RE.sub(u'%s</body>' % text, force_unicode(response.content))
>> content = content.encode('utf-8')
>
> Not quite sure what you're doing with "content" here, since a response
> middleware modifies the response directly. Since you can happily set
> "content" with a unicode object, you should just be able to do
>
> response.content = ....
Sorry, I was using "content" just for the sake of brevity.

>> content = MY_RE.sub((u'%s</body>' % text).encode('utf-8'), response.content)
>
> In both cases, for absolute bullet-proofness, you could use
> response._charset as the encoding (rather than assuming it's the default
> of UTF-8). Obviously depends on circumstances, but if this is for
> something in Django's core, for example, it needs to be flexible. Every
> now and again, somebody is going to change the DEFAULT_CHARSET value.

Ah, thank you. I was wrongly assuming this would always be utf-8 here.

> (There is, by the way, a subtle semi-bug hidden in there: if you pass in
> a mime type, including an encoding, we still (re-)encode the data, which
> is a little naughty. It's difficult to work out all the cases when we
> should and shouldn't, though. Again, lots of "we could do..."
> possibilities, but each one has trade-offs. That's a way-out-there
> edge-case, though.)
>
>>> I realise this is slightly inconvenient for middleware classes, but
>>> since we cannot tell ahead of time if any middleware classes are going
>>> to be invoked, we have to treat response.content specially.
>> Could the handler not do the final encoding as the last thing it does on the
>> response's way out (so after any middleware has been processed)?
>
> Naturally, anything is possible, but I don't like the design.
> HttpResponse returns a valid HTTP response via its __str__ method and
> valid HTTP data via the "content" attribute. That's a nicely
> encapsulated design. Let's resist messing with it and keep the
> responsibility in the right place.
>
> If you really want to avoid the whole extra dozen characters of typing
> now and again, let's add a unicode_content property to HttpResponse.

That was my second thought :)

> Regards,
> Malcolm

Thanks for clearing things up Malcolm.
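For reference, the decode / substitute / encode round trip discussed above can be sketched without Django. This is only a minimal sketch, assuming Python 3 (bytes/str) rather than the Python 2 bytestring/unicode of this thread. StubResponse, BODY_RE and append_before_body are hypothetical names, and the charset attribute stands in for the response._charset Malcolm mentions.

```python
import re

# Hypothetical pattern; the MY_RE used in the thread is not shown.
BODY_RE = re.compile(r"</body>", re.IGNORECASE)


class StubResponse:
    """Stand-in for HttpResponse: content is bytes, charset names its encoding."""

    def __init__(self, content, charset="utf-8"):
        self.charset = charset
        # Accept text for convenience, but always store a bytestring.
        self.content = content.encode(charset) if isinstance(content, str) else content


def append_before_body(response, text):
    """Insert text before the first </body>, keeping response.content as bytes."""
    # Decode with the response's own charset instead of assuming UTF-8.
    decoded = response.content.decode(response.charset)
    # A callable replacement keeps backslashes in text from being read as escapes.
    updated = BODY_RE.sub(lambda m: text + m.group(0), decoded, count=1)
    # Re-encode so the content is a bytestring again on the way out.
    response.content = updated.encode(response.charset)
    return response


resp = append_before_body(StubResponse("<body>caf\u00e9</body>", "latin-1"), "\u00fc")
```

Both operands of the substitution are text here, so non-ascii content no longer trips an implicit conversion, and the charset is taken from the response rather than hard-coded.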

So, looking at a couple places in Django trunk where response.content is used,
these look like bugs:

django.contrib.csrf.middleware.CsrfMiddleware.process_response:

    def process_response(self, request, response):
        ...
        response.content = _POST_FORM_RE.sub(add_csrf_field, response.content)
        ...

django.test.testcases.TestCase.assertContains:

    def assertContains(self, response, text, count=None, status_code=200):
        ...
        real_count = response.content.count(text)
        ...

Would you agree? Looks like someone has hit the second one before:
http://groups.google.com/group/django-users/browse_thread/thread/b3d7a40423e011c3/311206ff6601a376

Gary
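
The second case comes down to counting text inside a bytestring. A minimal sketch of the idea, assuming Python 3 and a hypothetical helper count_in_content (this is not the actual Django code or a proposed patch): decode the content with the response's charset first, so that both sides of the comparison are text.

```python
def count_in_content(content, text, charset="utf-8"):
    """Count occurrences of text in a bytes body, comparing text with text."""
    # Accept either bytes or str for the needle, normalising to str.
    if isinstance(text, bytes):
        text = text.decode(charset)
    # Decode the body with the given charset before counting.
    return content.decode(charset).count(text)


body = "na\u00efve na\u00efve".encode("utf-8")
occurrences = count_in_content(body, "na\u00efve")
```

The charset argument defaults to UTF-8 here only for brevity; per the discussion above, the response's own charset would be the safer value to pass.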