Malcolm Tredinnick wrote:
> On Mon, 2008-01-07 at 18:28 -0600, Gary Wilson Jr. wrote:
>> Malcolm Tredinnick wrote:
>>> On Sun, 2008-01-06 at 15:25 -0600, Gary Wilson Jr. wrote:
>>>> It appears that at this point, response.content is a utf8-encoded 
>>>> bytestring.
>>>> I'm playing with a response middleware doing something like:
>>>>
>>>> MY_RE.sub(u'%s</body>' % text, response.content)
>>>>
>>>> which raises a UnicodeDecodeError if response.content contains non-ascii.
>>>>
>>>> I understand that the strings need to be of the same type, but was
>>>> wondering if response.content needs to be returned as a utf8-encoded
>>>> bytestring or if it's ok to convert it to unicode and return that.  Does
>>>> it matter?
>>> It must be UTF-8 (or, at least, a bytestring). Some encoding has to be in
>>> force, since "unicode" isn't a character encoding and response.content
>>> is the last station before we send stuff back to the web server.
>> So to make sure I've got this right, would either of the two examples below
>> be sufficient?
>>
>> content = MY_RE.sub(u'%s</body>' % text, force_unicode(response.content))
>> content = content.encode('utf-8')
> 
> Not quite sure what you're doing with "content" here, since a response
> middleware modifies the response directly. Since you can happily set
> "content" with a unicode object, you should just be able to do
> 
>         response.content = ....

Sorry, I was using "content" just for the sake of brevity.

>> content = MY_RE.sub((u'%s</body>' % text).encode('utf-8'), response.content)
> 
> In both cases, for absolute bullet-proofness, you could use
> response._charset as the encoding (rather than assuming it's the default
> of UTF-8). Obviously depends on circumstances, but if this is for
> something in Django's core, for example, it needs to be flexible. Every
> now and again, somebody is going to change the DEFAULT_CHARSET value.

Ah, thank you.  I was wrongly assuming this would always be utf-8 here.
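
To make the charset point concrete, here is a minimal sketch of the decode, substitute, re-encode round trip. It is written for Python 3, where bytes and str play the roles of the bytestring and unicode types discussed in this thread. MY_RE's pattern and the helper name append_before_body are assumptions for illustration only; in a real middleware the charset would come from the response (response._charset, as suggested above) rather than being passed as a plain argument.

```python
import re

# Hypothetical pattern: the real MY_RE from the thread is not shown.
MY_RE = re.compile(r'</body>', re.IGNORECASE)


def append_before_body(content, text, charset='utf-8'):
    """Insert `text` before </body> in an encoded bytestring.

    `content` is bytes in `charset`; the result is bytes in the same charset.
    """
    # Decode first so the pattern, the replacement and the content are all text.
    decoded = content.decode(charset)
    # A function replacement avoids backslashes in `text` being read as escapes.
    replaced = MY_RE.sub(lambda match: text + match.group(0), decoded)
    # Re-encode with the same charset instead of assuming UTF-8.
    return replaced.encode(charset)


# Usage with a non-UTF-8 charset, the case a hard-coded 'utf-8' would get wrong.
page = '<html><body>caf\u00e9</body></html>'.encode('latin-1')
result = append_before_body(page, '<p>na\u00efve</p>', charset='latin-1')
```

Passing the charset explicitly keeps the sketch runnable without Django; it is not meant as the way Django itself handles this.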

> (There is, by the way, a subtle semi-bug hidden in there: if you pass in
> a mime type, including an encoding, we still (re-)encode the data, which
> is a little naughty. It's difficult to work out all the cases when we
> should and shouldn't, though. Again, lots of "we could do..."
> possibilities, but each one has trade-offs. That's a way-out-there
> edge-case, though.)
> 
>>> I realise this is slightly inconvenient for middleware classes, but
>>> since we cannot tell ahead of time if any middleware classes are going
>>> to be invoked, we have to treat response.content specially.
>> Could the handler not do the final encoding as the last thing it does on the
>> response's way out (so after any middleware has been processed)?
> 
> Naturally, anything is possible, but I don't like the design.
> HttpResponse returns a valid HTTP response via its __str__ method and
> valid HTTP data via the "content" attribute. That's a nicely
> encapsulated design. Let's resist messing with it and keep the
> responsibility in the right place.
> 
> If you really want to avoid the whole extra dozen characters of typing
> now and again, let's add a unicode_content property to HttpResponse.

That was my second thought :)

> Regards,
> Malcolm

Thanks for clearing things up, Malcolm.

So, looking at a couple of places in Django trunk where response.content is used,
these look like bugs:


django.contrib.csrf.middleware.CsrfMiddleware.process_response:

def process_response(self, request, response):
    ...
    response.content = _POST_FORM_RE.sub(add_csrf_field, response.content)
    ...


django.test.testcases.TestCase.assertContains:

def assertContains(self, response, text, count=None, status_code=200):
    ...
    real_count = response.content.count(text)
    ...


Would you agree?

Looks like someone has hit the second one before:
http://groups.google.com/group/django-users/browse_thread/thread/b3d7a40423e011c3/311206ff6601a376
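
For the assertContains case, one possible fix is to encode the text with the response's charset before counting, so that both sides are bytestrings. This is only a sketch, written for Python 3 (bytes and str standing in for bytestring and unicode); count_occurrences is a hypothetical helper, not Django's code, and the charset is passed in rather than read from the response.

```python
def count_occurrences(content, text, charset='utf-8'):
    """Count how often `text` appears in the encoded bytestring `content`."""
    # Encode text to the content's charset so both operands are bytes.
    if isinstance(text, str):
        text = text.encode(charset)
    return content.count(text)


# Non-ASCII text that would break a naive mixed-type comparison.
body = 'caf\u00e9 au lait, caf\u00e9 noir'.encode('utf-8')
found = count_occurrences(body, 'caf\u00e9')
```

The same idea (normalise both operands to one type before combining them) would apply to the regex substitution in the CSRF middleware.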

Gary
