On Sat, Sep 19, 2009 at 11:19 AM, Joshua Russo <josh.r.ru...@gmail.com> wrote:

>> ... in fact using utf-8 string literals can cause problems in other places
>> with code that assumes another encoding (e.g. ascii) for byte strings.
>>
>
> Could you expand on this? I know that the Unicode string object has
> different methods than standard String, but are there other scenarios where
> a unicode literal could cause problems?
>
>
I did not say that using a unicode literal can cause problems; I said that
using a (byte) string literal containing utf-8 (really, any non-ASCII) data
can lead to problems.  The Python interpreter itself will always (assuming no
special site installation customization) attempt to use the ascii codec to
convert from a byte string to unicode, so non-ASCII byte strings will often
result in unicode decode errors whenever Python is relied on to do automatic
coercion to unicode.  For example, this program:

#! /usr/bin/python
#-*- encoding: utf-8 -*-

u = u'¿Chapter?'   # unicode literal
s = '¿Chapter?'    # byte string literal holding utf-8 encoded bytes

msg1 = u"I've been handed: %s" % u
msg2 = u"I've been handed: %s" % s

will result in:

  File "./t.py", line 8, in <module>
    msg2 = u"I've been handed: %s" % s
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

The unicode literal is handled properly; the bytestring literal causes a
problem because Python tries to decode it with the ascii codec.
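
One way to avoid relying on the automatic coercion is to decode byte strings
explicitly before mixing them with unicode.  The following is only a minimal
sketch, not anything Django-specific: to_unicode is a hypothetical helper
name, and it assumes the bytes really are utf-8.  The bytes are written with
escapes so the example does not depend on the source file's own encoding.

```python
# -*- coding: utf-8 -*-

# utf-8 encoded bytes for the text "¿Chapter?" (0xc2 0xbf is the "¿").
s = b'\xc2\xbfChapter?'


def to_unicode(value, encoding='utf-8'):
    """Hypothetical helper: decode byte strings explicitly, pass text through.

    Assumes byte strings are encoded with `encoding` (utf-8 by default).
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Decoding with the ascii codec fails on the first non-ASCII byte; this is
# the same decode step the automatic coercion attempts.
try:
    s.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# Decoding explicitly with the right codec gives a unicode value that can be
# interpolated into a unicode format string safely.
msg2 = u"I've been handed: %s" % to_unicode(s)
```

The design point is simply to decode once, at the boundary where the bytes
enter the program, and to work with unicode objects everywhere after that.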

Karen
