On Sat, Sep 19, 2009 at 11:19 AM, Joshua Russo <josh.r.ru...@gmail.com> wrote:

>> ... in fact using utf-8 string literals can cause problems in other places
>> with code that assumes another encoding (e.g. ascii) for byte strings.
>
> Could you expand on this? I know that the Unicode string object has
> different methods than standard String, but are there other scenarios where
> a unicode literal could cause problems?

I did not say using a unicode literal can cause problems, I said using a
(byte) string literal with utf-8 (really any non-ASCII) encoding can lead to
problems. The Python interpreter itself will always (assuming no special
site installation customization) attempt to use the ascii codec to convert
from string to unicode, so using non-ASCII byte strings will often result in
unicode decode errors if Python is ever relied on to do automatic coercion
to unicode. For example this program:

    #! /usr/bin/python
    #-*- encoding: utf-8 -*-

    u = u'¿Chapter?'
    s = '¿Chapter?'

    msg1 = u"I've been handed: %s" % u
    msg2 = u"I've been handed: %s" % s

will result in:

      File "./t.py", line 8, in <module>
        msg2 = u"I've been handed: %s" % s
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

The unicode literal is handled properly, the bytestring literal causes a
problem.

Karen
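
A minimal sketch of what the automatic coercion amounts to, assuming the byte string holds UTF-8 data: the failing step is effectively a decode with the ascii codec, and decoding explicitly with the real encoding before formatting avoids it. The names raw, ascii_error, text and msg are illustrative and not part of the program above; the bytes are written with escapes so the snippet does not depend on the source file's encoding declaration.

```python
# The UTF-8 bytes for the text '¿Chapter?' ('¿' encodes as 0xc2 0xbf).
raw = b'\xc2\xbfChapter?'

# Roughly what the implicit coercion does: decode with the ascii codec.
# This fails on the first byte, 0xc2, which is outside the ASCII range.
try:
    raw.decode('ascii')
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

# The explicit alternative: decode with the encoding the bytes really use,
# then format with unicode objects only.
text = raw.decode('utf-8')
msg = u"I've been handed: %s" % text
```

Decoding once, at the point where bytes enter the program, and working with unicode objects from there on means Python is never asked to guess the encoding.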