[issue3995] iso-xxx/cp1252 inconsistencies in Python 2.* not in 3.*

STINNER Victor Mon, 29 Sep 2008 03:14:44 -0700

STINNER Victor <[EMAIL PROTECTED]> added the comment:

If you type "€" in the Python 2 interactive interpreter, you get a 
*byte* string encoded in your terminal's charset. Example on Linux 
(UTF-8):


Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> '€'
'\xe2\x82\xac'

Use the "u" prefix to get a unicode string:

Python 2.5.1 (r251:54863, Jul 31 2008, 23:17:40)
>>> u'€'
u'\u20ac'
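
The relation between the two results above can be checked directly. 
A minimal sketch, written for Python 3 (where the "u" prefix is 
optional and byte strings need a "b" prefix); the variable names are 
only illustrative:

```python
# The three bytes a UTF-8 terminal produces for the euro sign
# (taken from the first interpreter session above).
raw = b'\xe2\x82\xac'

# Decoding with the terminal charset gives the one-character
# unicode string from the second session.
text = raw.decode('utf-8')

print(repr(text), len(raw), len(text))
```

The byte string has length 3 and the unicode string has length 1: 
the former counts encoded bytes, the latter counts characters.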

If you use unicode, encoding to ISO-8859-1/-15 behaves correctly: 
ISO-8859-1 has no euro sign, so encoding raises an error, while 
ISO-8859-15 maps it to 0xA4. (Truncated) example with Python trunk:

Python 2.6rc2+ (trunk:66680M, Sep 29 2008, 12:03:32)
>>> u'€'.encode('ISO-8859-1')
...
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u20ac'
>>> u'€'.encode('ISO-8859-15')
'\xa4'
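
The same comparison can be extended to cp1252, the other charset 
named in the issue title. This is only a sketch, written for 
Python 3; try_encode is a hypothetical helper, not part of any 
library:

```python
def try_encode(text, codec):
    """Return the encoded bytes, or None if the codec lacks a character."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None

euro = '\u20ac'  # EURO SIGN

# One entry per codec: the encoded bytes, or None on failure.
results = {
    codec: try_encode(euro, codec)
    for codec in ('iso-8859-1', 'iso-8859-15', 'cp1252', 'utf-8')
}

for codec, encoded in results.items():
    print(codec, encoded)
```

Each charset that contains the euro sign puts it at a different byte 
value, so bytes must always be decoded with the charset that produced 
them.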

In a script (Python code written in a file), use a "# coding" header 
to declare the file's charset. Or use the "\xXX", "\uXXXX" and 
"\UXXXXXXXX" notations for non-ASCII characters.
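
A short sketch of both options in one file, written for Python 3 
(in Python 2 the unicode literals would need the "u" prefix). The 
coding line must be the first or second line of the file; the 
variable names are only illustrative:

```python
# -*- coding: utf-8 -*-
# The header above declares the charset of this source file.

# Escape notations keep the file pure ASCII:
e_acute = '\xe9'           # \xXX: two hex digits (U+00E9)
euro_short = '\u20ac'      # \uXXXX: four hex digits (U+20AC)
euro_long = '\U000020ac'   # \UXXXXXXXX: eight hex digits (U+20AC)

print(euro_short == euro_long, len(e_acute))
```

Both euro literals denote the same single character; the eight-digit 
form is only required for code points above U+FFFF.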

Is there a Python Unicode FAQ somewhere? :-)

----------
nosy: +haypo

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3995>
_______________________________________