Martin v. Löwis added the comment:

The problem is that IDLE passes a UTF-8 encoded source string to compile, and 
compile, in the absence of a source encoding, uses the PEP 263 default source 
encoding, i.e. Latin-1.

As a consequence, the variable s has the value

u'\xd0\xa0\xd1\x83\xd1\x81\xd1\x81\xd0\xba\xd0\xb8\xd0\xb9 \xd1\x82\xd0\xb5\xd0\xba\xd1\x81\xd1\x82'
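
The value above is what you get when the UTF-8 bytes of the text are decoded as Latin-1. A minimal sketch of that, written for Python 3 so it can be run today; the original literal is assumed to be "Русский текст", which is what the byte sequence decodes to as UTF-8:

```python
# Assumed original literal, spelled with escapes to keep this file ASCII-only.
text = "\u0420\u0443\u0441\u0441\u043a\u0438\u0439 \u0442\u0435\u043a\u0441\u0442"

# What IDLE hands to compile: the UTF-8 encoded bytes of the source.
utf8_bytes = text.encode("utf-8")

# What compile effectively does without a source encoding: decode as Latin-1,
# turning every byte into one code point.
mis_decoded = utf8_bytes.decode("latin-1")
print(ascii(mis_decoded))

# The mistake is reversible because Latin-1 maps bytes to code points 1:1.
recovered = mis_decoded.encode("latin-1").decode("utf-8")
```

The round trip in the last line only illustrates the cause; it is not a proposed fix.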

IDLE's "Default Source Encoding" is irrelevant - it only applies to editor 
windows.

One solution for that is the attached patch. However, this patch isn't right, 
since it will cause all source to be interpreted as UTF-8. This would be wrong 
when sys.stdin.encoding is not UTF-8, and byte string objects are created 
in interactive mode.
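
To illustrate why forcing UTF-8 is wrong in that case, here is a rough Python 3 sketch. The cp1251 terminal encoding is only an assumed example of a non-UTF-8 sys.stdin.encoding; it is not taken from the report:

```python
# Hypothetical scenario: the terminal encoding is cp1251, not UTF-8.
typed = "\u0420\u0443\u0441\u0441\u043a\u0438\u0439"  # the word "Русский"
terminal_bytes = typed.encode("cp1251")  # bytes such a terminal would deliver

# Interpreting those bytes as UTF-8, as the patch would, does not work.
try:
    terminal_bytes.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the encoding the terminal actually used recovers the text.
recovered = terminal_bytes.decode("cp1251")
```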

Interactive mode manages to get it right by looking up sys.stdin.encoding 
during compilation, but it does so only when in interactive mode (i.e. when 
tok->prompt != NULL).

I don't see any way to fix this problem in Python 2. It is fixed in Python 3, 
basically by always assuming that the source encoding is UTF-8, by making all 
string objects Unicode objects, and disallowing non-ASCII characters in bytes 
literals.
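
A small sketch of the Python 3 behaviour described here, using only the compile and exec builtins. The file name "<demo>" and the variable names are placeholders chosen for this example:

```python
# Source text containing a non-ASCII str literal, with no coding declaration.
expected = "\u0420\u0443\u0441\u0441\u043a\u0438\u0439 \u0442\u0435\u043a\u0441\u0442"
source = 's = "' + expected + '"\n'

# Bytes passed to compile are decoded as UTF-8 by default in Python 3.
namespace = {}
exec(compile(source.encode("utf-8"), "<demo>", "exec"), namespace)

# A non-ASCII character inside a bytes literal is rejected at compile time.
try:
    compile('b = b"\u0420"\n', "<demo>", "exec")
    rejected = False
except SyntaxError:
    rejected = True
```

With str always being Unicode and bytes literals restricted to ASCII, the decoded value no longer depends on a guess about the terminal encoding.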

----------
keywords: +patch
nosy: +loewis
Added file: http://bugs.python.org/file27045/compile_unicode.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue15809>
_______________________________________