Serhiy Storchaka added the comment:

Yes, it is still the same issue. The root of issue is in converting strings 
when passed to Python-implemented callbacks. When a text is pasted in IDLE 
window, the callback is called (for highlighting). The callback is a command 
created by Tcl_CreateCommand from PythonCmd. PythonCmd is a wrapper which 
converts arguments (char*) to Python strings and then pass them to Python 
command. Arguments are encoded in "modified UTF-8" [1], i.e. the NUL character 
is represented as \xc0\x80, they can contains other invalid UTF-8 sequences 
(such as encoded surrogates). When decoding arguments to Python strings are 
failed, main Tcl loop is broken and IDLE silently closed.

When astral character is pasted on Windows, it first encoded to UTF-16 by 
Windows, then Tcl encodes every 16-bit surrogate to modified UTF-8. The result 
is not valid UTF-8. On X Window systems the X selection value usually is UTF-8 
encoded (the type is UTF8_STRING), but can contains invalid UTF-8 sequences.

I will open separate issue to fix other bugs related to Tcl <-> Python string 
conversions. The last patch fixes only initial issue which is most important.

[1] http://en.wikipedia.org/wiki/UTF-8#Modified_UTF-8

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13153>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to