Serhiy Storchaka added the comment:
There is no Snake emoji in my font, so I use the Cat Face emoji U+1F431 🐱
(\xf0\x9f\x90\xb1 in UTF-8, \x3d\xd8\x31\xdc in UTF-16LE).
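The two byte sequences above can be checked from Python itself. This is only a small sketch; the variable names are illustrative and not part of the original report.

```python
# Cat Face emoji, an astral (non-BMP) character.
cat = "\U0001F431"

utf8 = cat.encode("utf-8")
utf16le = cat.encode("utf-16-le")

# Show the bytes as hex so printable ASCII bytes are not shown as characters.
print(utf8.hex())     # f09f90b1
print(utf16le.hex())  # 3dd831dc

# In UTF-16 the character is stored as a surrogate pair.
offset = ord(cat) - 0x10000
high = 0xD800 + (offset >> 10)
low = 0xDC00 + (offset & 0x3FF)
print(hex(high), hex(low))  # 0xd83d 0xdc31
```

One code point therefore occupies 4 bytes in UTF-8 and 2 code units in UTF-16.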
Move the cursor or press Backspace. I had to press Left 2 times to move the
cursor to the beginning of the line, press Right 4 times to move the cursor
back to the end of the line, and press Backspace 4 times to remove everything.
This is what "Tk doesn't support astral characters" means.
Get the text programmatically.
>>> text.get('1.0', '1.end')
'ðﾟﾐﾱ'
>>> print(ascii(text.get('1.0', '1.end')))
'\xf0\uff9f\uff90\uffb1'
On Linux the clipboard uses UTF-8, and this symbol is represented by the 4-byte
bytestring b'\xf0\x9f\x90\xb1' (that is why Tk sometimes interprets it as 4
characters). When you request the text content as Unicode, Tcl fails to
decode the string from UTF-8 and falls back to Latin1. Due to another bug, it
sign-extends some of the bytes. When you programmatically insert the same string
back, it is encoded as b'\xc3\xb0\xef\xbe\x9f\xef\xbe\x90\xef\xbe\xb1' and
displayed as 'ðﾟﾐﾱ'.
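The garbled result can be reproduced in pure Python. This is a rough sketch that mimics the observed output, not Tcl's actual code: it assumes the first byte is taken as a plain Latin1 character and the remaining three bytes are sign-extended to 16 bits, which is simply the pattern that matches the values shown above. The helper name sign_extend_16 is hypothetical.

```python
raw = b"\xf0\x9f\x90\xb1"  # UTF-8 bytes of U+1F431

def sign_extend_16(byte):
    """Treat the byte as a signed 8-bit value and widen it to 16 bits."""
    signed = byte - 0x100 if byte >= 0x80 else byte
    return signed & 0xFFFF

# Assumption: the first byte survives as Latin1, the others are sign-extended.
garbled = chr(raw[0]) + "".join(chr(sign_extend_16(b)) for b in raw[1:])

print(ascii(garbled))           # '\xf0\uff9f\uff90\uffb1'
print(garbled.encode("utf-8"))  # b'\xc3\xb0\xef\xbe\x9f\xef\xbe\x90\xef\xbe\xb1'
```

Each sign-extended byte 0xNN becomes the code point U+FFNN, which lands in the Halfwidth and Fullwidth Forms block. That is why the text comes back as halfwidth Katakana and Hangul characters.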
On Windows the clipboard uses UTF-16LE and you can see different results.
The underlying graphical system can support astral characters, but Tk fails to
handle them correctly.
----------
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue13153>
_______________________________________