Terry J. Reedy added the comment:

Byte 0, not byte 1, is the start byte, and it should be F0, as in the output below. However, I now see 'invalid continuation byte'.

In 2.7.5,

# -*- coding: utf-8 -*-
s = b'𐒢'
# output same if uncomment following lines
#s = u'𐒢'.encode('utf-8')  # '𐒢' pasted in from 1st post
#s = u'\U000104a2'.encode('utf-8')
print(len(s))
for c in s: print(ord(c), hex(ord(c)))
>>>
4
(240, '0xf0')
(144, '0x90')
(146, '0x92')
(162, '0xa2')
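For comparison, here is a minimal sketch of the same byte check written for Python 3. It is an illustration only, not something run as part of the 2.7.5 test above. Iterating over a bytes object yields ints in 3.x, so ord() is not needed. The helper name utf8_bytes_hex is hypothetical, and the character is given as an escape so the result does not depend on the source file's encoding.

```python
# Python 3 sketch: list the UTF-8 bytes of U+104A2.
# utf8_bytes_hex is a hypothetical helper name used only for illustration.

def utf8_bytes_hex(ch):
    """Return the UTF-8 encoding of ch as a list of hex strings."""
    return [hex(b) for b in ch.encode('utf-8')]

s = '\U000104a2'.encode('utf-8')
print(len(s))            # 4
for b in s:              # each b is already an int in Python 3
    print(b, hex(b))     # 240 0xf0, 144 0x90, 146 0x92, 162 0xa2
```

A correct UTF-8 encoding of this character starts with F0 and has no ED byte, which matches the 2.7.5 output.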
I have no idea how the second pasted byte becomes ED in 3.x. Attempting to open the file in 3.x results in a broken* 'Untitled' edit window and the following error message in the console.

_tkinter.TclError: character U+104a2 is above the range (U+0000-U+FFFF) allowed by Tcl

* Attempting to close the window either immediately or after entering text results in

AttributeError: 'PyShellEditorWindow' object has no attribute 'extensions'

I have to close the initial python process to get rid of it.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13153>
_______________________________________