Ezio Melotti <ezio.melo...@gmail.com> added the comment:

> Ezio, do you know anything about these speculations?

Assuming that the non-BMP character is represented with two surrogates 
(\ud801\udca2) and that _tkinter tries to decode them independently, the error 
message ("invalid continuation byte") would be correct.
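Under that assumption, the failure can be reproduced without Tk. The sketch
below (plain Python 3; all names are illustrative) splits U+104A2 into its
surrogate pair and encodes each half on its own. It uses a hypothetical helper,
encode_bmp_naively, a 3-byte UTF-8 encoder that does not reject surrogates,
much like a permissive codec. It then compares the result with the real 4-byte
encoding. That _tkinter actually does this is the assumption above; the code
does not confirm it.

```python
def encode_bmp_naively(cp):
    """Hypothetical helper: encode U+0800..U+FFFF as 3 UTF-8 bytes
    WITHOUT rejecting surrogates (like a permissive codec would)."""
    if not 0x800 <= cp <= 0xFFFF:
        raise ValueError("only 3-byte code points are handled here")
    return bytes([0xE0 | (cp >> 12),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

char = '\U000104a2'
# Split the non-BMP code point into a UTF-16 surrogate pair.
high, low = divmod(ord(char) - 0x10000, 0x400)
pair = (0xD800 + high, 0xDC00 + low)          # (0xd801, 0xdca2)

# Encode each surrogate independently: 6 bytes instead of 4.
naive = b''.join(encode_bmp_naively(cp) for cp in pair)
print(naive)                  # b'\xed\xa0\x81\xed\xb2\xa2'
print(char.encode('utf-8'))   # b'\xf0\x90\x92\xa2'

# A strict UTF-8 decoder rejects the naive bytes at the very first sequence.
try:
    naive.decode('utf-8')
except UnicodeDecodeError as e:
    print(e.reason, e.start)  # invalid continuation byte 0
```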

Python 2's UTF-8 codec is more permissive and allows encoding and decoding of
surrogates (this might also explain why it works on Python 2):
>>> u'\ud801'.encode('utf-8')
'\xed\xa0\x81'
>>> '\xed\xa0\x81'.decode('utf-8')
u'\ud801'

But on Python 3, trying to decode that results in an error:
>>> b'\xed\xa0\x81'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid continuation byte
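On Python 3 the permissive behaviour is still available, but only on request,
through the built-in 'surrogatepass' error handler. A minimal sketch of the
Python 3 counterpart of the Python 2 session above:

```python
s = '\ud801'                      # a lone high surrogate

# Default (strict) handling refuses to encode surrogates.
try:
    s.encode('utf-8')
except UnicodeEncodeError as e:
    print(e.reason)               # surrogates not allowed

# 'surrogatepass' reproduces the Python 2 bytes and round-trips them.
data = s.encode('utf-8', 'surrogatepass')
print(data)                       # b'\xed\xa0\x81'
print(data.decode('utf-8', 'surrogatepass') == s)  # True
```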

> But then the problem should be the initial byte, not the continuation
> bytes, which are the same for all chars and which all have 10 for
> their two high order bits.

While it's true that all continuation bytes have '10' as their two high-order
bits, the converse is not always true: a byte starting with '10' is not valid
after every start byte.  Some start bytes restrict the range of the first
continuation byte that follows them.  For example, even though the first two
bits of 0xA0 (0b10100000) are '10', the first continuation byte of a sequence
starting with 0xED must be in the range 80..9F.  This excludes exactly the
sequences ED A0..BF xx, which would encode the surrogates U+D800..U+DFFF.
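The restriction can be written down as a small lookup table. This is a minimal
sketch, assuming the ranges from the Unicode Standard's table of well-formed
UTF-8 byte sequences (Table 3-7, also in RFC 3629). SPECIAL_RANGES and
first_continuation_range are hypothetical names, not part of any codec API.

```python
# Allowed range of the FIRST continuation byte, keyed by start byte.
# Any other start byte in C2..F4, and every later continuation byte,
# accepts the full 80..BF range.
SPECIAL_RANGES = {
    0xE0: (0xA0, 0xBF),  # rejects overlong 3-byte forms
    0xED: (0x80, 0x9F),  # rejects surrogates U+D800..U+DFFF
    0xF0: (0x90, 0xBF),  # rejects overlong 4-byte forms
    0xF4: (0x80, 0x8F),  # rejects code points above U+10FFFF
}

def first_continuation_range(lead):
    """Return (low, high) for the byte allowed right after `lead`."""
    if not 0xC2 <= lead <= 0xF4:
        raise ValueError(f"{lead:#04x} does not start a multi-byte sequence")
    return SPECIAL_RANGES.get(lead, (0x80, 0xBF))

low, high = first_continuation_range(0xED)
print(f"{low:#x}..{high:#x}")   # 0x80..0x9f
print(low <= 0xA0 <= high)      # False, although 0xA0 starts with '10'

# The last code point before the surrogates still decodes fine.
print(b'\xed\x9f\xbf'.decode('utf-8') == '\ud7ff')  # True
```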

The reason
>>> '\U000104a2'
'𐒢'
works is that the input is all ASCII, so decoding it doesn't fail.
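To make the distinction concrete, here is a sketch in a current Python 3
(3.7+ for str.isascii; variable names are illustrative). The text typed at the
prompt contains only ASCII characters. The non-BMP character only comes into
existence when the escape is evaluated.

```python
import ast

typed = r"'\U000104a2'"             # exactly what is typed at the prompt
print(typed.isascii())              # True: nothing non-ASCII to decode

value = ast.literal_eval(typed)     # the escape is expanded only here
print(len(value), hex(ord(value)))  # 1 0x104a2 (on Python 3.3+)
print(ord(value) > 0xFFFF)          # True: the result is outside the BMP
```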


> [...]
> This should catch any miscellaneous crashes which are not otherwise
> caught and maybe turn the crash issues into bug reports -- the same
> way that running from the command line did.

Having a "safety net" to catch all unhandled exceptions seems like a good
idea.  It won't help with segfaults, but it's still better than nothing.
I'm not sure what you mean by "turn them into bug reports", though.
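Here is a minimal sketch of such a safety net for a plain Python 3 process. It
assumes that logging the traceback and inviting a report is what is wanted.
install_safety_net is a hypothetical helper, not IDLE code. It replaces
sys.excepthook, which the interpreter calls for exceptions that reach the top
level. In a Tkinter application, exceptions raised inside event callbacks go
to Tk.report_callback_exception instead, so a real safety net would have to
cover that path too. Neither hook runs on a segfault.

```python
import io
import sys
import traceback

def install_safety_net(log):
    """Hypothetical helper: send otherwise-unhandled exceptions to `log`
    (any writable text stream) and return the previous hook."""
    previous = sys.excepthook

    def hook(exc_type, exc_value, exc_tb):
        log.write("Unhandled exception; please consider reporting it:\n")
        log.write("".join(traceback.format_exception(exc_type, exc_value, exc_tb)))
        log.flush()

    sys.excepthook = hook
    return previous

# Usage: simulate the interpreter handing an uncaught error to the hook.
log = io.StringIO()
previous = install_safety_net(log)
try:
    raise UnicodeDecodeError('utf-8', b'\xed\xa0\x81', 0, 1,
                             'invalid continuation byte')
except UnicodeDecodeError:
    sys.excepthook(*sys.exc_info())
sys.excepthook = previous            # restore the original hook
print('invalid continuation byte' in log.getvalue())  # True
```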

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13153>
_______________________________________
_______________________________________________
Python-bugs-list mailing list