On Thu, 11 Jan 2018 at 11:43 Random832 <random...@fastmail.com> wrote:
> Maybe we need a new error handler that maps unassigned bytes in the range
> 0x80-0x9f to a single character in the range U+0080-U+009F. Do any of the
> encodings being discussed have behavior other than the "normal" version of
> the encoding plus what I just described?

(accidentally replied individually instead of replying all)

There is one more difference I have found between Python's encodings and WHATWG's. In Python's codepage 1255, b'\xca' is undefined. In WHATWG's, it maps to U+05BA HEBREW POINT HOLAM HASER FOR VAV. I haven't tracked down what the Unicode Consortium has to say about this.

Other than that, all the differences are adding the fall-throughs in the range U+0080 to U+009F. For example, elsewhere in windows-1255, the byte b'\xff' is undefined, and it remains undefined in WHATWG's mapping.
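A rough sketch of the proposed error handler, using the standard codecs.register_error hook. The handler names 'c1-fallthrough' and 'whatwg-1255' are hypothetical and chosen only for illustration. The first handler maps any undefined byte in 0x80-0x9F to the C1 control character with the same code point and re-raises on anything else, so b'\xff' in cp1255 still fails. The second layers the single windows-1255 difference noted above (0xCA to U+05BA) on top. If a given Python version already defines 0xCA in cp1255, that branch simply never fires and the result is the same.

```python
import codecs


def c1_fallthrough(exc):
    """Hypothetical handler: map undefined bytes 0x80-0x9F to U+0080-U+009F.

    Any other undefined byte is re-raised, so the "normal" codec behavior
    is otherwise unchanged.
    """
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    if bad and all(0x80 <= b <= 0x9F for b in bad):
        # Each byte becomes the C1 control with the same numeric value.
        return ''.join(map(chr, bad)), exc.end
    raise exc


def whatwg_1255(exc):
    """Hypothetical handler: C1 fall-through plus WHATWG's 0xCA -> U+05BA."""
    if isinstance(exc, UnicodeDecodeError) and exc.object[exc.start:exc.end] == b'\xca':
        return '\u05ba', exc.end  # HEBREW POINT HOLAM HASER FOR VAV
    return c1_fallthrough(exc)


codecs.register_error('c1-fallthrough', c1_fallthrough)
codecs.register_error('whatwg-1255', whatwg_1255)
```

Usage: b'\x81'.decode('cp1252', 'c1-fallthrough') gives '\x81' instead of raising, while b'\xff'.decode('cp1255', 'c1-fallthrough') still raises UnicodeDecodeError.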
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/