On Thu, 11 Jan 2018 at 11:43 Random832 <random...@fastmail.com> wrote:

> Maybe we need a new error handler that maps unassigned bytes in the range
> 0x80-0x9f to a single character in the range U+0080-U+009F. Do any of the
> encodings being discussed have behavior other than the "normal" version of
> the encoding plus what I just described?
>

(accidentally replied individually instead of replying to all)

There is one more difference I have found between Python's encodings and
WHATWG's. In Python's codepage 1255, b'\xca' is undefined. In WHATWG's, it
maps to U+05BA HEBREW POINT HOLAM HASER FOR VAV. I haven't tracked down
what the Unicode Consortium has to say about this.
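
A quick way to check this on a given interpreter is sketched below. The
helper name try_decode is made up for illustration, and the result for
b'\xca' may depend on the Python version, so no expected output is shown.

```python
def try_decode(byte, encoding="cp1255"):
    """Return the decoded character, or None if the byte is undefined
    in the given codec (hypothetical helper, for illustration only)."""
    try:
        return byte.decode(encoding)
    except UnicodeDecodeError:
        return None


# The byte under discussion: undefined in Python's table as described
# above, U+05BA in WHATWG's.
print(repr(try_decode(b"\xca")))

# For comparison, a byte that is defined in cp1255 (HEBREW LETTER ALEF).
print(repr(try_decode(b"\xe0")))
```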

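Other than that, all the differences are added fall-throughs: bytes in the
range 0x80 to 0x9F that are otherwise unassigned map to the matching code
points U+0080 to U+009F. Bytes outside that range are not affected. For
example, elsewhere in windows-1255, the byte b'\xff' is undefined, and it
remains undefined in WHATWG's mapping.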
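
For what it's worth, the error handler suggested in the quoted message could
be sketched roughly like this. This is only an illustration under my own
assumptions: the handler name "c1fallthrough" and the function name are made
up, and nothing like this exists in the standard library today.

```python
import codecs


def c1_fallthrough(exc):
    """Map undecodable bytes in 0x80-0x9F to U+0080-U+009F.

    Anything else (other bytes, or encoding errors) is re-raised, so
    bytes such as b'\\xff' stay undefined.
    """
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        if all(0x80 <= b <= 0x9F for b in bad):
            # Same numeric value, as a code point; resume after the bad bytes.
            return "".join(chr(b) for b in bad), exc.end
    raise exc


# "c1fallthrough" is a hypothetical name chosen for this sketch.
codecs.register_error("c1fallthrough", c1_fallthrough)

# 0x81 is unassigned in Python's cp1252, so it falls through to U+0081,
# while 0x80 keeps its normal mapping (the euro sign).
text = b"\x80\x81".decode("cp1252", errors="c1fallthrough")
print([hex(ord(ch)) for ch in text])
```

Because this is an error handler and not a new codec, it only ever sees
bytes the underlying table leaves undefined, so the "normal" mappings are
untouched.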
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
