Re: [Python-ideas] Support WHATWG versions of legacy encodings

Random832 Thu, 11 Jan 2018 12:17:06 -0800

On Thu, Jan 11, 2018, at 14:55, Rob Speer wrote:
> There is one more difference I have found between Python's encodings and
> WHATWG's. In Python's codepage 1255, b'\xca' is undefined. In WHATWG's, it
> maps to U+05BA HEBREW POINT HOLAM HASER FOR VAV. I haven't tracked down
> what the Unicode Consortium has to say about this.


It appears in the best fit mapping (with a comment suggesting it is unclear
which vowel point it is actually meant to be) but not in the normal mapping.

> Other than that, all the differences are adding the fall-throughs in the
> range U+0080 to U+009F. For example, elsewhere in windows-1255, the byte
> b'\xff' is undefined, and it remains undefined in WHATWG's mapping.

This is, for the record, also consistent with the results of my test program: 
0xCA is treated as a perfectly ordinary mapping that goes to U+05BA, whereas 
0xFF returns an error. In permissive mode, 0xFF maps to U+F896.
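
For anyone who wants to check the Python side of this, here is a minimal
sketch. It only probes Python's own cp1255 codec, not the Windows API results
described above. The names probe, c1_fallthrough and the "c1-fallthrough"
handler name are made up for illustration; the handler is just one possible
way to approximate the WHATWG fall-through for undefined bytes in 0x80-0x9F.

```python
import codecs


def probe(byte_value, encoding="cp1255"):
    """Return the character one byte decodes to, or None if it is undefined."""
    try:
        return bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        return None


def c1_fallthrough(error):
    """Hypothetical error handler: pass undefined bytes 0x80-0x9F through
    as the C1 control characters U+0080-U+009F, and re-raise otherwise."""
    if isinstance(error, UnicodeDecodeError):
        bad = error.object[error.start:error.end]
        if all(0x80 <= b <= 0x9F for b in bad):
            return "".join(chr(b) for b in bad), error.end
    raise error


# The handler name is an assumption; any unused name would work.
codecs.register_error("c1-fallthrough", c1_fallthrough)

if __name__ == "__main__":
    # Report how the local Python build treats the bytes discussed here.
    for b in (0xCA, 0xFF, 0x81):
        ch = probe(b)
        print(hex(b), "undefined" if ch is None else f"U+{ord(ch):04X}")
```

With the handler registered, b"\x81".decode("cp1255", errors="c1-fallthrough")
falls through to U+0081, while a byte such as 0xFF, which is undefined outside
that range, still raises an error.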

The 0xCA to U+05BA mapping appears (with no glyph, though) in the code chart 
Microsoft published with https://www.microsoft.com/typography/unicode/cscp.htm, 
but not in the corresponding mapping list. It also does not appear in 
https://msdn.microsoft.com/en-us/library/cc195057.aspx.