On Thu, Jan 11, 2018, at 14:55, Rob Speer wrote:
> There is one more difference I have found between Python's encodings and
> WHATWG's. In Python's codepage 1255, b'\xca' is undefined. In WHATWG's, it
> maps to U+05BA HEBREW POINT HOLAM HASER FOR VAV. I haven't tracked down
> what the Unicode Consortium has to say about this.
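
For illustration only, here is a minimal sketch of how the two differences described in this thread could be layered on top of Python's existing cp1255 codec using a custom decode error handler. It assumes the thread's description is complete. It is not the full WHATWG windows-1255 index. The handler name "whatwg-1255-sketch" and the `_EXTRA` table are hypothetical. Byte 0xCA is mapped to U+05BA, and any other byte in the range 0x80 to 0x9F that cp1255 leaves undefined "falls through" to the code point with the same value. Everything else, such as 0xFF, still raises.

```python
import codecs

# Hypothetical extra mappings, taken only from this thread; this is NOT
# the full WHATWG windows-1255 index.
_EXTRA = {0xCA: "\u05BA"}  # HEBREW POINT HOLAM HASER FOR VAV


def _whatwg_1255_fallback(exc):
    """Decode error handler: patch bytes that Python's cp1255 leaves undefined."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    b = exc.object[exc.start]  # the offending byte, as an int
    if b in _EXTRA:
        return (_EXTRA[b], exc.start + 1)
    if 0x80 <= b <= 0x9F:
        # "Fall-through": undefined C1-range bytes map to U+0080..U+009F.
        return (chr(b), exc.start + 1)
    # Bytes such as 0xFF remain undefined, as in WHATWG's mapping.
    raise exc


# Hypothetical handler name, registered globally for this process.
codecs.register_error("whatwg-1255-sketch", _whatwg_1255_fallback)
```

Usage: `b"\x80\x81\xca".decode("cp1255", errors="whatwg-1255-sketch")` yields the euro sign, U+0081 and U+05BA, while decoding `b"\xff"` the same way still raises `UnicodeDecodeError`. Using an error handler keeps Python's own table in charge of every defined byte, so only the disputed bytes are affected.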

The 0xCA to U+05BA mapping appears in the best fit mapping (with a comment suggesting it is unclear which vowel point it is actually meant to be), but not in the normal mapping.

> Other than that, all the differences are adding the fall-throughs in the
> range U+0080 to U+009F. For example, elsewhere in windows-1255, the byte
> b'\xff' is undefined, and it remains undefined in WHATWG's mapping.

This is, for the record, also consistent with the results of my test program: 0xCA is treated as a perfectly ordinary mapping to U+05BA, whereas 0xFF returns an error. In permissive mode, 0xFF maps to U+F896.

0xCA U+05BA appears (though with no glyph) in the code chart Microsoft published at https://www.microsoft.com/typography/unicode/cscp.htm, but not in the corresponding mapping list. It also does not appear in https://msdn.microsoft.com/en-us/library/cc195057.aspx.

_______________________________________________
Python-ideas mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
