Eryk Sun <[email protected]> added the comment:
> cp65001 is *not* utf-8: Microsoft decided to handle surrogates
> differently for some reasons.
Do you mean valid UTF-16 surrogate pairs? For example:
>>> codecs.code_page_encode(65001, '\ud800\udc00')
(b'\xf0\x90\x80\x80', 2)
PyUnicode_AsUnicodeAndSize is neutral about surrogate codes: it stores them as-is
in a 16-bit wchar_t string. In particular, the Python string in this case
contains two surrogate codes, but they're passed to WideCharToMultiByte as a
UTF-16 surrogate pair for the single character U+10000.
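A portable sketch of the difference, using only Python's own codecs instead of the Windows-only codecs.code_page_encode. The variable names are illustrative. Encoding the two surrogate codes individually (utf-8 with 'surrogatepass') gives two 3-byte sequences, while first joining them as a UTF-16 surrogate pair gives the same 4 bytes that cp65001 produced above:

```python
# Two surrogate code points stored as separate characters in a Python str.
pair = '\ud800\udc00'

# Python's strict UTF-8 codec rejects surrogate codes outright.
try:
    pair.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# With 'surrogatepass', each surrogate code is encoded on its own (3 bytes each).
per_code = pair.encode('utf-8', 'surrogatepass')

# Reinterpret the two codes as UTF-16 code units, as WideCharToMultiByte does
# with a 16-bit wchar_t buffer: the pair decodes to the single character U+10000.
combined = pair.encode('utf-16-le', 'surrogatepass').decode('utf-16-le')
as_pair = combined.encode('utf-8')

print(strict_ok)  # False
print(per_code)   # b'\xed\xa0\x80\xed\xb0\x80'
print(as_pair)    # b'\xf0\x90\x80\x80'
```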
Anyway, it seems to me this issue will be resolved if cp65001.py is rewritten
without functools.partial.
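A hypothetical sketch of what that rewrite could look like: the codec entry points are plain functions rather than functools.partial(codecs.code_page_encode, 65001), so importing the codec does not pull in functools. The constant name CP_UTF8 and the exact set of classes are assumptions, not the actual patch. codecs.code_page_encode and codecs.code_page_decode exist only on Windows; here they are looked up when called, not at import time:

```python
import codecs

# Hypothetical name for the UTF-8 code page number (Windows calls it CP_UTF8).
CP_UTF8 = 65001

def encode(input, errors='strict'):
    # Returns (bytes, number_of_characters_consumed); Windows-only call.
    return codecs.code_page_encode(CP_UTF8, input, errors)

def decode(input, errors='strict'):
    # final=True: treat the input as complete.
    return codecs.code_page_decode(CP_UTF8, input, errors, True)

class IncrementalEncoder(codecs.IncrementalEncoder):
    def encode(self, input, final=False):
        # Calls the module-level encode() above, not this method.
        return encode(input, self.errors)[0]

class IncrementalDecoder(codecs.BufferedIncrementalDecoder):
    def _buffer_decode(self, input, errors, final):
        # Incomplete trailing bytes stay buffered until final is True.
        return codecs.code_page_decode(CP_UTF8, input, errors, final)

def getregentry():
    return codecs.CodecInfo(
        name='cp65001',
        encode=encode,
        decode=decode,
        incrementalencoder=IncrementalEncoder,
        incrementaldecoder=IncrementalDecoder,
    )
```

On Windows, encode('\ud800\udc00') should behave like the code_page_encode call quoted earlier.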
----------
nosy: +eryksun
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue36778>
_______________________________________
_______________________________________________
Python-bugs-list mailing list