On 30/06/2020 13:43, Emily Bowman wrote:
> I completely agree with this, that UTF-8 has become the One True Encoding(tm), and UCS-2 and UTF-16 are hardly found anywhere outside of the Win32 API. Nearly all basic emoji can't be represented in UCS-2 wchar_t, let alone composite emoji.
You say that as if it's a bad thing :-)
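To make the UCS-2 point concrete, here is a minimal sketch in C of encoding a single code point as UTF-16 and as UTF-8. The function names utf16_encode and utf8_encode are made up for illustration and do not come from any existing API. Anything above U+FFFF, which covers most emoji, needs two 16-bit units (a surrogate pair), so it cannot fit in a single UCS-2 wchar_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16.
 * Returns the number of 16-bit units written (1 or 2),
 * or 0 if cp is not a valid Unicode scalar value. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                                /* out of range, or a lone surrogate */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;                   /* BMP: one unit, so UCS-2 can hold it */
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits left to split in two */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate */
    return 2;
}

/* Hypothetical helper: encode one code point as UTF-8.
 * Returns the number of bytes written (1 to 4), or 0 if cp is invalid. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For U+1F600 (the grinning face emoji), utf16_encode produces the two units 0xD83D 0xDE00 and utf8_encode produces the four bytes F0 9F 98 80. U+20AC (the euro sign) sits in the BMP and needs only one UTF-16 unit.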
> So how to make that C-compatible? Make everything a void* and it just comes back with as many bytes as it gets?
I'd be inclined to something like that. You really don't want people trying to roll their own UTF-8 handling if you can help it. That does imply the C API will need to be pretty comprehensive, though.
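As a rough illustration of the "opaque pointer plus a comprehensive API" idea, here is a minimal sketch in C. Everything in it is hypothetical: the ustr type and the ustr_* functions are invented for this example and are not part of CPython or any proposal in this thread. The sketch also assumes the input is already well-formed UTF-8 and does no validation, which a real API would have to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical opaque type: callers only ever hold a pointer to it. */
typedef struct ustr ustr;

/* In a real library this definition would stay in a private source
 * file, so callers could not poke at the bytes directly. */
struct ustr {
    size_t nbytes;
    unsigned char *data;   /* UTF-8 bytes, NOT validated in this sketch */
};

/* Copy nbytes of (assumed well-formed) UTF-8 into a new object.
 * Returns NULL on allocation failure. */
ustr *ustr_from_utf8(const char *bytes, size_t nbytes)
{
    assert(bytes != NULL || nbytes == 0);
    ustr *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->data = malloc(nbytes ? nbytes : 1);   /* avoid a zero-size malloc */
    if (s->data == NULL) {
        free(s);
        return NULL;
    }
    if (nbytes)
        memcpy(s->data, bytes, nbytes);
    s->nbytes = nbytes;
    return s;
}

/* Length in bytes of the stored UTF-8. */
size_t ustr_byte_length(const ustr *s)
{
    assert(s != NULL);
    return s->nbytes;
}

/* Length in code points: count every byte that is not a
 * continuation byte (10xxxxxx). Only correct for well-formed input. */
size_t ustr_codepoint_length(const ustr *s)
{
    assert(s != NULL);
    size_t n = 0;
    for (size_t i = 0; i < s->nbytes; i++)
        if ((s->data[i] & 0xC0) != 0x80)
            n++;
    return n;
}

/* Release the object; passing NULL is allowed. */
void ustr_free(ustr *s)
{
    if (s != NULL) {
        free(s->data);
        free(s);
    }
}
```

The point of the design is that a caller asks the library for the code point count instead of walking the bytes by hand. That is also why such an API has to be comprehensive: every operation a caller might otherwise be tempted to hand-roll (indexing, slicing, comparison, iteration) would need its own entry point.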
(If you want nightmares, take a look at the parsing code in Expat. Multiple layers of macros and function tables make it a horror to comprehend.)
--
Rhodri James *-* Kynesim Ltd
Python-Dev mailing list: https://mail.python.org/mailman3/lists/python-dev.python.org/
