I completely agree: UTF-8 has become the One True Encoding(tm), and UCS-2 and UTF-16 are rarely found outside the Win32 API. Nearly all basic emoji cannot be represented in a single UCS-2 wchar_t, let alone composite emoji.
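To make the emoji point concrete, here is a rough sketch in Python. The helper names (utf16_code_units, fits_in_ucs2) are made up for illustration; only str.encode and ord are standard. It counts how many 16-bit units a string needs as UTF-16 and checks whether every code point would fit in one UCS-2 unit.

```python
# Hypothetical helpers for illustration; not part of any Python or C API.

def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


def fits_in_ucs2(text: str) -> bool:
    """True if every code point is in the BMP and is not a surrogate."""
    return all(
        ord(ch) <= 0xFFFF and not (0xD800 <= ord(ch) <= 0xDFFF)
        for ch in text
    )


grinning = "\U0001F600"  # GRINNING FACE, U+1F600, outside the BMP
# Composite emoji: man + ZWJ + woman + ZWJ + girl (5 code points)
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

for label, s in [("grinning", grinning), ("family", family)]:
    print(
        label,
        "code points:", len(s),
        "UTF-16 units:", utf16_code_units(s),
        "UTF-8 bytes:", len(s.encode("utf-8")),
        "fits UCS-2:", fits_in_ucs2(s),
    )
```

The single emoji already needs a surrogate pair (two 16-bit units), and the composite one needs eight units, so neither maps to one 16-bit wchar_t per character.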
So how do we make that C-compatible? Make everything a void* and have it just come back with as many bytes as it gets?

On Tue, Jun 30, 2020 at 5:22 AM Richard Damon <[email protected]> wrote:

> On 6/30/20 7:53 AM, M.-A. Lemburg wrote:
> > Since the C world has adopted wchar_t for this purpose, it's the
> > natural choice.
>
> I would disagree with this comment. Microsoft Windows has chosen to use
> 'wchar_t' for Unicode, because they adopted UCS-2 before it morphed into
> UTF-16 due to the expansion of Unicode above 16 bits. The *nix side of
> the world has chosen to use UTF-8 as the preferred way to store Unicode
> characters.
>
> Also, in Windows, wchar_t doesn't really meet the requirements for what
> C defines wchar_t to mean, as wchar_t is supposed to represent every
> character as a single unit, and thus would need to be at least a 21 bit
> type (typically, it would be a 32 bit type), but Windows makes it a 16
> bit type due to ABIs being locked before the Unicode expansion.
>
> --
> Richard Damon
> _______________________________________________
> Python-Dev mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/[email protected]/message/TA2ITVZY6ZGH2Y42JAXD243RSG7MONTV/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
_______________________________________________
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at
https://mail.python.org/archives/list/[email protected]/message/KHFCEVSMTF6LIJAKHCAKTYAYWU6JEBNB/
Code of Conduct: http://python.org/psf/codeofconduct/
