PEP 393: The Unicode string type is changed to support multiple internal representations, depending on the character with the largest Unicode ordinal (1, 2, or 4 bytes).
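To make the PEP 393 summary above concrete, here is a minimal sketch that estimates how many bytes CPython spends per character for strings whose widest code point falls in different ranges. The helper name `bytes_per_char` is made up for this illustration, and the numbers rely on `sys.getsizeof` reflecting CPython's internal layout (an implementation detail; other interpreters such as PyPy may behave differently).

```python
import sys


def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate per-character storage for a string made only of `ch`.

    Hypothetical helper: compares the sizes of two strings whose lengths
    differ by n, so the fixed object header cancels out.
    """
    small = ch * n
    large = ch * (2 * n)
    return (sys.getsizeof(large) - sys.getsizeof(small)) // n


if __name__ == "__main__":
    samples = {
        "ASCII 'a' (U+0061)": "a",
        "Latin-1 'e-acute' (U+00E9)": "\u00e9",
        "BMP euro sign (U+20AC)": "\u20ac",
        "Astral emoji (U+1F600)": "\U0001F600",
    }
    for label, ch in samples.items():
        # On CPython 3.3+ these are expected to come out as 1, 1, 2, 4.
        print(f"{label}: {bytes_per_char(ch)} byte(s) per character")
```

Because every character in a given string occupies the same number of bytes, indexing such as `s[1_000_000]` can be a direct offset calculation rather than a scan.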
... Ah, OK. I get it. The one-byte representation covers ASCII, which happens to match UTF-8, plus the rest of Latin-1 (the Latin-1 oddness). But the internal representation is fixed-width, 2 or 4 bytes per code point (UCS-2 or UCS-4, roughly UTF-16 without surrogates or UTF-32), if the string contains code points that need more than one byte.

On Sun, Oct 27, 2019, 12:19 AM Chris Angelico <ros...@gmail.com> wrote:

> On Sun, Oct 27, 2019 at 2:37 PM David Mertz <me...@gnosis.cx> wrote:
> > What does actual CPython do currently to find that s[1_000_000],
> > assuming utf-8 internal representation?
>
> Mu.
>
> CPython does not have a UTF-8 internal representation.
>
> ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/UD6M2WXPOCAIPXOGWMWLYEFA77OZPUHH/
Code of Conduct: http://python.org/psf/codeofconduct/