PEP 393

The Unicode string type is changed to support multiple internal
representations, depending on the character with the largest Unicode
ordinal (1, 2, or 4 bytes).

... Ah, OK. I get it. The one-byte representation is Latin-1, and its ASCII
subset happens to match UTF-8. Once the string contains a code point above
U+00FF, the internal representation is fixed-width UCS-2 or UCS-4 (UTF-32),
so every code point in a given string takes the same number of bytes.
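
To check my reading of the PEP, here is a minimal sketch that estimates the
per-code-point storage width by comparing string sizes. It assumes CPython 3.3
or later and leans on sys.getsizeof, whose results are implementation-specific;
the helper name storage_width is my own, not anything from the PEP.

```python
import sys


def storage_width(ch: str, n: int = 1000) -> int:
    """Bytes CPython uses per code point in a string made only of `ch`.

    Hypothetical helper: it compares strings of 2n and n copies of `ch`,
    so the fixed object header cancels out of the difference.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


samples = {
    "ASCII U+0061": "a",
    "Latin-1 U+00E9": "\xe9",
    "BMP U+20AC": "\u20ac",
    "astral U+1F600": "\U0001f600",
}
widths = {label: storage_width(ch) for label, ch in samples.items()}
for label, width in widths.items():
    print(f"{label}: {width} byte(s) per code point")

# A single wide code point widens the storage of the whole string.
mixed = "a" * 1000 + "\U0001f600"
mixed_size = sys.getsizeof(mixed)
print(f"{len(mixed)} code points stored in {mixed_size} bytes")

# With a fixed width, indexing is plain offset arithmetic; no scan is needed.
last = mixed[1000]
```

On CPython 3.3+ I would expect widths of 1, 1, 2, and 4. That fixed width is
why s[1_000_000] does not need to walk the string from the start.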

On Sun, Oct 27, 2019, 12:19 AM Chris Angelico <ros...@gmail.com> wrote:

> On Sun, Oct 27, 2019 at 2:37 PM David Mertz <me...@gnosis.cx> wrote:
> > What does actual CPython do currently to find that s[1_000_000],
> > assuming utf-8 internal representation?
> >
>
> Mu.
>
> CPython does not have a UTF-8 internal representation.
>
> ChrisA
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/UD6M2WXPOCAIPXOGWMWLYEFA77OZPUHH/
Code of Conduct: http://python.org/psf/codeofconduct/