Re: string storage [was: Re: imaplib: is this really so unwieldy?]

Jon Ribbens via Python-list Wed, 26 May 2021 09:01:49 -0700

On 2021-05-26, Alan Gauld <[email protected]> wrote:
> On 25/05/2021 23:23, Terry Reedy wrote:
>> In CPython's Flexible String Representation all characters in a string 
>> are stored with the same number of bytes, depending on the largest 
>> codepoint.
>
> I'm learning lots of new things in this thread!
>
> Does that mean that if I give Python a UTF8 string that is mostly single
> byte characters but contains one 4-byte character that Python will store
> the string as all 4-byte characters?
>
> If so, doesn't that introduce a pretty big storage overhead for
> large strings?


Yes, it does, but memory is cheap ;-)
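
For anyone who wants to see the effect, here is a rough sketch that
measures how many bytes CPython uses per character. It assumes CPython
3.3 or later (PEP 393); the helper name growth_per_char is made up for
this example, and other implementations may give different numbers.

```python
import sys


def growth_per_char(prefix: str, filler: str = "a", n: int = 1000) -> int:
    """Bytes added per extra filler character when the string starts with prefix.

    Taking the difference of two sizes cancels out the fixed object header,
    so only the per-character storage cost remains.
    """
    small = sys.getsizeof(prefix + filler * n)
    large = sys.getsizeof(prefix + filler * (2 * n))
    return (large - small) // n


if __name__ == "__main__":
    samples = [
        ("ASCII only", ""),
        ("one Latin-1 character", "\u00e9"),
        ("one BMP character", "\u20ac"),
        ("one non-BMP character", "\U0001F600"),
    ]
    for label, prefix in samples:
        print(f"{label}: {growth_per_char(prefix)} byte(s) per character")
```

On CPython the last case shows that a single character above U+FFFF
makes every "a" in the string cost four bytes.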

> I confess I had just assumed the unicode strings were stored
> in native unicode UTF8 format.

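If you do that, indexing and slicing strings become slow: UTF-8 is a
variable-width encoding, so finding the Nth character means scanning
from the start of the string, and s[n] is O(n) rather than O(1).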
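
A minimal sketch of what indexing into UTF-8 bytes involves, to show
where the cost comes from. The function utf8_char_at is hypothetical,
written only for this illustration; it is not how CPython stores or
indexes strings, and it assumes the input is valid UTF-8.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the character at position index in UTF-8 encoded data.

    Characters occupy 1 to 4 bytes each, so the byte offset of a given
    character is not known in advance and has to be found by scanning.
    """
    if index < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    seen = -1     # index of the most recent character start we passed
    start = None  # byte offset where the wanted character begins
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else starts
        # a new character.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == index:
                start = pos
            elif seen == index + 1:
                # Reached the next character, so the wanted one ends here.
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("index out of range")
    # The wanted character was the last one in the data.
    return data[start:].decode("utf-8")


if __name__ == "__main__":
    text = "a\U0001F600b\u00e9"
    encoded = text.encode("utf-8")
    for i in range(len(text)):
        print(i, repr(utf8_char_at(encoded, i)), repr(text[i]))
```

The loop has to walk over every byte before the requested character,
whereas with fixed-width storage the offset is simply index times width.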
-- 
https://mail.python.org/mailman/listinfo/python-list
