On 2021-05-26, Alan Gauld <alan.ga...@yahoo.co.uk> wrote:
> On 25/05/2021 23:23, Terry Reedy wrote:
>> In CPython's Flexible String Representation all characters in a string
>> are stored with the same number of bytes, depending on the largest
>> codepoint.
>
> I'm learning lots of new things in this thread!
>
> Does that mean that if I give Python a UTF8 string that is mostly single
> byte characters but contains one 4-byte character that Python will store
> the string as all 4-byte characters?
>
> If so, doesn't that introduce a pretty big storage overhead for
> large strings?
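
For anyone who wants to check this on their own machine, here is a rough sketch. It assumes CPython 3.3 or later, where sys.getsizeof() reports the size of the str object. The helper name bytes_per_char is made up for illustration and is not part of any library.

```python
import sys

def bytes_per_char(ch: str, n: int = 1000) -> int:
    """Estimate storage per character for a string made only of `ch`.

    Hypothetical helper: comparing two lengths of the same text cancels
    the fixed object header, leaving only the per-character cost.
    """
    short = ch * n
    long = ch * (2 * n)
    return (sys.getsizeof(long) - sys.getsizeof(short)) // n

# One sample character from each width class.
widths = {
    "ASCII": bytes_per_char("a"),
    "Latin-1": bytes_per_char("\u00e9"),
    "BMP": bytes_per_char("\u20ac"),
    "astral": bytes_per_char("\U0001F600"),
}

# A mostly-ASCII string, with and without a single astral character.
mostly_ascii = "a" * 1000
one_wide = mostly_ascii + "\U0001F600"

if __name__ == "__main__":
    for name, width in widths.items():
        print(f"{name}: {width} byte(s) per character")
    print("1000 ASCII chars:", sys.getsizeof(mostly_ascii), "bytes")
    print("plus one astral char:", sys.getsizeof(one_wide), "bytes")
```

On CPython I would expect the four widths to come out as 1, 1, 2 and 4, and the second string to be roughly four times the size of the first. Other implementations may store strings differently, so treat the numbers as specific to CPython.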

Memory is cheap ;-)

> I confess I had just assumed the unicode strings were stored
> in native unicode UTF8 format.

If you do that then indexing and slicing strings becomes very slow.

--
https://mail.python.org/mailman/listinfo/python-list
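
To illustrate the point about indexing, here is a minimal sketch of what looking up the i-th character costs when the text is held as UTF-8 bytes. The function utf8_char_at is a hypothetical helper written for this example, not how any particular interpreter does it. Because characters take between one and four bytes, there is no way to compute the byte offset directly, so the code has to walk the bytes from the start.

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the code point at position `index` in UTF-8 encoded `data`.

    Hypothetical helper: assumes `data` is valid UTF-8 and `index` is
    non-negative. The scan is linear in the length of `data`.
    """
    if index < 0:
        raise IndexError("negative indexes are not supported in this sketch")
    seen = -1     # number of code points started so far, minus one
    start = None  # byte offset where the wanted code point begins
    for pos, byte in enumerate(data):
        # Continuation bytes look like 0b10xxxxxx; anything else starts
        # a new code point.
        if byte & 0xC0 != 0x80:
            seen += 1
            if seen == index:
                start = pos
            elif seen == index + 1:
                return data[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("index out of range")
    # The wanted code point was the last one in the data.
    return data[start:].decode("utf-8")

text = "a\u00e9\U0001F600z"   # 1-, 2-, 4- and 1-byte characters
encoded = text.encode("utf-8")
results = [utf8_char_at(encoded, i) for i in range(len(text))]
```

With a fixed width per character, the same lookup is just an offset calculation (index times width), which is why s[i] on a str does not need a scan like this.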