On Sun, Aug 19, 2012 at 12:11 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> Chris Angelico <ros...@gmail.com> writes:
>> UTF-8 is highly inefficient for indexing. Given a buffer of (say) a
>> few thousand bytes, how do you locate the 273rd character?
>
> How often do you need to do that, as opposed to traversing the string by
> iteration? Anyway, you could use a rope-like implementation, or an
> index structure over the string.
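To make the cost in the quoted exchange concrete, here is a minimal sketch. It assumes the string is held as a valid UTF-8 `bytes` buffer. `char_offset_scan` finds the 273rd character the only way plain UTF-8 allows: it counts lead bytes from the start, so each lookup is O(n). `Utf8Index` is one hypothetical form of the "index structure over the string" Paul mentions. It records the byte offset of every 64th code point, and that stride is an arbitrary assumption. A lookup therefore scans at most one stride of characters, at the cost of extra memory and a rebuild whenever the buffer changes. All names here are illustrative, not any real library's API.

```python
def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx.
    return byte & 0xC0 == 0x80


def char_offset_scan(buf: bytes, n: int) -> int:
    """Byte offset of code point n (0-based), found by a linear scan."""
    count = -1
    for i, b in enumerate(buf):
        if not is_continuation(b):  # each lead byte starts a new code point
            count += 1
            if count == n:
                return i
    raise IndexError(n)


class Utf8Index:
    """Hypothetical checkpoint index over a UTF-8 buffer."""

    STEP = 64  # assumed stride: remember every 64th code point's byte offset

    def __init__(self, buf: bytes):
        self.buf = buf
        self.checkpoints = []
        count = 0
        for i, b in enumerate(buf):
            if not is_continuation(b):
                if count % self.STEP == 0:
                    self.checkpoints.append(i)
                count += 1
        self.length = count  # number of code points

    def offset(self, n: int) -> int:
        """Byte offset of code point n, scanning at most STEP - 1 code points."""
        if not 0 <= n < self.length:
            raise IndexError(n)
        i = self.checkpoints[n // self.STEP]  # jump to the nearest checkpoint
        remaining = n % self.STEP
        while remaining:  # then walk forward over the remaining lead bytes
            i += 1
            if not is_continuation(self.buf[i]):
                remaining -= 1
        return i


text = "naïve café " * 300
buf = text.encode("utf-8")
idx = Utf8Index(buf)
assert idx.offset(272) == char_offset_scan(buf, 272)  # the 273rd character
```

The index only helps random access. Iterating from the front, as Paul suggests, needs no index at all.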
Well, imagine if Python strings were stored in UTF-8. How would you
slice it?

>>> "asdfqwer"[4:]
'qwer'

That's a not uncommon operation when parsing strings or manipulating
data. You'd need to completely rework your algorithms to maintain a
position somewhere.

ChrisA
--
http://mail.python.org/mailman/listinfo/python-list
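Here is a sketch of what that slice would involve. It assumes a hypothetical UTF-8-backed string held as `bytes` and a non-negative start index. `utf8_slice_from` is an illustrative helper, not a real API. It has to walk the buffer to turn the character index into a byte offset. Using the character index directly as a byte index goes wrong as soon as a multi-byte character precedes the cut. The last lines show the "maintain a position" style instead: they carry byte offsets (here from `bytes.find`) and slice by those. This is safe because UTF-8 never uses ASCII byte values inside a multi-byte sequence.

```python
def utf8_slice_from(buf: bytes, start: int) -> bytes:
    """UTF-8 equivalent of s[start:] for a non-negative character index.

    The character index must first be mapped to a byte offset by walking
    the buffer, so this is O(len(buf)) rather than O(1).
    """
    count = 0
    for i, b in enumerate(buf):
        if b & 0xC0 != 0x80:  # lead byte, i.e. not a 10xxxxxx continuation byte
            if count == start:
                return buf[i:]
            count += 1
    return b""  # start at or past the end gives an empty slice, like str


buf = "äsdfqwer".encode("utf-8")  # 'ä' takes two bytes

# Character-index slicing needs the scan:
print(utf8_slice_from(buf, 4).decode("utf-8"))  # qwer

# Treating the character index as a byte index cuts in the wrong place:
print(buf[4:].decode("utf-8"))  # fqwer

# "Maintain a position": keep byte offsets throughout and slice by them.
pos = buf.find(b"q")
print(buf[pos:].decode("utf-8"))  # qwer
```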