On Sun, Aug 19, 2012 at 12:11 PM, Paul Rubin <no.email@nospam.invalid> wrote:
> Chris Angelico <ros...@gmail.com> writes:
>> UTF-8 is highly inefficient for indexing. Given a buffer of (say) a
>> few thousand bytes, how do you locate the 273rd character?
>
> How often do you need to do that, as opposed to traversing the string by
> iteration?  Anyway, you could use a rope-like implementation, or an
> index structure over the string.
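To make the cost concrete, here is a minimal Python sketch. It assumes the string is held as a UTF-8 `bytes` object, and the function names and the 64-character stride are hypothetical. It contrasts a plain linear scan with the kind of index structure suggested above: a table of byte offsets recorded every STRIDE characters, so a lookup jumps to the nearest checkpoint and scans only the remainder.

```python
# Sketch only: finding the n-th character in a UTF-8 byte buffer.
# STRIDE and all function names are hypothetical, chosen for illustration.

STRIDE = 64  # record one checkpoint every STRIDE characters


def is_continuation(byte: int) -> bool:
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx;
    # every other byte starts a new character.
    return byte & 0xC0 == 0x80


def char_offset_linear(buf: bytes, n: int, start: int = 0) -> int:
    """Byte offset of the n-th character (0-based) counted from byte `start`.

    Scans byte by byte, so the cost grows with n (O(n))."""
    count = -1
    for i in range(start, len(buf)):
        if not is_continuation(buf[i]):
            count += 1
            if count == n:
                return i
    raise IndexError("character index out of range")


def build_checkpoints(buf: bytes) -> list[int]:
    """One pass over the buffer, remembering the byte offset of every
    STRIDE-th character: a simple index structure over the string."""
    offsets = []
    count = 0
    for i, b in enumerate(buf):
        if not is_continuation(b):
            if count % STRIDE == 0:
                offsets.append(i)
            count += 1
    return offsets


def char_offset_indexed(buf: bytes, checkpoints: list[int], n: int) -> int:
    """Jump to the nearest preceding checkpoint, then scan fewer than
    STRIDE characters from there."""
    if n < 0 or n // STRIDE >= len(checkpoints):
        raise IndexError("character index out of range")
    k = n // STRIDE
    return char_offset_linear(buf, n - k * STRIDE, start=checkpoints[k])


buf = ("héllo wörld ✓ " * 50).encode("utf-8")
checkpoints = build_checkpoints(buf)
# Byte offset of the 273rd character (0-based index 272).
offset = char_offset_indexed(buf, checkpoints, 272)
```

The trade-off is the usual one: the checkpoint table costs extra memory and must be rebuilt whenever the buffer changes, in exchange for bounding each lookup to at most STRIDE characters of scanning.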

Well, imagine if Python strings were stored in UTF-8. How would you slice it?

>>> "asdfqwer"[4:]
'qwer'

That's a not uncommon operation when parsing strings or manipulating
data. You'd need to completely rework your algorithms to carry a byte
position around instead of a simple character index.
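A rough sketch of what that slice would involve with UTF-8 storage, plus the kind of reworked parsing that tracks a byte position rather than a character index. Both functions are hypothetical illustrations that assume the data is a UTF-8 `bytes` object and the separator is ASCII.

```python
# Sketch only: slicing and parsing over a UTF-8 byte buffer.
# Function names are hypothetical, for illustration.


def utf8_slice_from(buf: bytes, n: int) -> bytes:
    """The UTF-8 analogue of s[n:] for n >= 0: first convert the character
    index n into a byte offset by scanning from the start (O(n)), then slice."""
    count = 0
    for i, b in enumerate(buf):
        if b & 0xC0 != 0x80:  # this byte starts a character
            if count == n:
                return buf[i:]
            count += 1
    return b""  # n past the end: empty result, as with str slicing


def split_fields(buf: bytes, sep: bytes = b",") -> list[str]:
    """Parse by carrying a byte position instead of a character index.

    Assumes `sep` is ASCII. An ASCII byte can never occur inside a
    multi-byte UTF-8 sequence, so bytes.find() always lands on a
    character boundary and slicing at the returned offsets is safe."""
    fields = []
    pos = 0  # byte offset of the start of the current field
    while True:
        end = buf.find(sep, pos)
        if end == -1:
            fields.append(buf[pos:].decode("utf-8"))
            return fields
        fields.append(buf[pos:end].decode("utf-8"))
        pos = end + len(sep)


print(utf8_slice_from("asdfqwer".encode("utf-8"), 4))  # b'qwer'
print(split_fields("naïve,café,ok".encode("utf-8")))   # ['naïve', 'café', 'ok']
```

The first function shows why a plain slice becomes a scan; the second shows the reworked style, where code never asks "which character is this?" and only passes byte offsets between find() and slicing.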

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list
