Mark Lawrence wrote:

> On 04/06/2014 01:39, Chris Angelico wrote:
>> A current discussion regarding Python's Unicode support centres (or
>> centers, depending on how close you are to the cent[er]{2} of the
>> universe) around one critical question: Is string indexing common?
>>
>> Python strings can be indexed with integers to produce characters
>> (strings of length 1). They can also be iterated over from beginning
>> to end. Lots of operations can be built on either one of those two
>> primitives; the question is, how much can NOT be implemented
>> efficiently over iteration, and MUST use indexing? Theories are great,
>> but solid use-cases are better - ideally, examples from actual
>> production code (actual code optional).
>>
>> I know the collective experience of python-list can't fail to bring up
>> a few solid examples here :)
>>
>> Thanks in advance, all!!
>>
>> ChrisA
>>
>
> Single characters quite often, iteration rarely if ever, slicing all the
> time, but does that last one count?

The indices used for slicing typically don't come out of nowhere. A simple
example would be

def strip_prefix(text, prefix):
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text

If both prefix and text use UTF-8 internally, the byte offset is already
known. The question is then how we can preserve that information. The first
approach that comes to mind is an int subtype:

>>> for i, c in enumerate("123αλφα"):
...     print(i, byteoffset(i), c)
...
0 0 1
1 1 2
2 2 3
3 3 α
4 5 λ
5 7 φ
6 9 α

This would work in the strip_prefix() example, but would lead to data
corruption in most other cases unless limited to a specific string -- in
which case it would no longer work with strip_prefix(). So a new interface
would be needed. My second try, an object with two byte offsets linked to a
specific string:

>>> span("foobar").startswith("oob")
>>> p = span("foobar").startswith("foo")
>>> p.replace("baz")
'bazbar'
>>> p.before()
''
>>> p.after()
'bar'
>>> span("foo bar baz").find("bar").replace("spam")
'foo spam baz'

I have no idea if that could work out...

-- 
https://mail.python.org/mailman/listinfo/python-list
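
For anyone who wants to play with the two ideas above, here is a rough sketch of how they might look in pure Python. It is only an illustration under assumptions: the names byte_offsets(), Span and span() are made up for this example, and the "internal UTF-8 representation" is simulated by encoding the str to bytes, since CPython does not expose one. A real implementation would live inside the string type itself.

```python
def byte_offsets(text):
    """Yield (char index, UTF-8 byte offset, char) for each character.

    Hypothetical helper: simulates the byteoffset() table by summing
    the encoded length of each preceding character.
    """
    offset = 0
    for i, c in enumerate(text):
        yield i, offset, c
        offset += len(c.encode("utf-8"))


class Span:
    """Two byte offsets tied to one specific UTF-8 buffer (hypothetical)."""

    def __init__(self, data, start=0, stop=None):
        self._data = data  # UTF-8 encoded bytes, shared by all sub-spans
        self._start = start
        self._stop = len(data) if stop is None else stop

    def startswith(self, prefix):
        # Return a sub-span covering the prefix, or None if it does not match.
        p = prefix.encode("utf-8")
        if self._data.startswith(p, self._start, self._stop):
            return Span(self._data, self._start, self._start + len(p))
        return None

    def find(self, needle):
        # Return a sub-span covering the first match, or None.
        # UTF-8 is self-synchronizing, so a match of a valid needle
        # always begins and ends on a character boundary.
        n = needle.encode("utf-8")
        pos = self._data.find(n, self._start, self._stop)
        if pos < 0:
            return None
        return Span(self._data, pos, pos + len(n))

    def before(self):
        return self._data[:self._start].decode("utf-8")

    def after(self):
        return self._data[self._stop:].decode("utf-8")

    def replace(self, new):
        # Build a new str with the spanned bytes swapped for `new`.
        return self.before() + new + self.after()

    def __str__(self):
        return self._data[self._start:self._stop].decode("utf-8")


def span(text):
    """Wrap a whole string in a Span (hypothetical entry point)."""
    return Span(text.encode("utf-8"))
```

Because every Span carries its own buffer, the offsets can never be applied to the wrong string, which is the problem the bare int subtype runs into. The price is that the offsets are no longer plain integers, so existing slicing code would have to be rewritten against the new interface.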