Chris Angelico <ros...@gmail.com> writes:
> And of course, taking the *entire* rest of the string isn't the only
> thing you do. What if you want to take the next six characters after
> that index? That would be constant time with a fixed-width storage
> format.
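
To make the quoted point concrete, here is a minimal sketch in Python of "take six characters after a given character index" against two byte-level encodings. The helper names (skip_code_points, take_utf8, take_utf32) are made up for illustration and are not anyone's actual string implementation. It assumes well-formed input and counts code points, not grapheme clusters.

```python
def skip_code_points(buf: bytes, pos: int, n: int) -> int:
    """Return the byte offset reached after skipping n code points in UTF-8 data."""
    for _ in range(n):
        if pos >= len(buf):
            break
        pos += 1  # step over the lead byte
        # Continuation bytes look like 10xxxxxx; step over those too.
        while pos < len(buf) and (buf[pos] & 0xC0) == 0x80:
            pos += 1
    return pos


def take_utf8(buf: bytes, char_index: int, count: int) -> str:
    # Variable width: finding the start means scanning from the beginning,
    # so the cost grows with char_index.
    start = skip_code_points(buf, 0, char_index)
    end = skip_code_points(buf, start, count)
    return buf[start:end].decode("utf-8")


def take_utf32(buf: bytes, char_index: int, count: int) -> str:
    # Fixed width: every code point is 4 bytes, so both offsets are
    # plain arithmetic regardless of where char_index falls.
    start = char_index * 4
    return buf[start:start + count * 4].decode("utf-32-le")


text = "naïve café ☕ and more text"
utf8_result = take_utf8(text.encode("utf-8"), 6, 6)
utf32_result = take_utf32(text.encode("utf-32-le"), 6, 6)
print(utf8_result, utf32_result)
```

Both calls return the same six characters; the difference is only in how the starting offset is found. If the caller already holds a byte offset rather than a character index, the UTF-8 version only has to scan the six characters it returns.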

How often is this an issue in practice?

I wonder how other languages deal with this. The examples I can think
of are poor role models:

1. C/C++ - Unicode-impaired, other than a wchar type
2. Java - bogus UCS-2-like(?) representation for historical reasons.
   Also has some modified UTF-8 for reasons that made no sense and
   that I don't remember
3. Haskell - basic string type is a linked list of code points.
   "hello" is five list nodes. New Data.Text library (much more
   efficient) uses something like ropes, I think, with UTF-16
   underneath.
4. Erlang - I think like Haskell. Efficiently handles byte blocks.
5. Perl 6 - ???
6. Ruby - ??? (but probably quite slow like the rest of Ruby)
7. Objective C - ???
8, 9 ... (any other important ones?)