Walter Bright:

> 1. most string operations, such as copying and searching, even regular
> expressions, work just fine using regular indices.
>
> 2. doing the operations in (1) using code points and having to continually
> decode the strings would result in disastrously slow code.

In my original post I forgot another difference from arrays:

5b) a method like ".unit()" that indexes code units, so "foo".unit(1) is always O(1). Lower-level code can use this method the way [] is used for arrays.

Copying is done on the bytes themselves, with a memcpy; no decoding is necessary. If point (9) (automatic LZO encoding) is used, copying can be 2-3 times faster for long strings, because there is less data and you don't need to decompress it to copy it. (If such compression is added, strings may need a third accessor method, for the true bytes.)

> 3. the user can always layer a code point interface over the strings, but going
> the other way is not so practical.

This is true. But it makes string usage unnecessarily low-level and hard... A better design in a smart system language like D is to give strings a default high-level "interface" that sees strings as what they are at a high level, and to add a second, lower-level interface for when you need faster lower-level fiddling (so they have [] that returns code points and unit() that returns code units).

Bye,
bearophile
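
P.S. A rough sketch of the two-level interface, written in Python rather than D only to keep it short and runnable. The class name Utf8String and the helper _width are made up for illustration; only [] (code points) and unit() (code units) come from the proposal, and a real D implementation would certainly look different. It assumes the string is stored as valid UTF-8.

```python
class Utf8String:
    """Hypothetical string type: [] yields code points, unit() yields code units."""

    def __init__(self, text: str):
        # Store the UTF-8 code units; encoding a Python str gives valid UTF-8.
        self._units = text.encode("utf-8")

    def unit(self, i: int) -> int:
        """Low-level access: the i-th code unit (a byte), always O(1)."""
        return self._units[i]

    def _width(self, pos: int) -> int:
        """Number of code units in the sequence that starts at pos."""
        if pos >= len(self._units):
            raise IndexError("code point index out of range")
        lead = self._units[pos]
        if lead < 0x80:
            return 1          # 0xxxxxxx: ASCII
        if lead >> 5 == 0b110:
            return 2          # 110xxxxx
        if lead >> 4 == 0b1110:
            return 3          # 1110xxxx
        return 4              # 11110xxx (input is assumed valid)

    def __getitem__(self, n: int) -> str:
        """High-level access: the n-th code point, O(n) because it scans."""
        if n < 0:
            raise IndexError("negative indices are not supported in this sketch")
        pos = 0
        for _ in range(n):
            pos += self._width(pos)   # skip n whole code points
        return self._units[pos:pos + self._width(pos)].decode("utf-8")


s = Utf8String("héllo")
print(s[1])            # the code point 'é'
print(hex(s.unit(1)))  # 0xc3, the first of the two code units of 'é'
print(hex(s.unit(2)))  # 0xa9, the second code unit of 'é'
```

The [] here pays a linear scan for each access, which is the cost Walter points at in (2); unit() is there so that code that cares about speed can skip that scan.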