Re: Higher level built-in strings

bearophile Mon, 19 Jul 2010 13:45:16 -0700

Walter Bright:
> 1. most string operations, such as copying and searching, even regular 
> expressions, work just fine using regular indices.
> 
> 2. doing the operations in (1) using code points and having to continually 
> decode the strings would result in disastrously slow code.


In my original post I forgot another difference from arrays:
5b) a method like ".unit()" that allows indexing by code unit.
So "foo".unit(1) is always O(1). Lower-level code can use this method the way 
[] is used for arrays.
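
To make the O(1) claim concrete, here is a minimal sketch of what such an accessor could look like. It is written in Rust rather than D, only because Rust's str is also stored as UTF-8; the free function unit() is a hypothetical stand-in for the proposed method, not an existing D or Phobos API.

```rust
/// Hypothetical analogue of the proposed `.unit()` accessor.
/// Returns the code unit (one UTF-8 byte here) at index `i`.
/// This is a plain bounds-checked array read: O(1), no decoding.
fn unit(s: &str, i: usize) -> Option<u8> {
    s.as_bytes().get(i).copied()
}
```

For "foo", unit(1) gives the byte for 'o'. For a string with a multi-byte character the result is a single byte of that character's encoding, not the character itself, which is why this belongs to the lower-level interface.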

Copying is done on the bytes themselves, with a memcpy, so no decoding is 
necessary. If point (9) (automatic LZO compression) is used, then copying 
could be 2-3 times faster for long strings (because there is less data and you 
don't need to uncompress it to copy it). (If such compression is added, 
strings may need a third accessor method, for the raw bytes.)
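
A small sketch of the copying argument, again in Rust as a stand-in for D: the copy works on the code units as one block and never looks at code point boundaries. The name copy_units is hypothetical, and the LZO compression of point (9) is not modelled here.

```rust
/// Copies a string by copying its code units as one block, the way a
/// memcpy would. No code point is decoded along the way.
fn copy_units(s: &str) -> Vec<u8> {
    let src = s.as_bytes();
    let mut dst = vec![0u8; src.len()];
    // A single block copy of src.len() bytes.
    dst.copy_from_slice(src);
    dst
}
```

Because the bytes are copied unchanged, the result is still valid UTF-8 and can be turned back into a string without any repair step.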


> 3. the user can always layer a code point interface over the strings, but 
> going the other way is not so practical.

This is true. But it makes string usage unnecessarily low-level and hard...
A better design for a smart systems language like D is to give strings a 
default high-level "interface" that treats strings as what they are at a high 
level, and to add a second, lower-level interface for when you need faster 
low-level fiddling (so they have [] that returns code points and unit() that 
returns code units).
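
The two-level design can be sketched as a small wrapper type. This is only an illustration under assumptions: it is in Rust rather than D, the type HlString and its methods are hypothetical names, and at() stands in for the proposed [] because Rust's index operator cannot return a decoded code point by value.

```rust
/// Hypothetical string type with a high-level default view and a
/// lower-level view on the same UTF-8 storage.
struct HlString {
    data: String,
}

impl HlString {
    fn new(s: &str) -> Self {
        HlString { data: s.to_string() }
    }

    /// High-level view: length in code points. O(n) over UTF-8.
    fn len(&self) -> usize {
        self.data.chars().count()
    }

    /// High-level view, standing in for `[]`: the i-th code point.
    /// O(n), because UTF-8 is variable-width and must be walked.
    fn at(&self, i: usize) -> Option<char> {
        self.data.chars().nth(i)
    }

    /// Low-level view: number of code units. O(1).
    fn unit_len(&self) -> usize {
        self.data.len()
    }

    /// Low-level view: the i-th code unit. O(1).
    fn unit(&self, i: usize) -> Option<u8> {
        self.data.as_bytes().get(i).copied()
    }
}
```

With a value built from "naïve", len() counts five code points while unit_len() counts six code units, and at(2) yields the whole 'ï' where unit(2) yields only its first byte. The cost difference between the two views is exactly the trade-off discussed in points (1) and (2) above.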

Bye,
bearophile
