Re: [r6rs-discuss] Strings as codepoint-vectors: bad

Jason Orendorff Thu, 15 Mar 2007 22:20:36 -0800

On 3/15/07, Per Bothner <[EMAIL PROTECTED]> wrote:

> Jason Orendorff wrote:
> > Making strings vectors of 16-bit values is simple, familiar,
> > speed-efficient, memory-efficient, easy to implement, and convenient
> > for programmers.
> [...]
> Most code will as you say work fine even if string-ref
> works on raw 8/16-bit code points.  But those code
> points will not be "characters".  We'd have to remove
> "character" functions.


I don't think we would have to remove them.  There
could be a way to extract the characters from a string:

 (string-iterator s)  procedure
   Returns an opaque iterator object that can be used with
   (next-char!).

 (next-char! it)  procedure
   Returns the next character from the iterator `it`, or #f if there
   are no more characters.

Each call to these can be O(1), and so can the code-unit-oriented
(string-ref), (string-set!), and (string-length).
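Here is a minimal sketch, assuming strings are stored as mutable vectors of UTF-16 code units (the 16-bit representation under discussion). It is written in Python rather than Scheme so it can run as-is. `make_string`, `string_length`, `string_ref`, `string_set`, `string_iterator` and `next_char` are hypothetical stand-ins for the Scheme procedures, with `None` playing the role of `#f`. The code-unit primitives are plain array operations, and `next_char` reads at most two code units per call, so each operation is O(1).

```python
import sys
from array import array
from typing import Optional

# Hypothetical representation: a mutable array of unsigned 16-bit UTF-16 code
# units, encoded in native byte order so the bytes line up with array('H').
_NATIVE_UTF16 = "utf-16-le" if sys.byteorder == "little" else "utf-16-be"


def make_string(s: str) -> array:
    """Convert a Python str into the hypothetical code-unit representation."""
    units = array("H")
    units.frombytes(s.encode(_NATIVE_UTF16))
    return units


# --- code-unit-oriented primitives: plain array operations, O(1) each ---

def string_length(s: array) -> int:
    """Stand-in for (string-length s): the number of code units."""
    return len(s)


def string_ref(s: array, k: int) -> int:
    """Stand-in for (string-ref s k): the k-th code unit."""
    return s[k]


def string_set(s: array, k: int, unit: int) -> None:
    """Stand-in for (string-set! s k u): overwrite the k-th code unit in place."""
    s[k] = unit


# --- character iteration: O(1) per call ---

class StringIterator:
    """Stand-in for the opaque object returned by (string-iterator s)."""

    def __init__(self, s: array) -> None:
        self.units = s
        self.pos = 0  # index of the next unread code unit


def string_iterator(s: array) -> StringIterator:
    return StringIterator(s)


def next_char(it: StringIterator) -> Optional[str]:
    """Stand-in for (next-char! it): the next character, or None (for #f) at the end.

    Consumes one code unit, or two when a high surrogate is followed by a low one.
    """
    units = it.units
    if it.pos >= len(units):
        return None
    hi = units[it.pos]
    it.pos += 1
    if 0xD800 <= hi <= 0xDBFF and it.pos < len(units):
        lo = units[it.pos]
        if 0xDC00 <= lo <= 0xDFFF:
            it.pos += 1
            return chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
    return chr(hi)  # BMP character, or an unpaired surrogate passed through as-is
```

For example, with `s = make_string("h\u00e9\U0001F600")`, `string_length(s)` is 4 and `string_ref(s, 2)` is the high surrogate `0xD83D`. Iterating with `next_char` yields three characters (`'h'`, `'é'`, and the emoji) and then `None`.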

Perhaps we would like to hide the in-memory encoding of strings from
users, but that's not really possible if you *also* wish to expose a
fast low-level API with integer offsets.  The (string-ref) and
(string-set!) APIs, as currently specified, hit a sweet spot of API
badness: they're so low-level and essential that it's almost
unthinkable that they be anything but O(1); yet they're sufficiently
high-level that every actual O(1) implementation sacrifices efficiency
somewhere else.
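The following Python sketch makes the tradeoff concrete. It is illustrative only, and `utf16_units`, `char_ref_utf16` and `storage_bytes` are hypothetical helpers. If (string-ref s k) must return the k-th *character* while strings are stored compactly as UTF-16, finding character k means scanning from the start, which is O(k). Getting O(1) character indexing instead means a fixed-width representation such as UTF-32, which takes twice the memory for text that fits in 16 bits.

```python
from typing import Dict, List


def utf16_units(s: str) -> List[int]:
    """Encode a str as a list of UTF-16 code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def char_ref_utf16(units: List[int], k: int) -> str:
    """Return the k-th character of a UTF-16 string by scanning: O(k), not O(1).

    Assumes well-formed UTF-16 (every high surrogate is followed by a low one).
    """
    pos = 0
    for _ in range(k):
        if pos >= len(units):
            raise IndexError("character index out of range")
        # Step over one code unit, or two for a surrogate pair.
        pos += 2 if 0xD800 <= units[pos] <= 0xDBFF else 1
    if pos >= len(units):
        raise IndexError("character index out of range")
    hi = units[pos]
    if 0xD800 <= hi <= 0xDBFF:
        lo = units[pos + 1]
        return chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
    return chr(hi)


def storage_bytes(s: str) -> Dict[str, int]:
    """Bytes needed for s as UTF-16 (variable width) vs. UTF-32 (fixed width)."""
    return {
        "utf16": len(s.encode("utf-16-le")),
        "utf32": len(s.encode("utf-32-le")),
    }
```

For `"x\U0001F600y"`, `char_ref_utf16(units, 2)` returns `'y'` only after stepping over the two-unit emoji, while `storage_bytes("hello")` reports 10 bytes as UTF-16 versus 20 as UTF-32.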

-j

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
