[r6rs-discuss] Re: [Formal] formal comment (ports, characters, strings, Unicode)

Jason Orendorff Tue, 20 Mar 2007 03:38:17 -0800

I don't intend to teach a seminar here on implementation
strategies for O(1) string-ref, but I'll describe just one
simple strategy that achieves O(1) time for both string-ref
and string-set! while using only a little more space than
UTF-8.  The basic idea is to represent every string by an
opaque, sealed record whose fields include a vector of
bytevectors.  All but the last of those bytevectors is the
UTF-8 encoding of exactly 100 characters; the last one
contains between 0 and 100 characters, inclusive, and
contains 0 characters iff the length of the entire string
is 0.


I'm going to assume you meant this seriously.

If no one cares how bad this is, the discussion is over.

I looked at a handful of Schemes and it certainly appears they all
keep each string in a single buffer.  I could be wrong.  I doubt it.
It's the same in the other languages whose internals I'm familiar
with.  I think there are many reasons for this--algorithms implemented
under the covers are simpler and work faster on a single buffer; you
can provide a simpler native interface; you do less copying every time
you want to call into a library; and maintenance costs.

If you want to provide (say) regular expressions using an existing
native library, it's crazy not to have your strings in a single buffer.
If your Scheme is written in Java or C#, it's impractical not to use
the system string type.

Again, if implementers don't care about any of this, we're done
here.

-j

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss

[r6rs-discuss] Re: [Formal] formal comment (ports, characters, strings, Unicode)

Reply via email to