I don't intend to teach a seminar here on implementation strategies for O(1) string-ref, but I'll describe just one simple strategy that achieves O(1) time for both string-ref and string-set! while using only a little more space than UTF-8. The basic idea is to represent every string by an opaque, sealed record whose fields include a vector of bytevectors. All but the last of those bytevectors is the UTF-8 encoding of exactly 100 characters; the last one contains between 0 and 100 characters, inclusive, and contains 0 characters iff the length of the entire string is 0.
I'm going to assume you meant this seriously. If no one cares how bad this is, the discussion is over. I looked at a handful of Schemes and it certainly appears they all keep each string in a single buffer. I could be wrong. I doubt it. It's the same in the other languages whose internals I'm familiar with. I think there are many reasons for this--algorithms implemented under the covers are simpler and work faster on a single buffer; you can provide a simpler native interface; you do less copying every time you want to call into a library; and maintenance costs. If you want to provide (say) regular expressions using an existing native library, it's crazy not to have your strings in a single buffer. If your Scheme is written in Java or C#, it's impractical not to use the system string type. Again, if implementers don't care about any of this, we're done here. -j _______________________________________________ r6rs-discuss mailing list [email protected] http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
