I think we've gotten way off course.  The only reason to standardize
the internal representation of strings would be to expose code units.
Otherwise you wouldn't bother.  I can think of two good reasons to
expose code units and one pragmatic reason:

 1. Performance.  I think R6RS should support a portable regex
    library--one that people can actually use.  A portable parser
    library would also be nice.  These things need fast access to
    code units.

 2. Native call interface.  A portable one is beyond the scope of
    R6RS, but a standard representation for strings now would
    simplify future efforts.

 3. (the pragmatic reason) Maybe the editors don't have time to add a
    thorough high-level string API to R6RS.  I don't know if this is
    true or not.  If so, a simple, conventional low-level API would
    be an improvement over the current draft.
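Point 1 comes down to indexing cost. As a rough illustration (Python is used only as neutral notation, the function name is hypothetical, and nothing here is a proposed R6RS API): if strings are stored as UTF-8 and the only exposed index is a character (code point) index, finding character k means scanning from the start unless the implementation keeps an auxiliary index. With code units exposed, a regex or parser library can index the underlying array directly in constant time.

```python
def utf8_code_point_offset(buf: bytes, k: int) -> int:
    """Return the byte offset where code point number k (0-based) starts.

    Without an auxiliary index this must walk the UTF-8 data from the
    beginning, so the cost grows with k.
    """
    count = 0
    for offset, byte in enumerate(buf):
        # Continuation bytes look like 0b10xxxxxx; any other byte begins
        # a new code point.
        if (byte & 0xC0) != 0x80:
            if count == k:
                return offset
            count += 1
    raise IndexError("code point index out of range")


data = "naïve café".encode("utf-8")

# Character-index access: a linear scan to find where character 3 starts.
offset = utf8_code_point_offset(data, 3)
print(offset)  # → 4, because "ï" occupies two bytes

# Code-unit access: direct, constant-time indexing into the array.
print(hex(data[offset]))
```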

If these reasons are unpersuasive, we need not carry on about UTF-8
vs. UTF-16 etc. etc.  If the editors decide that R6RS will not expose
code units, I'll just second Per Bothner's suggestion:

> * More generally, write the specification with the assumption
> that many/most Scheme implementations will use a simple
> UTF-8 array or a UTF-16 array.  In the case of mutable
> strings, the array may be grown/relocated, and optionally
> use a buffer-gap scheme.  We should not assume or require
> anything more complicated.
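The buffer-gap scheme mentioned in the quote can be sketched briefly. This is a minimal illustration, assuming the string's contents are an array of UTF-16 code units; the class and helper names are hypothetical, and a real implementation would use a packed 16-bit array rather than a Python list.

```python
def to_utf16(s):
    """Split a Python str into UTF-16 code units (ints in 0..0xFFFF)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def from_utf16(units):
    """Reassemble a str from a sequence of UTF-16 code units."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


class GapBuffer:
    """Mutable string of code units with a movable gap (hypothetical sketch).

    Logical contents are buf[:gap_start] followed by buf[gap_end:];
    the slots in between are free space.
    """

    def __init__(self, units=(), capacity=16):
        units = list(units)
        size = max(capacity, 2 * len(units))
        self.buf = units + [0] * (size - len(units))
        self.gap_start = len(units)
        self.gap_end = size

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def __getitem__(self, i):
        # Returns a code unit, which may be half of a surrogate pair.
        if not 0 <= i < len(self):
            raise IndexError("index out of range")
        if i < self.gap_start:
            return self.buf[i]
        return self.buf[i + (self.gap_end - self.gap_start)]

    def _move_gap(self, pos):
        # Shift code units across the gap until it starts at pos.
        if not 0 <= pos <= len(self):
            raise IndexError("position out of range")
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def _grow(self):
        # Relocate into a larger array, keeping the gap where it is.
        extra = max(len(self.buf), 16)
        tail = self.buf[self.gap_end:]
        gap = self.gap_end - self.gap_start
        self.buf = self.buf[:self.gap_start] + [0] * (gap + extra) + tail
        self.gap_end += extra

    def insert(self, pos, units):
        self._move_gap(pos)
        for u in units:
            if self.gap_start == self.gap_end:
                self._grow()
            self.buf[self.gap_start] = u
            self.gap_start += 1

    def delete(self, pos, count):
        self._move_gap(pos)
        if count < 0 or count > len(self.buf) - self.gap_end:
            raise IndexError("delete range out of range")
        self.gap_end += count

    def units(self):
        return self.buf[:self.gap_start] + self.buf[self.gap_end:]


gb = GapBuffer(to_utf16("hello world"))
gb.insert(5, to_utf16(","))
gb.delete(0, 1)
gb.insert(0, to_utf16("H"))
print(from_utf16(gb.units()))
```

Edits near the gap are cheap; moving the gap costs time proportional to the distance moved, and when the gap fills up the array is relocated into a larger one, which corresponds to the "grown/relocated" case in the quote.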

On the other hand, if it seems desirable to expose code units, UTF-16
is a good balance of all the factors.
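One input to that balance is how many code units each encoding needs. A small sketch using Python's standard codecs (the sample strings and function names are arbitrary illustrations) counts units per encoding form and computes the UTF-16 surrogate pair for a code point outside the Basic Multilingual Plane.

```python
def code_unit_counts(s: str) -> dict:
    """Count how many code units s needs in each Unicode encoding form."""
    return {
        "code points": len(s),
        "utf-8": len(s.encode("utf-8")),            # 8-bit units
        "utf-16": len(s.encode("utf-16-le")) // 2,  # 16-bit units, no BOM
        "utf-32": len(s.encode("utf-32-le")) // 4,  # 32-bit units, no BOM
    }


def surrogate_pair(cp: int) -> tuple:
    """Return the UTF-16 (high, low) surrogate pair for a supplementary code point."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above the BMP need a surrogate pair")
    v = cp - 0x10000  # 20 bits remain: top 10 go high, bottom 10 go low
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


for sample in ["Scheme", "λx.x", "日本語", "\U0001D11E"]:
    print(ascii(sample), code_unit_counts(sample))

high, low = surrogate_pair(0x1D11E)  # MUSICAL SYMBOL G CLEF
print(hex(high), hex(low))
```

ASCII text takes the same number of units in UTF-8 and UTF-16, BMP text such as Greek or Japanese takes one UTF-16 unit per character (versus two or three UTF-8 bytes), and only supplementary characters need two UTF-16 units. This does not settle the question by itself; it only shows the unit counts being weighed.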

-j

_______________________________________________
r6rs-discuss mailing list
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
