William D Clinger wrote:

> We are having this conversation because there are *lots*
> of applications that need to index either (1) the Nth
> scalar value of a string or (2) the Nth code unit of
> some particular representation of the string.

Right - they need one of them, but not both.

Furthermore, most such applications don't actually need N
to be a "counter" - what they need N for is as a magic
cookie - or position.

Hence my just-posted "marker" suggestion.

> What you were trying to say, I think, is that you want to
> add string-codeunit-ref.  Since there are three standard
> forms of code units, you would need three procedures,
> not just one:
>
>     string-codeunit-utf-8-ref
>     string-codeunit-utf-16-ref
>     string-codeunit-utf-32-ref
>
> Making all three of those run in O(1) time is much harder
> than making string-ref run in O(1) time.

No, my point is for most applications you need *one* of these
and you *don't care* which it is.

For example, copying a string, appending strings, searching for
a substring: All of these work fine on opaque "code units", and
don't need to know whether the code unit is a utf-8 byte,
a utf-16 word, a utf-32 value, or a Unicode scalar value.
(The latter two are presumably the same.)
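
To make the substring-search case concrete, here is a minimal
R6RS sketch. It assumes, purely for illustration, that the code
units are UTF-8 bytes; codeunit-search is a hypothetical name and
not part of any proposal. The search compares code units only and
never decodes a character.

```scheme
(import (rnrs))

;; Hypothetical helper: return the code-unit offset of the first
;; occurrence of needle in haystack, or #f if there is none.
;; Assumption: code units are UTF-8 bytes (via string->utf8).
(define (codeunit-search haystack needle)
  (let* ((h (string->utf8 haystack))
         (n (string->utf8 needle))
         (hlen (bytevector-length h))
         (nlen (bytevector-length n)))
    (let outer ((i 0))
      (cond ((> (+ i nlen) hlen) #f)          ; needle no longer fits
            ((let inner ((j 0))               ; compare unit by unit
               (or (= j nlen)
                   (and (= (bytevector-u8-ref h (+ i j))
                           (bytevector-u8-ref n j))
                        (inner (+ j 1)))))
             i)
            (else (outer (+ i 1)))))))

;; "naive cafe" with i-diaeresis and e-acute, written with escapes.
(codeunit-search "na\xEF;ve caf\xE9;" "caf\xE9;")   ; => 7
```

The result 7 is a code-unit offset (the i-diaeresis takes two
bytes); the scalar-value index of the match would be 6. The caller
only needs the offset as a position to hand back to other
code-unit procedures, so the difference does not matter to it.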

If the standard specifies string-codeunit-ref, which returns an
opaque fixnum, and recommends (SHOULD) that it be O(1), but does
not require that string-ref be O(1), then implementations have
a lot more latitude.

For example:
(string-codeunit-ref str k1) -> fixnum
(string-codeunit-char-at str k1) -> character
(string-codeunit-char-next str k1) -> k2
(string-codeunit-substring str k1 k2) -> string
(string-codeunit-set! str k1)
(string-codeunit-replace! str1 k1 k2 str2)

The k1 and k2 values are exact non-negative integers.
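
As a rough sketch of how the first three procedures might look in
one possible implementation, assume a string is stored internally
as well-formed UTF-8 and model that storage with an R6RS
bytevector. The names below (codeunit-ref, codeunit-char-at,
codeunit-char-next, codeunit-chars, utf8-sequence-length) are
hypothetical stand-ins that take the bytevector directly; this is
an illustration, not the proposed API itself.

```scheme
(import (rnrs))

;; Length in bytes of the UTF-8 sequence whose lead byte is b.
;; Assumes well-formed UTF-8 and that b really is a lead byte.
(define (utf8-sequence-length b)
  (cond ((< b #x80) 1)
        ((< b #xE0) 2)
        ((< b #xF0) 3)
        (else 4)))

;; Stand-in for string-codeunit-ref: raw code unit at offset k, O(1).
(define (codeunit-ref bv k)
  (bytevector-u8-ref bv k))

;; Stand-in for string-codeunit-char-next: offset k2 of the
;; character following the one that starts at offset k.
(define (codeunit-char-next bv k)
  (+ k (utf8-sequence-length (bytevector-u8-ref bv k))))

;; Stand-in for string-codeunit-char-at: decode the character
;; whose first code unit is at offset k.
(define (codeunit-char-at bv k)
  (let* ((b0 (bytevector-u8-ref bv k))
         (len (utf8-sequence-length b0)))
    (let loop ((i 1)
               (acc (case len
                      ((1) b0)
                      ((2) (bitwise-and b0 #x1F))
                      ((3) (bitwise-and b0 #x0F))
                      (else (bitwise-and b0 #x07)))))
      (if (= i len)
          (integer->char acc)
          ;; Each continuation byte contributes its low 6 bits.
          (loop (+ i 1)
                (bitwise-ior
                 (bitwise-arithmetic-shift-left acc 6)
                 (bitwise-and (bytevector-u8-ref bv (+ k i))
                              #x3F)))))))

;; Walk the whole buffer by cursor, collecting its characters.
(define (codeunit-chars bv)
  (let loop ((k 0) (acc '()))
    (if (= k (bytevector-length bv))
        (reverse acc)
        (loop (codeunit-char-next bv k)
              (cons (codeunit-char-at bv k) acc)))))
```

Under these assumptions every procedure does a bounded amount of
work per call, and the caller treats k as a position rather than
as a character count.
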
--
        --Per Bothner
[EMAIL PROTECTED]   http://per.bothner.com/

_______________________________________________
r6rs-discuss mailing list
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
