On 3/15/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> On 3/15/07, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> > > If string-ref also required O(1) time complexity, then you'd be right.
> > > But it doesn't; it's perfectly fine to implement string-ref on top of
> > > underlying UTF-8 or UTF-16 character sequences; you just have to settle
> > > for O(N) performance.
> >
> > Are you suggesting that indexes represent code points rather than code
> > units? I haven't seen anyone do that, not as the one-and-only interface to
> > elements of a string. Have you? And do you think UTF-8/UTF-16
> > implementations should be *required* to do that? (Obviously, then,
> > string-length would have to return the number of code points rather than
> > the number of code units.)
>
> SBCL does that.
>
> http://sbcl.sourceforge.net/sbcl-internals/Character-and-String-Types.html

I think SBCL uses UCS-4-sized code units when Unicode is enabled. If
that's correct, then no, it doesn't do "that", it simply chooses an
encoding that avoids the problem (at the expense of space).
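
To make the trade-off concrete, here is a minimal sketch in Python (not Scheme, and not SBCL's actual code). The names utf8_code_point_ref and utf32_code_point_ref are hypothetical, and the UTF-8 input is assumed to be well formed. It shows that indexing by code point over UTF-8 needs a linear scan, while fixed 4-byte code units allow direct indexing at the cost of space.

```python
def utf8_code_point_ref(data: bytes, k: int) -> str:
    """Return the k-th code point of UTF-8 `data` by scanning from the start: O(N)."""
    if k < 0:
        raise IndexError(k)
    i = 0  # byte offset of the current sequence's lead byte
    n = len(data)
    while i < n:
        lead = data[i]
        # The lead byte alone determines how many code units the sequence uses.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        if k == 0:
            return data[i:i + width].decode("utf-8")
        k -= 1
        i += width
    raise IndexError("code point index out of range")


def utf32_code_point_ref(data: bytes, k: int) -> str:
    """Return the k-th code point of UTF-32-LE `data` directly: O(1)."""
    if k < 0 or 4 * k + 4 > len(data):
        raise IndexError(k)
    return chr(int.from_bytes(data[4 * k:4 * k + 4], "little"))


# Four code points that need 1, 2, 3 and 4 bytes respectively in UTF-8.
s = "a\u00e9\u20ac\U0001d11e"
u8 = s.encode("utf-8")
u16 = s.encode("utf-16-le")
u32 = s.encode("utf-32-le")

code_points = len(s)       # Python's len() counts code points
utf8_units = len(u8)       # 8-bit code units
utf16_units = len(u16) // 2  # 16-bit code units (the last character is a surrogate pair)
utf32_units = len(u32) // 4  # 32-bit code units, one per code point
```

With this sample string the number of code units equals the number of code points only for the 32-bit encoding, which is why a string-length defined over code points differs from one defined over code units for UTF-8 and UTF-16.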



This presentation about the Unicode support in SBCL also says "code point":
http://www.doc.gold.ac.uk/~mas01cr/talks/2005-04-24%20Amsterdam/presentation.pdf
The internal representation is an immediate bit, the character tag and
the code point.


Alexander

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
