Re: [r6rs-discuss] perhaps i should be formal, but....

John Cowan Wed, 14 Mar 2007 20:10:27 -0800

[EMAIL PROTECTED] scripsit:

> I'm also concerned that R6RS, as currently written, seems to require 
> UCS-4/UTF-32 strings. The problem is that string-ref returns characters, 
> and characters can't be surrogates.


If string-ref were also required to run in O(1) time, you'd be right.
But it isn't; it's perfectly fine to implement string-ref on top of an
underlying UTF-8 or UTF-16 code unit sequence.  You just have to settle
for O(N) performance, because finding the Nth character means scanning
from the start of the string.
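
As a rough illustration of that O(N) lookup, here is a minimal sketch in
Python rather than Scheme.  The name utf8_string_ref is made up for this
example, and it assumes the buffer already holds well-formed UTF-8:

```python
def utf8_string_ref(buf: bytes, k: int) -> str:
    """Return the k-th character of UTF-8 data by scanning from the start."""
    if k < 0:
        raise IndexError("index out of range")
    i = 0  # byte offset of the current character
    n = len(buf)
    while i < n:
        lead = buf[i]
        # The lead byte says how many bytes this character occupies.
        if lead < 0x80:
            width = 1
        elif lead >> 5 == 0b110:
            width = 2
        elif lead >> 4 == 0b1110:
            width = 3
        elif lead >> 3 == 0b11110:
            width = 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        if k == 0:
            # Decode only the one character we were asked for.
            return buf[i:i + width].decode("utf-8")
        k -= 1
        i += width
    raise IndexError("index out of range")
```

The loop never looks at continuation bytes, but it still has to visit
every character before the one requested, which is where the linear cost
comes from.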

Alternatively, you can use a design in which strings that use only the
Latin-1 repertoire are stored as Latin-1, strings that use only the BMP
repertoire are stored as UCS-2, and all others as UCS-4.  That allows
string-ref to be O(1) always, but string-set! winds up being O(N) in the
general case (storing a character too wide for the current representation
means re-encoding the whole string), though still O(1) in most practical
situations.
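
A minimal sketch of that variable-width design, again in Python rather
than Scheme.  The class FlexString and its method names are hypothetical,
chosen only for this example:

```python
class FlexString:
    """Hypothetical fixed-width string: 1, 2, or 4 bytes per character."""

    def __init__(self, text: str):
        self.width = self._width_for(max(map(ord, text), default=0))
        self.data = bytearray()
        for ch in text:
            self.data += ord(ch).to_bytes(self.width, "little")

    @staticmethod
    def _width_for(code_point: int) -> int:
        if code_point <= 0xFF:
            return 1  # Latin-1 repertoire
        if code_point <= 0xFFFF:
            return 2  # BMP repertoire (UCS-2)
        return 4      # everything else (UCS-4)

    def __len__(self) -> int:
        return len(self.data) // self.width

    def ref(self, k: int) -> str:
        """O(1): the k-th character sits at a fixed byte offset."""
        if not 0 <= k < len(self):
            raise IndexError("index out of range")
        start = k * self.width
        return chr(int.from_bytes(self.data[start:start + self.width], "little"))

    def set(self, k: int, ch: str) -> None:
        """O(1) if ch fits the current width, otherwise O(N) to widen."""
        if not 0 <= k < len(self):
            raise IndexError("index out of range")
        needed = self._width_for(ord(ch))
        if needed > self.width:
            # Re-encode every character at the wider width: the O(N) case.
            chars = [self.ref(i) for i in range(len(self))]
            self.width = needed
            self.data = bytearray()
            for c in chars:
                self.data += ord(c).to_bytes(needed, "little")
        start = k * self.width
        self.data[start:start + self.width] = ord(ch).to_bytes(self.width, "little")
```

Widening happens at most twice over the life of a string, and a string
that stays within one repertoire never widens at all.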

> Then we'd have uchar and ustring and, perhaps, fewer 
> backward-compatibility issues. 

Python has been suffering through that for several years now, and has
decided to break backward compatibility and abandon the 8-bit strings --
while reusing the 8-bit names for the Unicode strings.  I don't know what
the internal implementation is.

> But there's no bytevector-upper or bytevector-<? and such, so no,
> something was lost, at least for "low level" work.

They're easy to write, though, if you do need them.  If you want them
to be locale-sensitive, you have to work a little harder.
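
For instance, here is a sketch of the non-locale-sensitive versions,
written in Python rather than Scheme.  The names bytevector_upper and
bytevector_less are stand-ins for the procedures mentioned above, and the
sketch assumes that only ASCII letters should be case-mapped:

```python
def bytevector_upper(bv: bytes) -> bytes:
    """Uppercase ASCII letters a-z; leave every other byte unchanged."""
    return bytes(b - 32 if 0x61 <= b <= 0x7A else b for b in bv)


def bytevector_less(a: bytes, b: bytes) -> bool:
    """Lexicographic comparison by unsigned byte value."""
    for x, y in zip(a, b):
        if x != y:
            return x < y
    # One is a prefix of the other: the shorter one sorts first.
    return len(a) < len(b)
```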

-- 
John Cowan    http://ccil.org/~cowan  [EMAIL PROTECTED]
'Tis the Linux rebellion / Let coders take their place,
The Linux-nationale / Shall Microsoft outpace,
We can write better programs / Our CPUs won't stall,
So raise the penguin banner of / The Linux-nationale.  --Greg Baker

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
