[EMAIL PROTECTED] scripsit:

> I'm also concerned that R6RS, as currently written, seems to require
> UCS-4/UTF-32 strings. The problem is that string-ref returns characters,
> and characters can't be surrogates.
If string-ref also required O(1) time complexity, then you'd be right. But it doesn't. It's perfectly fine to implement string-ref on top of an underlying UTF-8 or UTF-16 encoded sequence; you just have to settle for O(N) performance, since finding the Nth character means scanning from the start of the string.

Alternatively, you can use a design in which strings that use only the Latin-1 repertoire are stored as Latin-1, strings that use only the BMP repertoire are stored as UCS-2, and all others as UCS-4. That allows string-ref to be O(1) always. string-set! winds up being O(N) in the general case, because storing a character outside the string's current repertoire forces the whole string to be copied into a wider representation, but it is still O(1) in most practical situations.

> Then we'd have uchar and ustring and, perhaps, fewer
> backward-compatibility issues.

Python has been suffering through that for several years now, and has decided to break backward compatibility and abandon the 8-bit strings, while reusing the 8-bit names for the Unicode strings. I don't know what the internal implementation is.

> But there's no bytevector-upper or bytevector-<? and such, so no,
> something was lost, at least for "low level" work.

They're easy to write, though, if you do need them. If you want them to be locale-sensitive, you have to work a little harder.

--
John Cowan  http://ccil.org/~cowan  [EMAIL PROTECTED]
'Tis the Linux rebellion / Let coders take their place,
The Linux-nationale / Shall Microsoft outpace,
We can write better programs / Our CPUs won't stall,
So raise the penguin banner of / The Linux-nationale.  --Greg Baker

_______________________________________________
r6rs-discuss mailing list
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
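A minimal sketch of the two string representations discussed in this message, written in Python rather than Scheme purely for illustration. The class and method names (Utf8String, AdaptiveString, ref, set) are hypothetical and do not correspond to any R6RS implementation. Utf8String shows why string-ref over UTF-8 is O(N); AdaptiveString picks a fixed width (1, 2, or 4 bytes per character) from the string's repertoire, so ref is O(1), and set only costs O(N) when a wider character forces the whole string to be re-encoded. As with R6RS characters, surrogate code points are assumed never to appear.

```python
# Hypothetical illustration only; not an actual Scheme implementation.

class Utf8String:
    """Characters stored as UTF-8 bytes: ref(k) must scan from the start, O(N)."""

    def __init__(self, text: str) -> None:
        self.data = text.encode("utf-8")

    def ref(self, k: int) -> str:
        count, i = 0, 0
        while i < len(self.data):
            lead = self.data[i]
            # The lead byte tells us how many bytes this character occupies.
            if lead < 0x80:
                width = 1
            elif lead < 0xE0:
                width = 2
            elif lead < 0xF0:
                width = 3
            else:
                width = 4
            if count == k:
                return self.data[i:i + width].decode("utf-8")
            count += 1
            i += width
        raise IndexError(k)


class AdaptiveString:
    """Fixed-width storage chosen by repertoire: Latin-1, UCS-2, or UCS-4."""

    _CODECS = {1: "latin-1", 2: "utf-16-le", 4: "utf-32-le"}

    def __init__(self, text: str) -> None:
        self.width = self._width_for(max(map(ord, text), default=0))
        self.data = bytearray(text.encode(self._CODECS[self.width]))

    @staticmethod
    def _width_for(code_point: int) -> int:
        if code_point <= 0xFF:
            return 1          # Latin-1 repertoire
        if code_point <= 0xFFFF:
            return 2          # BMP repertoire, stored as UCS-2
        return 4              # everything else, stored as UCS-4

    def __len__(self) -> int:
        return len(self.data) // self.width

    def ref(self, k: int) -> str:
        # O(1): the character's offset is simply k * width.
        if not 0 <= k < len(self):
            raise IndexError(k)
        w = self.width
        return self.data[k * w:(k + 1) * w].decode(self._CODECS[w])

    def set(self, k: int, ch: str) -> None:
        if not 0 <= k < len(self):
            raise IndexError(k)
        needed = self._width_for(ord(ch))
        if needed > self.width:
            # Rare case: re-encode the whole string at the wider width, O(N).
            text = self.data.decode(self._CODECS[self.width])
            self.width = needed
            self.data = bytearray(text.encode(self._CODECS[needed]))
        # Common case: overwrite one fixed-width slot in place, O(1).
        w = self.width
        self.data[k * w:(k + 1) * w] = ch.encode(self._CODECS[w])
```

For example, `AdaptiveString("abc")` starts at width 1; setting a character to "\u00e9" keeps it at width 1, while setting one to "\u20ac" (outside Latin-1) widens the entire string to 2 bytes per character. That widening step is the O(N) general case of string-set! mentioned above.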
