I am posting this as an individual member of the Scheme community. I am not speaking for the R6RS editors, and this message should not be confused with the editors' eventual formal response.

Thomas Lord wrote:

> It would be equally multi-lingual to lock down CHAR as UTF-8
> code units, UTF-16 code units, scalar values, grapheme clusters.

That sentence appears to conflate three different things. I wouldn't bother to point this out, except that ignoring those differences creates confusion that runs like an eternal golden braid (sorry, Doug!) through these discussions.

Grapheme clusters are particular (not arbitrary) sequences of characters [1]. (Yes, characters---according to the Unicode glossary [1].)

Unicode scalar values are code points excluding surrogates (and correspond pretty closely to the third meaning of character given by the Unicode glossary [1]).

Code units (whether UTF-8, UTF-16, UTF-32, or whatever) are bit patterns that are used to encode Unicode scalar values. As programmers and as language designers, one of our guiding principles is that bit patterns don't matter except where they are forced upon us by the external world, typically via i/o.

> It would be equally multi-lingual to permit but not require any of
> those interpretations and to permit but not require extensions.

To permit any of those three incompatible interpretations would be a disaster for portability. We have to pick one interpretation, and stick with it.

> It would be better multi-lingual to permit extensions in areas that
> Unicode has specifically declined to pursue.

While I am much more sympathetic to extensions than you would conclude by reading a draft R6RS, I am not very sympathetic to fundamentally incoherent language design, which is how I would describe any design that permitted implementors to decide for themselves whether Scheme's characters should correspond to grapheme clusters, scalar values, or code units.
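
To make the difference among code units, scalar values, and grapheme clusters concrete, here is a rough sketch in Python (not Scheme, and not anything taken from the draft R6RS). The sample string and the variable names are my own choices for illustration. It counts the same three-scalar-value string under each interpretation; the grapheme cluster count is stated in a comment rather than computed, since Python's standard library has no cluster segmentation.

```python
# Illustrative sketch only; the sample string and names are assumptions.
# 'e' + COMBINING ACUTE ACCENT (U+0301) + MUSICAL SYMBOL G CLEF (U+1D11E)
s = "e\u0301\U0001D11E"

# Scalar values: one integer per element of the string.
scalar_values = [ord(c) for c in s]

# UTF-8 code units: 8-bit patterns that encode those scalar values.
utf8_units = list(s.encode("utf-8"))

# UTF-16 code units: 16-bit patterns; U+1D11E needs a surrogate pair.
utf16_bytes = s.encode("utf-16-be")
utf16_units = [
    int.from_bytes(utf16_bytes[i:i + 2], "big")
    for i in range(0, len(utf16_bytes), 2)
]

print(len(scalar_values))             # 3
print(len(utf8_units))                # 7
print([hex(u) for u in utf16_units])  # ['0x65', '0x301', '0xd834', '0xdd1e']

# Grapheme clusters: a reader sees two units here, the accented 'e'
# (base letter plus combining mark) and the clef. Computing that
# segmentation requires the rules of Unicode Standard Annex #29.
```

The same string therefore has length 2, 3, 4, or 7 depending on which interpretation is chosen, which is why a program that indexes or counts characters cannot be portable across implementations that choose differently.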

With regard to bucky bits, my conversations with users and designers of Common Lisp have given the impression that most consider the original inclusion of bucky bits in Common Lisp to have been a mistake; X3J13 relegated them to implementation-dependent attributes, which are more likely to get in the way of writing portable code than to serve any portable purpose [2,3].

Finally, I'd like to note that the current draft R6RS does not actually preclude bucky bits. An implementation could add bucky bits to characters while making those bits invisible to all of the standard operations on characters and strings. An implementation-dependent library could make those bits visible. (That would mess up some programmers' mental model of eqv?, but their model of eqv? is pretty messed up anyway. The eq? and eqv? procedures have no special status apart from constraints that are laid out by the report(s).)

Will

[1] http://unicode.org/glossary/
[2] http://www.supelec.fr/docs/cltl/clm/node25.html#SECTION00624000000000000000
[3] http://www.lisp.org/HyperSpec/Issues/iss026-writeup.html
