From: William D Clinger <[EMAIL PROTECTED]>
Subject: Re: [r6rs-discuss] perhaps i should be formal, but....
Date: Wed, 14 Mar 2007 16:16:11 -0400

> I am posting this as an individual member of the Scheme
> community.  I am not speaking for the R6RS editors.
>
> Thomas Lord wrote:
> > Earlier revisions of the standard defined a portable character set,
> > allowing implementations to freely expand beyond that set.
> > In a portable program, if only the portable character set is
> > used, reliably portable behavior obtains.
>
> What's different now is that Unicode has become an
> established standard, and the portability advantages
> of requiring Scheme programs to use Unicode (which
> is more than just a character set) appear far larger
> than any advantages that might still be derived from
> allowing implementations and programs to choose their
> own character sets.

I also want to reserve the possibility of using different character
sets and encodings, but I agree that Unicode is the only practical
standard for portable programs.  I'm happy as long as R6RS does not
prohibit an implementation from using alternative character sets or
encodings if it wishes.

For example, the Japanese official family registration system uses its
own character set and code points, since it needs to distinguish
subtler differences between characters than Unicode does.  (There is
no clear line between abstract characters and glyphs; it is
context-dependent, and for family names the line falls closer to
glyphs.)

The range restriction on integer->char seems reasonable as a way to
guarantee portable behavior.  An implementation can provide another
procedure that deals with non-Unicode ranges or character sets.
I do feel it would be better if the standard used a clearer name for
the integer-to-character conversion, such as
unicode-scalar-value->char (or some abbreviation of it), which makes
it clear that one can't pass a value that is not a Unicode scalar
value.  This isn't a strong desire, though.

The wording of the definition of "character" objects, however, could
be changed.  It is unclear to me whether (char? <obj>) can return #t
if <obj> is in the implementation's extended character set but not in
Unicode.  If it can't, I can still provide (extended-char? <obj>), for
example, but that's a bit awkward.  The same goes for other procedures
that deal with characters: in (string <char> ...), must each <char> be
in Unicode?  (I hope not!)

--shiro
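
To make the naming suggestion above concrete, here is a minimal sketch of what such a procedure might look like when layered on integer->char. It assumes the range restriction under discussion corresponds to the Unicode scalar values (0 through #xD7FF and #xE000 through #x10FFFF, i.e. every code point except the surrogates). Both unicode-scalar-value? and unicode-scalar-value->char are hypothetical names used only for illustration; neither is part of any report or draft.

```scheme
(import (rnrs))

;; Hypothetical helper: #t if n is an exact integer in the Unicode
;; scalar value range, i.e. a code point that is not a surrogate.
(define (unicode-scalar-value? n)
  (and (integer? n)
       (exact? n)
       (or (<= 0 n #xD7FF)
           (<= #xE000 n #x10FFFF))))

;; Hypothetical wrapper using the name floated in this message.
;; It rejects anything outside the scalar value range explicitly,
;; so the restriction is visible in the procedure's name and error.
(define (unicode-scalar-value->char n)
  (if (unicode-scalar-value? n)
      (integer->char n)
      (assertion-violation 'unicode-scalar-value->char
                           "not a Unicode scalar value"
                           n)))
```

Under these assumptions, (unicode-scalar-value->char #x41) returns #\A, while (unicode-scalar-value->char #xD800) raises an assertion violation. An implementation with an extended character set could still offer a separate, differently named conversion for values outside this range.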
