I am posting this as an individual member of the Scheme community. I am not speaking for the R6RS editors.
Thomas Lord wrote:

> Earlier revisions of the standard defined a portable character set,
> allowing implementations to freely expand beyond that set.
> In a portable program, if only the portable character set is
> used, reliably portable behavior obtains.

What's different now is that Unicode has become an established standard, and the portability advantages of requiring Scheme programs to use Unicode (which is more than just a character set) appear far larger than any advantages that might still be derived from allowing implementations and programs to choose their own character sets.

> In the R6 draft, the entire set of permitted characters is
> explicitly enumerated.

Actually, I believe the set of permitted characters is enumerated by reference to Unicode character categories. SFAIK, the set of characters in those categories is still growing, albeit slowly.

> Moreover, the set's mapping to integer
> values is both discontinuous and defined by three constants
> that, a priori, appear to be arbitrary.

The constants are part of the Unicode standard, and are more historical than arbitrary. With hindsight we all would have preferred a contiguous range, but I understand the historical circumstances that led to the hole in the middle.

> My question is whether any principled reason for these arbitrary
> constants is given that might be supported without appeal
> to analogies to other programming languages.

SFAIK, the justification for the constants has naught to do with other programming languages, but with Scheme and Unicode. Of all Unicode concepts, the one that comes closest to Scheme's historical notion of a character is the Unicode notion of a scalar value. Scheme could have defined its own encoding of scalar values, and that range could have been contiguous, but that would have been a Seriously Bad Idea. Using some Scheme-specific encoding would have created enormous confusion and made interfacing with other systems more difficult.

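For concreteness, the range of Unicode scalar values and the three constants under discussion can be sketched as follows. This is only an illustration in Python, not text from the draft report; the constant and function names are hypothetical, chosen for this example.

```python
# The three constants come from the Unicode standard: the surrogate block
# D800..DFFF is excluded, and 10FFFF is the top of the code space.
# The names below are hypothetical, chosen for this sketch.
LAST_BEFORE_SURROGATES = 0xD7FF
FIRST_AFTER_SURROGATES = 0xE000
MAX_SCALAR_VALUE = 0x10FFFF


def is_scalar_value(n: int) -> bool:
    """Return True if n is a Unicode scalar value."""
    return (0 <= n <= LAST_BEFORE_SURROGATES
            or FIRST_AFTER_SURROGATES <= n <= MAX_SCALAR_VALUE)


if __name__ == "__main__":
    for n in (0x41, 0xD7FF, 0xD800, 0xDFFF, 0xE000, 0x10FFFF, 0x110000):
        print(f"{n:#x}: {is_scalar_value(n)}")
```

The "hole in the middle" is the surrogate block, which exists only to support the UTF-16 encoding form and does not denote characters on its own.
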
> Note that there is a fine distinction to be made between arbitrary
> choices such as the numeric values assigned to portable characters,
> and arbitrary choices such as a mandatory domain restriction
> on INTEGER->CHAR. In the former, if CHAR<->INTEGER
> conversion is to be supported at all, it is clear that *some* arbitrary
> choice must be made and so, of course, appeal to a popular standard
> for that. In the latter case, the domain restriction, there is no obvious
> reason to believe any such restriction is needed or makes the language
> better than another language without that restriction.

Even in the latter case, the report should state the domain for which integer->char can be relied upon to behave portably. Your question seems to come down to whether that procedure should be required to raise an exception when given values outside its portable domain:

> So, how does it come to pass that those patently arbitrary aspects of
> Unicode appear in the report not as a set of domain limits within which
> the behavior of portable programs is assured, but as restrictions that
> forbid an implementation from expanding the domains and ranges of certain
> standard procedures?

The argument, I believe, is twofold. First, passing a non-portable value to integer->char is likely to be a common error, especially among programmers who are just now learning about Unicode or who were introduced to it through programming languages standardized back when Unicode was expected to be a 16-bit character set. Second, if the report permitted it, accepting such non-portable arguments would likely be a common error among implementors with the same background. Making it clear up front that desiring to pass non-portable values to integer->char is a grievous conceptual error will save everyone a lot of grief later.

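The kind of error in question can be illustrated with a strict conversion that raises an exception outside the scalar-value domain. The sketch below is in Python and only models the behavior being debated; `integer_to_char` is a hypothetical stand-in for Scheme's integer->char, not the report's definition. Python's built-in `chr` accepts surrogate code points, so the domain check is written out explicitly.

```python
def integer_to_char(n: int) -> str:
    """Hypothetical strict analogue of integer->char: reject non-scalar values."""
    if not (0 <= n <= 0xD7FF or 0xE000 <= n <= 0x10FFFF):
        raise ValueError(f"{n:#x} is not a Unicode scalar value")
    return chr(n)


if __name__ == "__main__":
    # U+1D11E (MUSICAL SYMBOL G CLEF) is encoded in UTF-16 as the surrogate
    # pair D834 DD1E. Treating a UTF-16 code unit as if it were a character
    # is the 16-bit mistake the strict domain is meant to catch.
    print(integer_to_char(0x1D11E))
    try:
        integer_to_char(0xD834)
    except ValueError as err:
        print("rejected:", err)
```

Under this model the mistake surfaces at the point of conversion instead of later, when a lone surrogate reaches some other system.
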
> There is a legal question at issue: how certain procedures should
> be specified. But the larger question is on what basis, by what ways,
> should such specifications be decided?
>
> If R6 is simply to be a record of votes taken, a kind of tallying up
> of a political process with purely pragmatic aims, then perhaps
> it is no longer a "report" at all. The line of thought that started
> with the "ultimate" papers has ended. What carries on, in its place,
> is a particular *use* of the main tangible artifact of that line of
> thought. And, in that case, the introduction should certainly be
> purged or retitled "Obituary" and the document as a whole
> retitled.

I have some sympathy for that point of view. I have less sympathy for that point of view with respect to Unicode than with several other parts of the report, however, because I think the draft report's treatment of Unicode is one of the more compelling arguments to be made in its favor.

Will
