Eli Barzilay scripsit:

> In any case, if you remember, I didn't join this thread from this
> side. What always disturbed me more was the arbitrary decision to treat
> the case bit differently than many other similar bits. In the ASCII
> world that Scheme was born to, this was a very minor wart. (I don't
> know the details of punched cards, but I'd guess that Lisp was born in
> a world that didn't have that bit.)

Quite so. The IBM 704 (released in 1954) used a 6-bit character code
that implemented only upper-case characters. Lisp was born in 1958 on
that hardware. IBM only came out with EBCDIC, which provided full
support for upper and lower case, in 1963-64, at the same time that
ASCII was standardized. (Ironically, IBM strongly supported ASCII, but
wasn't able to cut over its entire production line of online and
offline peripherals to support it before System/360 was released -- so
System/360 and all its successors are EBCDIC-based to this day.)

Fortran was also born on the 704, as was MUSIC, the program that
generated Hal's singing voice in 2001: A Space Odyssey.

> But these days ignoring something like unicode is no longer an option.
> Given this, one solution is to keep the symmetry: the language is
> still case insensitive, but it's done with unicode folding rules or
> something similar -- so all similar bits have the same status. That
> would be, IMO, the proper way of keeping case-insensitivity. But
> there is a big problem here -- unicode has versions, and the rules are
> likely to change, which means that code can break as a result. The
> fundamental problem (again, IMO) here is that it's a redundant mixture
> of cultural rules with a formal language. For all I know, it might be
> decided tomorrow that "a" and "A" are no longer related, or that the
> capital form of "a" is "A" or "$,1,p" or whatever. I obviously don't
> think that this will ever happen -- but that is ultimately an issue of
> human culture.

Actually, it won't. Unicode (which is designed to last the centuries,
adding new characters but not changing old ones) has very specific
stability policies on what is guaranteed about future versions at
http://www.unicode.org/policies/stability_policy.html . In particular:

Case Folding Stability
Applicable Version: Unicode 5.0+

Caseless matching of Unicode strings used for identifiers is stable.

Case folding stability ensures that identifiers created in different
versions of Unicode can be reliably matched in a case-insensitive
manner. For more information on identifiers see UAX #31: Identifier
and Pattern Syntax.

Identifiers commonly exclude compatibility decomposable characters;
therefore this policy formally applies only to strings normalized with
NFKC. The toCaseFold() operation used for caseless matching is the
full case folding defined by rule R4 under "Default Case Conversion" in
Section 3.13, Default Case Algorithms of the Unicode Standard.

The formal statement of this policy is:

For each string S containing characters only from a given Unicode
version, toCasefold(toNFKC(S)) under that version is identical to
toCasefold(toNFKC(S)) under any later version of Unicode.

Case Pair Stability
Applicable Version: Unicode 5.0+

Two distinct assigned characters form a case pair when the first
character of the pair is the full uppercase of the second character,
and the second character is the full lowercase of the first character.
(Full upper- and lowercase are defined in Section 3.13 of the Unicode
Standard.)

If two characters form a case pair in a version of Unicode, they will
remain a case pair in each subsequent version of Unicode.

If two characters do not form a case pair in a version of Unicode, they
will never become a case pair in any subsequent version of Unicode.

More formally, for given versions V and U of Unicode, and any two
distinct characters X and Y that are both assigned according to both V
and U:

    toLowercase_V(X) = Y AND toUppercase_V(Y) = X
if and only if
    toLowercase_U(X) = Y AND toUppercase_U(Y) = X

Note that these conditions apply to two existing, distinct assigned
characters. A character that is not part of a case pair could become
part of one if the new case pair is formed at the time of the addition
of a new character to Unicode.
For example, a new capital version of U+028D LATIN SMALL LETTER TURNED W
could be added in the future to form a new case pair.

--
Possession is said to be nine points of the law,            John Cowan
but that's not saying how many points the law might have.   [email protected]
        --Thomas A. Cowan (law professor and my father)

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
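
As a rough illustration of the caseless matching that the Case Folding
Stability policy quoted above describes, here is a minimal Python
sketch of the toCasefold(toNFKC(S)) key. The helper names
identifier_key and same_identifier are hypothetical and exist only for
this example; str.casefold() and unicodedata.normalize() are
standard-library calls, and their results depend on the Unicode version
bundled with the interpreter (reported by unicodedata.unidata_version).

```python
import unicodedata


def identifier_key(s: str) -> str:
    """Hypothetical helper: toCasefold(toNFKC(s)) as named in the policy.

    NFKC normalization runs first, then full case folding.
    """
    return unicodedata.normalize("NFKC", s).casefold()


def same_identifier(a: str, b: str) -> bool:
    """Hypothetical helper: caseless identifier comparison via the key."""
    return identifier_key(a) == identifier_key(b)


if __name__ == "__main__":
    # Unicode version of the data tables this interpreter ships with.
    print("Unicode data version:", unicodedata.unidata_version)

    print(same_identifier("Lambda", "LAMBDA"))    # True
    # Full case folding maps U+00DF (sharp s) to "ss".
    print(same_identifier("Stra\u00dfe", "STRASSE"))  # True
    # NFKC decomposes the U+FB01 "fi" ligature before folding.
    print(same_identifier("\ufb01le", "FILE"))    # True
```

What the policy adds is that, for a string made only of characters
assigned in some Unicode version, this key is the same under every
later version, so keys computed and stored today stay comparable.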
