As the R6RS process's chief Unicode hound, I'd like to say a word or two about why I think Unicode matters. There are at least three kinds of reasons.
1) If a process must deal with text, it should be designed from the ground up to deal with text in a universal encoding, converting to local encodings only when required to interface with surrounding systems. It's been estimated that building in Unicode adds perhaps 20% to development cost, whereas retrofitting it adds about 100%. That's an "industrial" motive to support Unicode, and although the (rnrs unicode (6)) library doesn't come close to providing all that's needed for practical work, it does provide a useful core.

2) Scheme requires that there exist in the application domain strings which are constructed as sequences of characters. (I think that's a mistake: I'd rather have strings as primitives and understand characters to be a finite subset of short strings.) Having the significance and interpretation of characters differ from one implementation to the next is a needless kind of variation: in practice it means that portable programs must be confined to ASCII data. Breaking the historical link between characters and octets is something that should be done in the core whether or not anything else about Unicode is supported.

3) But the deepest reason, I believe, is that Scheme programmers are themselves dealing with text when they write their programs, and if the repertoire of characters allowed in a program is non-universal, the result unfairly disadvantages people who natively use another repertoire.

I've been told that one of the main reasons that Java caught on so quickly in Japan is that it was the first mainstream language that required implementations to support meaningful Japanese identifiers written in the native script. Java's reserved identifiers of course had to remain in Latin script, but there are only a few of them compared to the vast number of identifiers in a Java program. More serious was the fact that the names of existing standard and non-standard Java libraries were and are typically Latin.

In Scheme, of course, there are no reserved words, and macros permit arbitrary renamings so that a whole program could be valid Scheme even though not a single Latin-script identifier appeared outside the mapping macros. Allowing programmers to write Scheme using meaningful identifiers from their native language, written in the usual way, is to me a matter of elemental fairness, and ought to be allowed in all cases.

--
John Cowan    [email protected]    http://ccil.org/~cowan
Half the lies they tell about me are true.
        --Tallulah Bankhead, American actress
