| From: Thomas Lord <[email protected]>
| Date: Tue, 22 Sep 2009 20:38:09 -0700
|
| On Tue, 2009-09-22 at 20:57 -0400, Aubrey Jaffer wrote:
| > Unicode doesn't play well with a character datatype. Downcasing
| > or foldcasing a single scalar-value can result in a length 2
| > string.
|
| That is not a problem with Unicode. That is a problem with
| the assumption that there is a bijection between upcase
| and downcase characters - an assumption violated by one
| character in one language.

There are other ligatures which have this property. A Latin
(English) example is the (lowercase) "fi" ligature (ﬁ, U+FB01).
Upcasing it gives "FI"; downcasing leaves it unchanged; foldcasing
yields "fi".

| > If anyone cares, other Unicode-supporting language development
| > efforts seem to be moving away from the character datatype:
|
| > According to <http://javascript.crockford.com/survey.html>,
| > JavaScript lacks chars:
|
| > String is a sequence of zero or more Unicode characters. There
| > is no separate character type. A character is represented as a
| > string of length 1.
|
| A sequence of what now? What exactly is it represented as a
| string of length 1?

(string-ref "abc" 1) --> "b".

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
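
The case mappings claimed in the reply above can be checked mechanically. The following is only a sketch in Python, which is not the language under discussion in this thread; it is used here because its built-in string methods apply the full Unicode case mappings. The names `case_report`, `LIGATURE_FI` and `DOTTED_CAPITAL_I` are made up for illustration, and U+0130 is simply one convenient character whose downcased form has length 2, not necessarily the character either poster had in mind.

```python
# Sketch only: Python stands in for Scheme here, and every name below
# is hypothetical, chosen for this illustration.

LIGATURE_FI = "\ufb01"       # LATIN SMALL LIGATURE FI
DOTTED_CAPITAL_I = "\u0130"  # LATIN CAPITAL LETTER I WITH DOT ABOVE


def case_report(ch):
    """Return the upcased, downcased and case-folded forms of ch."""
    return {
        "upper": ch.upper(),     # full upcase mapping
        "lower": ch.lower(),     # full downcase mapping
        "fold": ch.casefold(),   # full case folding
    }


fi = case_report(LIGATURE_FI)
dotted = case_report(DOTTED_CAPITAL_I)

# Print the length of each result; a length above 1 means the mapping
# of a single scalar value did not fit in a single character.
for name, report in (("fi ligature", fi), ("dotted capital I", dotted)):
    for op, result in report.items():
        print(name, op, len(result))
```

Each operation takes one scalar value and returns a string, which is why the results are compared by length rather than treated as characters.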
