| From: Thomas Lord <[email protected]>
 | Date: Tue, 22 Sep 2009 20:38:09 -0700
 | 
 | On Tue, 2009-09-22 at 20:57 -0400, Aubrey Jaffer wrote:
 | > Unicode doesn't play well with a character datatype.  Downcasing
 | > or foldcasing a single scalar-value can result in a length 2
 | > string.
 | 
 | That is not a problem with Unicode.  That is a problem with 
 | the assumption that there is a bijection between upcase
 | and downcase characters - an assumption violated by one
 | character in one language.  

There are other ligatures which have this property.  A Latin (English)
example is the lowercase ligature "ﬁ" (U+FB01).  Upcasing it gives "FI";
downcasing leaves it unchanged; foldcasing yields "fi".
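
For anyone who wants to check this, here is a minimal sketch in Python
(not Scheme; I use it only because its str methods apply the full
Unicode case mappings).  The names LIGATURE and case_forms are mine,
made up for illustration.

```python
# U+FB01 LATIN SMALL LIGATURE FI, written as an escape so it survives
# any mail encoding.
LIGATURE = "\ufb01"

def case_forms(s):
    """Return the (upper, lower, casefold) forms of s via Python's str methods."""
    return s.upper(), s.lower(), s.casefold()

upper, lower, folded = case_forms(LIGATURE)

# Lengths of the original, upcased, downcased and foldcased strings.
print(len(LIGATURE), len(upper), len(lower), len(folded))
```

The upcased and foldcased results are each two code points long, so
neither can be returned by a procedure that maps one character to one
character.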

 | > If anyone cares, other Unicode-supporting language development
 | > efforts seem to be moving away from the character datatype:
 | 
 | >  According to <http://javascript.crockford.com/survey.html>,
 | >  JavaScript lacks chars:
 | 
 | >  String is a sequence of zero or more Unicode characters. There
 | >  is no separate character type.  A character is represented as a
 | >  string of length 1.
 | 
 | A sequence of what now?   What exactly is it that is represented
 | as a string of length 1?

(string-ref "abc" 1)  --> "b".
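
Python is another language with no separate character type, so it
can serve as a rough illustration of the same model.  This is only a
sketch of the idea, not a claim about how JavaScript or any Scheme
implements it; the variable names are mine.

```python
s = "abc"

# Indexing a string yields another string, not a distinct character type.
element = s[1]

print(type(element).__name__, len(element))
```

So the answer to "a string of length 1 of what?" in this model is
simply: a one-element string, whose element is again a string.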

_______________________________________________
r6rs-discuss mailing list
[email protected]
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss
