Thomas Lord scripsit:
> That is not a problem with Unicode. That is a problem with
> the assumption that there is a bijection between upcase
> and downcase characters - an assumption violated by one
> character in one language.

A lot more than one. In addition to ess-zet, there are:

13 Latin and Armenian ligatures that uppercase to two or three
characters (the ffi and ffl ligatures give three)

61 Latin and Greek lowercase letters with diacritics that
uppercase to the uppercase base character followed by the
combining diacritic(s)

I with dot, which lowercases (in non-Turkic contexts) to
i followed by combining dot in order to preserve canonical
equivalence (only one dot is displayed)

27 Greek titlecase combinations of an uppercase vowel with
diacritic(s) followed by a lowercase iota, which uppercase to
the same vowel followed by an uppercase iota.

That makes 103 characters altogether for which a char-upcase or
char-downcase that maps one character to one character cannot
give the full Unicode result.

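
To see these cases concretely, here is a rough sketch in Python, chosen only because its str.upper() and str.lower() apply the full (one-to-many) Unicode case mappings. The helper names upcases_to_many, downcases_to_many and scan are made up for illustration, and the totals that scan() returns depend on the Unicode version your Python ships with, so they need not match the counts above.

```python
import sys
import unicodedata


def upcases_to_many(ch):
    """True if the full uppercase mapping of ch is longer than one code point."""
    return len(ch.upper()) > 1


def downcases_to_many(ch):
    """True if the full lowercase mapping of ch is longer than one code point."""
    return len(ch.lower()) > 1


def scan():
    """Return (upper, lower): lists of characters whose case mapping expands.

    Hypothetical helper: walks every code point known to this Python build.
    """
    upper, lower = [], []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        if upcases_to_many(ch):
            upper.append(ch)
        if downcases_to_many(ch):
            lower.append(ch)
    return upper, lower


# A few representatives of the groups listed above.
SAMPLES = [
    "\u00df",  # ess-zet, uppercases to "SS"
    "\ufb01",  # Latin ligature fi, uppercases to "FI"
    "\ufb03",  # Latin ligature ffi, uppercases to three letters
    "\u0587",  # Armenian ligature ech yiwn
    "\u01f0",  # j with caron, uppercases to J + combining caron
    "\u1f88",  # Greek titlecase alpha with psili and prosgegrammeni
]

for ch in SAMPLES:
    mapped = ch.upper()
    print(f"U+{ord(ch):04X} -> " + " ".join(f"U+{ord(c):04X}" for c in mapped))

# I with dot above is the one that expands when lowercased.
print([f"U+{ord(c):04X}" for c in "\u0130".lower()])

upper, lower = scan()
print("Unicode", unicodedata.unidata_version,
      "expanding on upcase:", len(upper),
      "expanding on downcase:", len(lower))
```

None of these results fits in a single character, which is why a character-to-character procedure has to give up on them.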
> A sequence of what now? What exactly is it represented as a
> string of length 1?
A Unicode codepoint. These languages have no separate type for a
single codepoint, but they do have a type for sequences of
codepoints, so a lone codepoint is simply a sequence of length 1.
This is not paradoxical.
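
Python happens to be one language built this way (my choice of example, not necessarily one of the languages under discussion). A minimal sketch, with is_single_codepoint as a made-up helper name:

```python
def is_single_codepoint(s):
    """True if s is a string holding exactly one code point."""
    return isinstance(s, str) and len(s) == 1


word = "stra\u00dfe"

# Indexing a string yields another string, not a distinct character type.
ch = word[4]
print(type(ch).__name__, len(ch))

# Indexing the result again just gives the same one-element string back.
print(ch[0] == ch)

# The code point itself is only visible as an integer via ord().
print(hex(ord(ch)))
```

There is nothing circular here: the string type is the only type, and a string of length 1 stands in wherever another language would use a character.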
--
John Cowan <[email protected]> http://www.ccil.org/~cowan
It's like if you meet a really old, really rich guy covered in liver
spots and breathing with an oxygen tank, and you say, "I want to be
rich, too, so I'm going to start walking with a cane and I'm going to
act crotchety and I'm going to get liver disease." --Wil Shipley