Brian Harvey scripsit:
> But what I was missing is that this isn't a technical question; it's a
> political one. I'm accustomed -- we're all accustomed -- to hearing
> people say "such-and-such can't be done; it'd be too complicated"
> and seeing that by being Schemely we can cut through the complexity
> and make The Right Thing be /easier/ than their wrong thing. But that
> only works when the problem is communicating with a computer, not when
> the problem is communicating with another person.
Exactly. Unicode is complicated and messy because the real world is
complicated and messy and full of people who use complicated and messy
writing systems, including anglophones.
> One thing still confuses me about the Turkish example. If their
> case-folding algorithm shuffles the vowels around, does that mean that
> you can't do lexicographic sorting in Turkish based on Unicode values?
You can't do lexicographic sorting in *any* language using encoding values.
The mere fact that in ASCII (and in Unicode) Z < a should clinch that.
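To make the point concrete, here is a small Python sketch. Python is an arbitrary choice; all it assumes is that Python's default string comparison goes by code point. The helper name `codepoint_sort` is purely illustrative.

```python
def codepoint_sort(words):
    """Sort strings by raw code-point values (Python's default str ordering)."""
    return sorted(words)

words = ["apple", "Zebra", "éclair", "banana", "Apple"]

# Every ASCII capital has a smaller code than every ASCII small letter.
print(ord("Z"), ord("a"))      # 90 97
print(codepoint_sort(words))   # ['Apple', 'Zebra', 'apple', 'banana', 'éclair']

# Folding case fixes Z < a but not accents: é (U+00E9) still follows z.
print(sorted(words, key=str.casefold))   # ['apple', 'Apple', 'banana', 'Zebra', 'éclair']
```

Folding case is the obvious patch, and it only moves the problem from case to diacritics.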
Consequently, there is a separate but coordinated standard, ISO 14651,
which specifies for all the characters of Unicode:
  - a three-level scheme for sorting (sort first by letter identity,
    then by diacritic marks, then by case, roughly);
  - a generic multilingual sort order;
  - a standard method for tailoring the sort for languages which
    the generic sort order doesn't handle (for example, in
    Danish, z < æ < ø < å).
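Here is a toy Python sketch of the three-level idea together with a tailoring. It is emphatically not ISO 14651: the weights, the names `sort_key`, `letter_rank`, and `DANISH_EXTRA`, and the crude fallback for non-ASCII base letters are all made up for illustration. The only real data it uses is the Danish order quoted above. Level 1 compares base letters over the whole word, level 2 compares combining marks, and level 3 compares case. A tailoring here is just a list of letters that sort as separate letters after z.

```python
import unicodedata

# Hypothetical tailoring data: letters a language treats as separate letters
# sorting after "z".  For Danish: z < æ < ø < å.
DANISH_EXTRA = ["æ", "ø", "å"]

def letter_rank(base, extra):
    """Primary weight of one lowercase base letter (toy weights, not ISO 14651's)."""
    if base in extra:
        return 26 + extra.index(base)        # tailored letters go after "z"
    if "a" <= base <= "z":
        return ord(base) - ord("a")
    return 100 + ord(base)                   # crude fallback for everything else

def sort_key(word, extra=()):
    """Three-level key: letter identity, then diacritics, then case."""
    word = unicodedata.normalize("NFC", word)
    primary, secondary, tertiary = [], [], []
    for ch in word:
        lower = ch.lower()
        if lower in extra:
            base, marks = lower, ""          # tailored letter: keep it whole
        else:
            decomposed = unicodedata.normalize("NFD", lower)
            base, marks = decomposed[0], decomposed[1:]
        primary.append(letter_rank(base, extra))
        secondary.append(tuple(ord(m) for m in marks))   # no marks sorts first
        tertiary.append(0 if ch == lower else 1)         # lowercase before uppercase
    # Tuples compare level by level: all of level 1 first, then level 2, then 3.
    return (primary, secondary, tertiary)

words = ["zebra", "åben", "øl", "ære", "abe", "Abe", "ábe"]
print(sorted(words, key=sort_key))
# ['abe', 'Abe', 'ábe', 'åben', 'zebra', 'ære', 'øl']
print(sorted(words, key=lambda w: sort_key(w, DANISH_EXTRA)))
# ['abe', 'Abe', 'ábe', 'zebra', 'ære', 'øl', 'åben']
```

The generic key treats å as "a plus a diacritic", so it files åben next to abe. The Danish tailoring changes only the letter table, and å moves to the end of the alphabet. The encoding itself never changes.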
> Wouldn't they be better served by having their own entire chunk of the
> code space in which the characters would appear in Turkish lexicographic
> sort order?
No. They are better served by having their own ISO 14651 tailoring,
and they do. One of the ideas that Unicode newbies often come up with
(including me, when I was a newbie) is to have Turkish-specific i and I;
the trouble with that is that there was (when Unicode was founded) and
still is far too much mixed Turkish and non-Turkish text that makes no
such distinction.
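The mixed-text problem shows up as soon as you apply Turkish case rules. Here is a rough Python sketch. `turkish_lower` is a hypothetical helper, not a library function. Python's built-in `str.lower()` applies only the language-independent Unicode mappings. The Turkish rule has to be chosen by the language of the text, because the same code point U+0049 is used for both the Turkish and the English capital I.

```python
def turkish_lower(s):
    """Lowercase s with the Turkish dotted/dotless i rules, then the generic mapping.

    Hypothetical helper: str.lower() alone maps I -> i regardless of language.
    """
    # Turkish pairs: I <-> ı (dotless) and İ <-> i (dotted).
    return s.replace("I", "ı").replace("İ", "i").lower()

print(turkish_lower("DİYARBAKIR"))         # diyarbakır
print("KIR".lower(), turkish_lower("KIR"))  # kir kır
# The same U+0049 in an English word inside Turkish text gets the Turkish rule:
print(turkish_lower("IBM"))                 # ıbm
```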
--
The first thing you learn in a lawin' family John Cowan
is that there ain't no definite answers
to anything. --Calpurnia in To Kill A Mockingbird
_______________________________________________
r6rs-discuss mailing list
http://lists.r6rs.org/cgi-bin/mailman/listinfo/r6rs-discuss