On 01/18/2011 06:11 AM, Ali Çehreli wrote:
Thanks to all who have contributed; I am also following this thread with
great interest. :)

Michel Fortin wrote:
 > I mean, a grapheme is a slice of a string, can have multiple code points
 > (like a string), can be appended the same way as a string, can be
 > composed or decomposed using canonical normalization or compatibility
 > normalization (like a string), and should be sorted, uppercased, and
 > lowercased according to Unicode rules (like a string). Basically, a
 > grapheme is just a string that happens to contain only one grapheme.

I would like to stress the fact that Unicode knows nothing about
sorting, uppercasing, or lowercasing.

Those operations are tied to the alphabet (or writing system) that a
certain grapheme happens to belong to at a given time. For example, we
cannot uppercase the letter i without knowing what alphabet we are
dealing with. Two possibilities: I and İ (I dot above).

It is the same issue with sorting.

This is true and false ;-)

You are right, indeed, that issues like sorting are language-specific and, beyond that, use-case-specific. The Turkish case is a good example. For another one: in French, I do not even know whether there is an official rule! Anyway, whatever the answer, even famous newspapers and official documents, for example, have used different rules. Most of them drop accents on uppercase letters, possibly because of computer limitations; there is a recent move (back) toward accented uppercase. This is very annoying: "Hélène" has two uppercase versions, "HÉLÈNE" and "HELENE", each consistent and each in actual use. Conversely, how is software supposed to guess the lowercase version of "HELENE"?

As for Unicode, it does define standards for casing and for so-called collation (comparison, for sorting) algorithms. I don't know much more; I have never applied them personally, for reasons like the ones above. The full list of its technical documents can be found at http://unicode.org/reports/. See in particular http://unicode.org/reports/tr10/ for collation. (Unfortunately, case mapping is now part of the core standard document, which makes it hard to find.)
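As a rough illustration of why a collation algorithm is needed at all, the Python sketch below compares plain code point order with a naive key. The function naive_key is a hypothetical name of mine, and it is only a crude approximation: it is NOT the algorithm described in tr10 and it ignores every language-specific tailoring:

```python
import unicodedata

def naive_key(word):
    """Crude sort key: strip combining marks, then casefold.

    An assumption-laden stand-in for real collation; it only shows
    that comparing raw code points is not enough.
    """
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.casefold()

words = ["zebra", "Éclair", "apple"]

by_code_point = sorted(words)               # ['apple', 'zebra', 'Éclair']
by_naive_key = sorted(words, key=naive_key)  # ['apple', 'Éclair', 'zebra']
```

Code point order puts "Éclair" after "zebra" simply because U+00C9 is greater than U+007A. The naive key fixes this one case, but it treats accents and case as if they did not exist, which is exactly the kind of decision that differs between languages and use cases.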

Denis
_________________
vita es estrany
spir.wikidot.com
