Thanks to everyone who has contributed; I am also following this thread with great interest. :)

Michel Fortin wrote:
> I mean, a grapheme is a slice of a string, can have multiple code points
> (like a string), can be appended the same way as a string, can be
> composed or decomposed using canonical normalization or compatibility
> normalization (like a string), and should be sorted, uppercased, and
> lowercased according to Unicode rules (like a string). Basically, a
> grapheme is just a string that happens to contain only one grapheme.

I would like to stress that Unicode code points by themselves say nothing about sorting, uppercasing, or lowercasing.

Those operations are tied to the alphabet (or writing system) that a given grapheme happens to belong to at a given time. For example, we cannot uppercase the letter i without knowing which alphabet we are dealing with. There are two possibilities: I (as in English) and İ (capital I with dot above, as in Turkish).
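
To make the i example concrete, here is a minimal sketch in Java, chosen only because its standard String.toUpperCase(Locale) takes the language as an explicit argument. The class name LocaleUppercase and the helper upper are made up for illustration and are not from this thread.

```java
import java.util.Locale;

public class LocaleUppercase {
    // Hypothetical helper: uppercase `text` under the casing rules of the
    // language named by a BCP 47 tag such as "en" or "tr".
    static String upper(String text, String languageTag) {
        return text.toUpperCase(Locale.forLanguageTag(languageTag));
    }

    public static void main(String[] args) {
        // Same input code point, different results depending on the language.
        System.out.println(upper("i", "en")); // value is "I" (U+0049)
        System.out.println(upper("i", "tr")); // value is "İ" (U+0130)
    }
}
```

The input string is identical in both calls; only the language passed alongside it changes the answer.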

It is the same issue with sorting.
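
A similar sketch for sorting, again in Java and again only as an illustration: java.text.Collator compares strings under the rules of a given locale. The class name LocaleSort and the helper sorted are hypothetical. The German/Swedish pair is an example picked here, not one from this thread, and the Swedish result depends on the collation rules shipped with the Java runtime in use.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class LocaleSort {
    // Hypothetical helper: return a sorted copy of `words`, ordered by the
    // collation rules of the language named by a BCP 47 tag.
    static String[] sorted(String languageTag, String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, Collator.getInstance(Locale.forLanguageTag(languageTag)));
        return copy;
    }

    public static void main(String[] args) {
        // German treats ä as a variant of a, so it sorts before z.
        System.out.println(Arrays.toString(sorted("de", "z", "ä")));
        // Swedish treats ä as a separate letter placed after z; the result
        // here depends on the runtime's Swedish collation data.
        System.out.println(Arrays.toString(sorted("sv", "z", "ä")));
    }
}
```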

Ali
