On Thu, Nov 18, 2004 at 11:44:09AM -0500, Edward H. Trager wrote:
> On Thursday 2004.11.18 01:44:07 +0000, Christopher Fynn wrote:
> > Hmmm, I'll have to read that document again and think about this one.
>
> One of the problems with Unicode is that it is, in many ways, such a mess.
> Based on first principles, people wanted Unicode to use a "character"
> model, not a "glyph" model. But it seems that what has really happened is
> that we've basically ended up with a "glyph" model for all of those
> scripts that already had legacy computer encodings at the time that
> Unicode came into existence: this includes Latin, Cyrillic, Greek, and
> Arabic, among others.
>
> Only scripts that had never (or barely) had the fortune -- or misfortune,
> depending on how you look at it -- to be encoded for use on computers have
> ended up in Unicode using a "character" rather than a "glyph" based model.
> These include scripts like Thaana, Devanagari, and Burmese. For those
> scripts, there are no "precomposed" forms -- and thus no difference
> between NFC and NFD normalizations.
>
> So, although it is more of a burden to display Burmese correctly, it might
> be easier to collate Burmese than it is to collate some European-language
> texts, where the text could be in NFC, NFD, or even some combination
> thereof ...
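[A small illustration of the point above, using Python's unicodedata module: a Latin letter like "é" has both a precomposed and a decomposed spelling, so NFC and NFD differ, while a Devanagari syllable has no precomposed form, so both normalizations leave it unchanged. The specific code points chosen here are just examples, not from the original mail.]

```python
import unicodedata

# Latin: "é" exists as one precomposed code point (U+00E9) and as the
# decomposed sequence "e" + COMBINING ACUTE ACCENT (U+0065 U+0301).
e_acute = "\u00E9"
decomposed = unicodedata.normalize("NFD", e_acute)
print([hex(ord(c)) for c in decomposed])   # two code points after NFD

# Devanagari "कि" (KA U+0915 + vowel sign I U+093F): no precomposed
# form was ever encoded, so NFC and NFD are identical.
ki = "\u0915\u093F"
print(unicodedata.normalize("NFC", ki) == unicodedata.normalize("NFD", ki))
```

So for such scripts a given text has only one normalized shape, which is what makes the collation remark at the end of the quote plausible.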
Hmm, I see it differently. All the "fully composed" characters are indeed full characters in their own right, yet Unicode has now adopted a policy of not encoding any more precomposed characters, so you need to construct many Latin letters out of a sequence of combining characters. Unicode has thus left the principle of encoding characters - symbols with distinct meaning - and has become a kind of glyph registry. This makes sorting harder to do, although it is not infeasible to sort e.g. Latin letters in their fully composed encoding together with their decomposed forms in a convenient way, as demonstrated by ISO 14651.

Best regards
Keld
--
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/linux-utf8/
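[A sketch of the sorting point above: normalizing every string to a single form before comparison makes composed and decomposed spellings sort as equals. This uses plain NFD as the sort key in Python and is only the basic idea; a real ISO 14651 / UCA collation would additionally apply multi-level collation weights, which this sketch omits.]

```python
import unicodedata

# The same word spelled two ways: fully decomposed and fully precomposed.
words = ["re\u0301sume\u0301", "zebra", "r\u00E9sum\u00E9", "apple"]

# Without normalization, "é" (U+00E9) and "e"+U+0301 compare as different
# code point sequences. Normalizing the key to NFD makes them compare equal.
ordered = sorted(words, key=lambda s: unicodedata.normalize("NFD", s))
print(ordered)  # both spellings of "résumé" land together, between
                # "apple" and "zebra"
```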