On Mon 06 Jun 2011, Arthur Reutenauer wrote:
> > Well, there *is* more than one way to represent ä in UTF-8
>
> If you mean "non-shortest" forms such as 0xE0 0x83 0xA4 or 0xF0 0x80
> 0x83 0xA4, then no, they have been forbidden since Unicode 3 in 2000
> (formally Corrigendum #1, see
> http://www.unicode.org/versions/corrigendum1.html).
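To make the "non-shortest form" point concrete: U+00E4 (ä) has exactly one valid UTF-8 encoding, C3 A4, but a lax decoder that simply concatenates payload bits would also map the two overlong sequences quoted above to U+00E4. The following is a minimal Python 3.8+ sketch; `naive_code_point` and `decodes_strictly` are hypothetical helpers written for illustration, and the strict check relies on CPython's built-in UTF-8 codec rejecting overlong forms.

```python
def naive_code_point(seq: bytes) -> int:
    """Hypothetical lax decoder: extracts payload bits with NO overlong check."""
    lead = seq[0]
    if lead < 0x80:
        return lead
    n = len(seq)
    # Lead-byte payload: 110xxxxx -> 5 bits, 1110xxxx -> 4 bits, 11110xxx -> 3 bits
    cp = lead & (0x7F >> n)
    for b in seq[1:]:
        # Each continuation byte 10xxxxxx contributes 6 bits
        cp = (cp << 6) | (b & 0x3F)
    return cp


def decodes_strictly(data: bytes) -> bool:
    """Return True if `data` is accepted by Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


shortest = bytes([0xC3, 0xA4])
overlong_3 = bytes([0xE0, 0x83, 0xA4])
overlong_4 = bytes([0xF0, 0x80, 0x83, 0xA4])

# The naive decoder yields U+00E4 for all three; the strict decoder accepts only the first.
for label, seq in [("shortest", shortest),
                   ("3-byte overlong", overlong_3),
                   ("4-byte overlong", overlong_4)]:
    print(f"{label:16} {seq.hex(' '):12} "
          f"naive=U+{naive_code_point(seq):04X} strict={decodes_strictly(seq)}")
```

Accepting overlong forms is exactly what makes them dangerous: two different byte strings would decode to the same character, so byte-level filters and character-level checks could disagree.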
I was actually thinking of precomposed vs. combining diacritics. I was
blissfully unaware of the non-shortest-form problem up until now...

Pont
___________________________________________________________________________________
If your question is of interest to others as well, please add an entry to the Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
archive  : http://foundry.supelec.fr/projects/contextrev/
wiki     : http://contextgarden.net
___________________________________________________________________________________
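The ambiguity Pont actually meant is a different one: both spellings of ä below are well-formed UTF-8, but they are distinct code point sequences (U+00E4 versus U+0061 U+0308) that Unicode treats as canonically equivalent. A minimal sketch using Python's standard `unicodedata` module; `same_text` is a hypothetical helper that compares strings after NFC normalization.

```python
import unicodedata

precomposed = "\u00e4"   # ä as one code point: LATIN SMALL LETTER A WITH DIAERESIS
combining = "a\u0308"    # 'a' followed by U+0308 COMBINING DIAERESIS


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare strings up to canonical equivalence (via NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


print(precomposed.encode("utf-8").hex(" "))  # c3 a4
print(combining.encode("utf-8").hex(" "))    # 61 cc 88
print(precomposed == combining)              # False: different code point sequences
print(same_text(precomposed, combining))     # True: canonically equivalent
```

Comparing after normalization (NFC or NFD, applied consistently to both sides) is the usual way to treat the two forms as equal; comparing raw bytes or code points will not.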