> However, in the class of languages for which I am trying to
> provide support, certain characters are meant to be produced
> by an ordered combination of other characters. For example,
> the general sequence in Devanagari script (and this extends
> to the other scripts as well) is that
> consonant+virama+consonant produces
> half-consonant+consonant, where the half-consonant has no
> other unicode specification. As a concrete case in
> Devanagari, na virama sa (viz., \u0928\u094d\u0938) should
> produce the nsa character (this sequence can be seen in any
> unicode representation of the word "Sanskrit" in Devanagari
> script).
>
> It seems to me that TTF font specifications (i.e., those I
> converted to subfonts using Federico's ttf2subf) include
> these sequence definitions, which are then processed by each
> application providing support for the fonts. Plan 9
> subfonts are much too simple for this.

yes. this is a problem. unfortunately the unicode guys took the
position that codepoints are divorced from glyphs.

unfortunately, this case isn't as bad as it gets. e.g. archaic
cyrillic letters have transliterations like ^^A in unicode. would
three hats on an A be illegal? i don't see what would prevent it.
and therefore one needs to implement some sort of character layout
engine to render unicode. that's pretty bogus.

what is the total number of stealth characters like nsa? if it's
not too unreasonable, it might be good enough to steal part of the
operating system or application reserved areas.

i hope my ignorance of the particular script in question isn't
leading to silly suggestions!

- erik
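
a minimal sketch of the reserved-area idea, in python for brevity. it
assumes the set of conjuncts is small and known in advance, and that
the font has been given glyphs at the chosen private-use codepoints
(U+E000..U+F8FF). the names CONJUNCTS, build_table and remap are
hypothetical, and none of this is how plan 9 actually draws text.

```python
# Hypothetical sketch: rewrite consonant+virama+consonant sequences to
# Private Use Area codepoints, so a font that maps one codepoint to one
# glyph can still show the conjunct.

PUA_START = 0xE000  # first codepoint of the BMP Private Use Area
PUA_END = 0xF8FF    # last codepoint of the BMP Private Use Area

# Assumed table of conjuncts the font has glyphs for.
# Only the example from the quoted message is listed here.
CONJUNCTS = ["\u0928\u094d\u0938"]  # na + virama + sa


def build_table(conjuncts, start=PUA_START):
    """Assign each conjunct sequence its own private-use codepoint."""
    table = {}
    for i, seq in enumerate(conjuncts):
        cp = start + i
        if cp > PUA_END:
            raise ValueError("too many conjuncts for the private use area")
        table[seq] = chr(cp)
    return table


def remap(text, table):
    """Replace every known sequence in text, trying longer sequences first."""
    keys = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for k in keys:
            if text.startswith(k, i):
                out.append(table[k])
                i += len(k)
                break
        else:
            # No known sequence starts here; copy the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)
```

the remapped text is only good for drawing. anything stored or sent
elsewhere should keep the original sequence, since private-use
codepoints mean nothing outside the font that defines them.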