On Fri May 19 09:38:23 CDT 2006, [EMAIL PROTECTED] wrote:
> perhaps there are actually two problems here:
> 1) how to get libdraw to map back from a sequence of combining characters
> to a character in the font that represents that sequence.

this is pretty easy. the unicode standard provides canonical
compositions. i think it would be easier for libdraw to insist that
string() be given strings that have been canonically composed.
perhaps a job for tcs.

> 2) how to draw sequences of combining characters that don't exist in
> precombined form within unicode. it's quite possible that one might wish
> to provide pre-rendered glyphs for some of these sequences - the current
> font format can't deal with that.

the general case doesn't seem like it would yield a solution with
a bitmap font. sure you could put a circumflex on an "a". but what
about dashed letters like ł? drawing a dash through an arbitrary
character gets to be a real pain.

the good news is that solving #1 would take care of most problems.
unfortunately, some romanized versions of russian and vietnamese
(i believe) would still not work. but we would get 80% of what we
would like without the pain of trying to treat bitmaps as if they
were vector character descriptions a la metafont.

> another issue is dealing with code (e.g. libframe) that assumes that
> characters do not overstrike - i.e. that there's a 1-1 correspondence
> between Runes and glyphs.

charofpt would be a problem. there would be some problems with
picking a proper endpoint for highlighting. a break between the base
and the combiners would be a problem.

i think the largest problem here would be dealing with the character
height. currently in libdraw a character's height is the font's
height. this isn't true for many fonts we already have -- ÄÖÜ☺ tend
to get clipped with pelm because they are taller than the font file
claims. just expanding the height of the font would look pretty funny
in the absence of taller characters.

> yet another is how one should deal with character-based indexing, for instance
> indexing in sam expressions - does /é/-#0+#1 point to the character after
> the unadorned e, or after the whole sequence?

thar be dragons here.
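
to make the indexing question concrete, here is a rough sketch in
python (not plan 9 c, and not anything libdraw or sam actually does).
it counts runes and base+combiner clusters, before and after canonical
composition. the clusters() helper is hypothetical and treats any
character with a nonzero combining class as a combiner, which is only
an approximation.

```python
import unicodedata

def clusters(s):
    """Group each base character with the combining marks that follow it.

    Hypothetical helper: a character counts as a combiner when its
    canonical combining class is nonzero, which is an approximation.
    """
    out = []
    for ch in s:
        if out and unicodedata.combining(ch):
            out[-1] += ch      # attach the combiner to the previous base
        else:
            out.append(ch)     # start a new cluster
    return out

# e + combining acute accent (U+0301), followed by x
decomposed = "e\u0301x"
composed = unicodedata.normalize("NFC", decomposed)

# rune count vs. cluster count: they differ only for the decomposed form
print(len(decomposed), len(clusters(decomposed)))
print(len(composed), len(clusters(composed)))

# l + combining short stroke overlay (U+0335): no precomposed character
# decomposes to this pair (U+0142 has no canonical decomposition), so
# NFC leaves it as two runes.
stroked = unicodedata.normalize("NFC", "l\u0335")
print(len(stroked), len(clusters(stroked)))
```

with composed input, rune offsets and cluster offsets agree for é, so
+#1 is unambiguous. for a sequence with no precomposed form the two
counts still differ, and the question above remains open.
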
the library of congress has a 100-page manual on alphabetization of
languages with roman letters. different languages have different rules
(sometimes for the same codepoint); a language sometimes has different
rules for different codepoints. then there are ligatures. in german
ss and ß are sorted the same.

there are probably only two sensible ways to deal with this. either
strip/do not strip all combiners and do a naive sort or define some
sort of locale.

> it'd be nice to sort this issue out properly; surely it shouldn't be
> too hard?

i believe this is another entry for the "famous lies list," ranking
somewhat below "check's in the mail" and above "i have this friend
who...."

- erik
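
a rough sketch of the first option for sorting (strip or do not strip
the combiners, then sort naively), again in python rather than plan 9 c.
strip_combiners() is a hypothetical helper and the word list is made
up for illustration.

```python
import unicodedata

def strip_combiners(s):
    """Canonically decompose, then drop characters with a nonzero
    combining class. Hypothetical helper for illustration only."""
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

words = ["zebra", "école", "eclair"]

# do not strip: plain codepoint order, so é (U+00E9) lands after z
by_codepoint = sorted(words)

# strip: é is compared as if it were a plain e
by_base = sorted(words, key=strip_combiners)

print(by_codepoint)
print(by_base)
```

neither order knows anything about a language. ß and ł have no
canonical decomposition, so stripping leaves them alone, and making
ss and ß sort the same would need the second option, some sort of
locale.
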
