>> yet another is how one should deal with character-based indexing, for
>> instance indexing in sam expressions - does /é/-#0+#1 point to the
>> character after the unadorned e, or after the whole sequence?

>there be dragons here.  the library of congress has a 100-page manual on
>alphabetization of languages with roman letters.  different languages have
>different rules (sometimes for the same codepoint); a language sometimes
>has different rules for different codepoints.
>then there are ligatures.  in german ss and ß are sorted the same.

uff.  this answer doesn't fit the question.  i think base+combiner* should be
treated as an indivisible character.  but again, if we use canonical
compositions, this case can be avoided except in cases where the character
can't be drawn anyway.
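
a rough sketch of both ideas in python, just to make the distinction concrete.
it is not how sam does anything.  the helper name clusters is made up for
illustration, and "combiner" is approximated here as any code point with a
nonzero canonical combining class, which is simpler than full unicode
grapheme segmentation.

```python
import unicodedata

def clusters(text):
    """Group text into base+combiner* units (simplified: a combiner is any
    code point whose canonical combining class is nonzero)."""
    out = []
    for ch in text:
        if out and unicodedata.combining(ch) != 0:
            out[-1] += ch      # attach the combiner to the preceding base
        else:
            out.append(ch)     # start a new unit
    return out

decomposed = "e\u0301x"        # 'e' + COMBINING ACUTE ACCENT + 'x'
print(len(decomposed))         # 3 code points
print(len(clusters(decomposed)))   # 2 indivisible units

# canonical composition folds the pair into one code point (U+00E9)
composed = unicodedata.normalize("NFC", decomposed)
print(len(composed))           # 2 code points

# no precomposed form exists for q + acute, so NFC leaves it as two
# code points and only the grouping rule treats it as one character
odd = unicodedata.normalize("NFC", "q\u0301")
print(len(odd), len(clusters(odd)))   # 2 1
```

with the grouping rule, +#1 would step over the whole e+accent unit; with
composition alone it does so only when a precomposed code point exists.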

- erik
