Dan Kogai <[EMAIL PROTECTED]> writes:
>http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/DEVANAGA.TXT
>> ##################
>>
>> # Section 1: Map the following byte pairs as indicated:
>> # (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
>> # (Also see note about 0xF0 in comments above)
>>
>> 0xA1+0xE9 0x0950 # DEVANAGARI OM
>> 0xA6+0xE9 0x090C # DEVANAGARI LETTER VOCALIC L
>> 0xA7+0xE9 0x0961 # DEVANAGARI LETTER VOCALIC LL
>> 0xAA+0xE9 0x0960 # DEVANAGARI LETTER VOCALIC RR
>> 0xDB+0xE9 0x0962 # DEVANAGARI VOWEL SIGN VOCALIC L
>> 0xDC+0xE9 0x0963 # DEVANAGARI VOWEL SIGN VOCALIC LL
>> 0xDF+0xE9 0x0944 # DEVANAGARI VOWEL SIGN VOCALIC RR
>> 0xE8+0xE8 0x094D+0x200C # DEVANAGARI SIGN VIRAMA + ZWNJ # explicit halant
>> 0xE8+0xE9 0x094D+0x200D # DEVANAGARI SIGN VIRAMA + ZWJ # soft halant
>> 0xEA+0xE9 0x093D # DEVANAGARI SIGN AVAGRAHA
>>
>> # Section 2: Map the remaining bytes as follows:
>> [snip]
>> 0xA1 0x0901 # DEVANAGARI SIGN CANDRABINDU
>> ....
>> 0xA6 0x0907 # DEVANAGARI LETTER I
>> 0xA7 0x0908 # DEVANAGARI LETTER II
>> ....
>> 0xAA 0x090B # DEVANAGARI LETTER VOCALIC R
>> 0xA6 0x0907 # DEVANAGARI LETTER I
>> ...
>> 0xDB 0x093F # DEVANAGARI VOWEL SIGN I
>> 0xDC 0x0940 # DEVANAGARI VOWEL SIGN II
>> 0xDD 0x0941 # DEVANAGARI VOWEL SIGN U
>> 0xDE 0x0942 # DEVANAGARI VOWEL SIGN UU
>> 0xDF 0x0943 # DEVANAGARI VOWEL SIGN VOCALIC R
>> ....
>> 0xE8 0x094D # DEVANAGARI SIGN VIRAMA # halant
>> ....
>> 0xEA 0x0964 # DEVANAGARI DANDA
>> #
>
> Let me tell you what we have to do when we receive 0xA1. We consult
>Section 1, and if the following character matches one listed there,
>we use that pair. If not, we treat the next character as just a
>character. In other words, 0xA1 has to be BOTH the END POINT of the
>page traversal and THE POINTER TO the next page. The current encengine
>is not designed that way.

Er, not 100% sure about that. I certainly considered that case at one
point (encengine came from trie code I had used elsewhere for keyword
matching - the original could match (say) 'lst' and 'ls' by having it
backtrack if next thing was not a 't').
It is likely that enc2xs cannot build such a table though.

> It must be EITHER.
> One easy way to overcome this is that we make a mock doublebyte map
>for 0xA1 and others, with the following page including all cases. Since
>MacDevanagari is originally a single-byte encoding, this is still
>possible without bloating the UCM.
>
>Dan the Encode Maintainer
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/
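
For readers following the thread, the lookup rule discussed above (try the Section 1 byte pair first, and fall back to the Section 2 single byte when the next byte does not complete a pair) can be sketched roughly as below. This is only an illustration in Python, not encengine or enc2xs code: the names `PAIRS`, `SINGLES` and `decode` are made up for this sketch, the tables contain only the entries quoted in this message, and passing bytes below 0x80 through as ASCII is an assumption.

```python
# Section 1 of the quoted table: byte pairs, tried before single bytes.
PAIRS = {
    (0xA1, 0xE9): "\u0950",        # DEVANAGARI OM
    (0xA6, 0xE9): "\u090C",        # LETTER VOCALIC L
    (0xA7, 0xE9): "\u0961",        # LETTER VOCALIC LL
    (0xAA, 0xE9): "\u0960",        # LETTER VOCALIC RR
    (0xDB, 0xE9): "\u0962",        # VOWEL SIGN VOCALIC L
    (0xDC, 0xE9): "\u0963",        # VOWEL SIGN VOCALIC LL
    (0xDF, 0xE9): "\u0944",        # VOWEL SIGN VOCALIC RR
    (0xE8, 0xE8): "\u094D\u200C",  # VIRAMA + ZWNJ (explicit halant)
    (0xE8, 0xE9): "\u094D\u200D",  # VIRAMA + ZWJ (soft halant)
    (0xEA, 0xE9): "\u093D",        # SIGN AVAGRAHA
}

# Section 2: only the single-byte entries quoted above (the rest were snipped).
SINGLES = {
    0xA1: "\u0901", 0xA6: "\u0907", 0xA7: "\u0908", 0xAA: "\u090B",
    0xDB: "\u093F", 0xDC: "\u0940", 0xDD: "\u0941", 0xDE: "\u0942",
    0xDF: "\u0943", 0xE8: "\u094D", 0xEA: "\u0964",
}


def decode(data: bytes) -> str:
    """Longest match first: a byte such as 0xA1 is both a complete
    character and the possible start of a two-byte sequence."""
    out = []
    i = 0
    while i < len(data):
        pair = tuple(data[i:i + 2])   # 1-tuple at end of input, never in PAIRS
        if pair in PAIRS:
            out.append(PAIRS[pair])
            i += 2
            continue
        byte = data[i]                # "backtrack": treat it as a lone byte
        if byte < 0x80:
            out.append(chr(byte))     # assumption: low half is plain ASCII
        else:
            # Bytes not quoted in this message become U+FFFD in this sketch.
            out.append(SINGLES.get(byte, "\ufffd"))
        i += 1
    return "".join(out)
```

With these tables, `decode(b"\xA1\xE9")` takes the pair branch, while `decode(b"\xA1\xA6\xE9")` emits 0xA1 on its own and then matches 0xA6+0xE9 as a pair, which is the "both end point and pointer" behaviour in question.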