Dan Kogai <[EMAIL PROTECTED]> writes:
>http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/DEVANAGA.TXT
>> ##################
>>
>> # Section 1: Map the following byte pairs as indicated:
>> # (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
>> # (Also see note about 0xF0 in comments above)
>>
>> 0xA1+0xE9       0x0950  # DEVANAGARI OM
>> 0xA6+0xE9       0x090C  # DEVANAGARI LETTER VOCALIC L
>> 0xA7+0xE9       0x0961  # DEVANAGARI LETTER VOCALIC LL
>> 0xAA+0xE9       0x0960  # DEVANAGARI LETTER VOCALIC RR
>> 0xDB+0xE9       0x0962  # DEVANAGARI VOWEL SIGN VOCALIC L
>> 0xDC+0xE9       0x0963  # DEVANAGARI VOWEL SIGN VOCALIC LL
>> 0xDF+0xE9       0x0944  # DEVANAGARI VOWEL SIGN VOCALIC RR
>> 0xE8+0xE8       0x094D+0x200C   # DEVANAGARI SIGN VIRAMA + ZWNJ # explicit halant
>> 0xE8+0xE9       0x094D+0x200D   # DEVANAGARI SIGN VIRAMA + ZWJ  # soft halant
>> 0xEA+0xE9       0x093D  # DEVANAGARI SIGN AVAGRAHA
>>
>> # Section 2: Map the remaining bytes as follows:
>> [snip]
>> 0xA1    0x0901  # DEVANAGARI SIGN CANDRABINDU
>> ....
>> 0xA6    0x0907  # DEVANAGARI LETTER I
>> 0xA7    0x0908  # DEVANAGARI LETTER II
>> ....
>> 0xAA    0x090B  # DEVANAGARI LETTER VOCALIC R
>> 0xA6    0x0907  # DEVANAGARI LETTER I
>> ...
>> 0xDB    0x093F  # DEVANAGARI VOWEL SIGN I
>> 0xDC    0x0940  # DEVANAGARI VOWEL SIGN II
>> 0xDD    0x0941  # DEVANAGARI VOWEL SIGN U
>> 0xDE    0x0942  # DEVANAGARI VOWEL SIGN UU
>> 0xDF    0x0943  # DEVANAGARI VOWEL SIGN VOCALIC R
>> ....
>> 0xE8    0x094D  # DEVANAGARI SIGN VIRAMA        # halant
>> ....
>> 0xEA    0x0964  # DEVANAGARI DANDA
>> #
>
>   Let me tell you what we have to do when we receive 0xA1.  We consult
>Section 1, and if the following byte matches one of its pairs, we use
>that mapping.  If not, we map 0xA1 by Section 2 and treat the next byte
>as an ordinary character.  In other words, 0xA1 has to be BOTH THE END
>POINT of the page traversal and THE POINTER TO the next page.  The
>current encengine is not designed that way.

Er, not 100% sure about that. I certainly considered that case at one 
point (encengine came from trie code I had used elsewhere for keyword
matching - the original could match (say) 'lst' and 'ls' by having 
it backtrack if next thing was not a 't').

It is likely that enc2xs cannot build such a table though.
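
For illustration only, here is a rough sketch of the "try the longer match,
otherwise fall back" behaviour under discussion. It is written in Python
rather than in encengine's C, it is not how encengine or enc2xs work
internally, and the names PAIRS, SINGLES and decode are made up for this
example. The tables hold only some of the rows quoted above from
DEVANAGA.TXT, so any byte not listed there raises an error.

```python
# Hypothetical sketch of longest-match decoding with fallback.
# Not encengine itself; the tables cover only rows quoted in this thread.

# Section 1: two-byte sequences (lead byte, following byte).
PAIRS = {
    (0xA1, 0xE9): "\u0950",        # DEVANAGARI OM
    (0xA6, 0xE9): "\u090C",        # DEVANAGARI LETTER VOCALIC L
    (0xE8, 0xE8): "\u094D\u200C",  # VIRAMA + ZWNJ (explicit halant)
    (0xE8, 0xE9): "\u094D\u200D",  # VIRAMA + ZWJ (soft halant)
    (0xEA, 0xE9): "\u093D",        # DEVANAGARI SIGN AVAGRAHA
}

# Section 2: single bytes.
SINGLES = {
    0xA1: "\u0901",  # DEVANAGARI SIGN CANDRABINDU
    0xA6: "\u0907",  # DEVANAGARI LETTER I
    0xE8: "\u094D",  # DEVANAGARI SIGN VIRAMA
    0xEA: "\u0964",  # DEVANAGARI DANDA
}


def decode(data: bytes) -> str:
    """Prefer a Section 1 pair; otherwise fall back to the Section 2 byte."""
    out = []
    i = 0
    while i < len(data):
        pair = tuple(data[i:i + 2])   # one element only at end of input
        if pair in PAIRS:
            out.append(PAIRS[pair])   # the byte acts as a pointer to a pair
            i += 2
        elif data[i] in SINGLES:
            out.append(SINGLES[data[i]])  # the byte is an end point by itself
            i += 1
        else:
            raise ValueError(f"byte 0x{data[i]:02X} is not in this partial table")
    return "".join(out)
```

With these tables, decode(bytes([0xA1, 0xE9])) gives U+0950, while
decode(bytes([0xA1, 0xA6])) gives U+0901 followed by U+0907, which is the
"both end point and pointer" case for 0xA1.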

> It must be EITHER.
>   One easy way to overcome this is that we make a mock doublebyte map 
>for 0xA1 and others, with the following page including all cases.  Since 
>MacDevanagari is originally a single-byte encoding, this is still 
>possible without bloating the UCM.
>
>Dan the Encode Maintainer
-- 
Nick Ing-Simmons
http://www.ni-s.u-net.com/
