[Encode] Farsi is Okay. The problem is in Indics!

Dan Kogai Fri, 05 Apr 2002 07:07:56 -0800

On Friday, April 5, 2002, at 11:18 , Jarkko Hietaniemi wrote:
> Since it seems that we won't make it for Monday the 8th (MakeMaker is
> still unfinished, and UTF-8 keys are still a bit dodgy, and so on), I
> guess small updates on Encode (docs certainly, and obvious bugs) are
> still okay-- and even the Farsi encodings, but please first ask
> Roozbeh Pournader ( [EMAIL PROTECTED] ) the guy that seems to be
> behind much of the Farsi computing stuff, whether (a) we should/could
> include the Farsi mappings (b) which mappings (c) there are additional
> complications we are not aware of (e.g. is it really just a simple
> table mapping, or is something algorithmic needed).


I think you are mistaking Farsi for Indics.  Farsi uses an extended 
Arabic script, and it is indeed already supported via MacFarsi.  BIDI 
is tough but Encode does not (have to) care.

Here I am talking about Devanagari and its variants.  See this.

http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/DEVANAGA.TXT
> ##################
>
> # Section 1: Map the following byte pairs as indicated:
> # (ZWNJ means ZERO WIDTH NON-JOINER, ZWJ means ZERO WIDTH JOINER)
> # (Also see note about 0xF0 in comments above)
>
> 0xA1+0xE9       0x0950  # DEVANAGARI OM
> 0xA6+0xE9       0x090C  # DEVANAGARI LETTER VOCALIC L
> 0xA7+0xE9       0x0961  # DEVANAGARI LETTER VOCALIC LL
> 0xAA+0xE9       0x0960  # DEVANAGARI LETTER VOCALIC RR
> 0xDB+0xE9       0x0962  # DEVANAGARI VOWEL SIGN VOCALIC L
> 0xDC+0xE9       0x0963  # DEVANAGARI VOWEL SIGN VOCALIC LL
> 0xDF+0xE9       0x0944  # DEVANAGARI VOWEL SIGN VOCALIC RR
> 0xE8+0xE8       0x094D+0x200C   # DEVANAGARI SIGN VIRAMA + ZWNJ # explicit halant
> 0xE8+0xE9       0x094D+0x200D   # DEVANAGARI SIGN VIRAMA + ZWJ  # soft halant
> 0xEA+0xE9       0x093D  # DEVANAGARI SIGN AVAGRAHA
>
> # Section 2: Map the remaining bytes as follows:
> [snip]
> 0xA1    0x0901  # DEVANAGARI SIGN CANDRABINDU
> ....
> 0xA6    0x0907  # DEVANAGARI LETTER I
> 0xA7    0x0908  # DEVANAGARI LETTER II
> ....
> 0xAA    0x090B  # DEVANAGARI LETTER VOCALIC R
> 0xA6    0x0907  # DEVANAGARI LETTER I
> ...
> 0xDB    0x093F  # DEVANAGARI VOWEL SIGN I
> 0xDC    0x0940  # DEVANAGARI VOWEL SIGN II
> 0xDD    0x0941  # DEVANAGARI VOWEL SIGN U
> 0xDE    0x0942  # DEVANAGARI VOWEL SIGN UU
> 0xDF    0x0943  # DEVANAGARI VOWEL SIGN VOCALIC R
> ....
> 0xE8    0x094D  # DEVANAGARI SIGN VIRAMA        # halant
> ....
> 0xEA    0x0964  # DEVANAGARI DANDA
> #

   Let me tell you what we have to do when we receive 0xA1.  We look at 
the following byte: if the pair matches an entry in Section 1, we use 
that mapping.  If not, we map 0xA1 by itself via Section 2 and treat the 
following byte as a separate character.  In other words, 0xA1 has to be 
BOTH an END POINT of the page traversal and a POINTER TO the next page.  
The current encengine is not designed that way.  Each byte must be 
EITHER one or the other.
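
To make the lookahead concrete, here is a minimal sketch in Python (not 
the encengine's C code).  The tables hold only a few entries copied from 
the excerpt above; the function name decode and the U+FFFD fallback for 
bytes missing from these partial tables are assumptions of the sketch.

```python
# Section 1 (partial): byte pairs that map as a unit.
PAIRS = {
    (0xA1, 0xE9): "\u0950",        # DEVANAGARI OM
    (0xE8, 0xE8): "\u094D\u200C",  # VIRAMA + ZWNJ (explicit halant)
    (0xE8, 0xE9): "\u094D\u200D",  # VIRAMA + ZWJ (soft halant)
    (0xEA, 0xE9): "\u093D",        # DEVANAGARI SIGN AVAGRAHA
}

# Section 2 (partial): single bytes.
SINGLES = {
    0xA1: "\u0901",  # DEVANAGARI SIGN CANDRABINDU
    0xA6: "\u0907",  # DEVANAGARI LETTER I
    0xE8: "\u094D",  # DEVANAGARI SIGN VIRAMA
    0xEA: "\u0964",  # DEVANAGARI DANDA
}


def decode(data: bytes) -> str:
    """Decode with one byte of lookahead: try Section 1, then Section 2."""
    out = []
    i = 0
    while i < len(data):
        pair = tuple(data[i:i + 2])
        if pair in PAIRS:
            # The byte acts as a pointer to the next page.
            out.append(PAIRS[pair])
            i += 2
        else:
            # The same byte acts as an end point on its own.
            # U+FFFD is a sketch-only fallback for unlisted bytes.
            out.append(SINGLES.get(data[i], "\ufffd"))
            i += 1
    return "".join(out)
```

With these tables, decode(bytes([0xA1, 0xE9])) yields the single 
character U+0950, while decode(bytes([0xA1, 0xA6])) yields U+0901 
followed by U+0907.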
   One easy way to overcome this is to make a mock double-byte map for 
0xA1 and the other such bytes, with the following page covering every 
possible second byte.  Since MacDevanagari is originally a single-byte 
encoding, this is still possible without bloating the UCM.
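
A rough sketch of how such mock double-byte pages could be generated, 
again in Python rather than as an actual UCM file.  The tables are 
partial copies of the excerpt above, and build_mock_pages is a 
hypothetical helper name, not part of Encode.

```python
# Partial tables copied from the excerpt above.
PAIRS = {
    (0xA1, 0xE9): "\u0950",        # DEVANAGARI OM
    (0xE8, 0xE8): "\u094D\u200C",  # VIRAMA + ZWNJ (explicit halant)
    (0xE8, 0xE9): "\u094D\u200D",  # VIRAMA + ZWJ (soft halant)
    (0xEA, 0xE9): "\u093D",        # DEVANAGARI SIGN AVAGRAHA
}
SINGLES = {
    0xA1: "\u0901",  # DEVANAGARI SIGN CANDRABINDU
    0xA6: "\u0907",  # DEVANAGARI LETTER I
    0xE8: "\u094D",  # DEVANAGARI SIGN VIRAMA
    0xEA: "\u0964",  # DEVANAGARI DANDA
}


def build_mock_pages(pairs, singles):
    """Build one second-level page per lead byte, covering all cases."""
    pages = {}
    for lead in {lead for lead, _ in pairs}:
        # Default case: lead and trail each map on their own (Section 2).
        page = {trail: singles[lead] + text
                for trail, text in singles.items()}
        # Special case: the pair maps as a unit (Section 1).
        for (pair_lead, trail), text in pairs.items():
            if pair_lead == lead:
                page[trail] = text
        pages[lead] = page
    return pages


PAGES = build_mock_pages(PAIRS, SINGLES)
```

Each lead byte is now purely a pointer to its page, and a page has at 
most 256 entries.  This sketch does not address a second byte that is 
itself a lead byte, nor a lead byte at the very end of the input.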

Dan the Encode Maintainer
