I remember someone bringing this issue up several months ago. As I'm just a newbie in Perl, I hesitated to report this, and I'm not 100% sure it is a bug. Anyway...

Conversion from MacArabic/Farsi/Hebrew does not seem to work properly.

I tried this with perl 5.8.1rc2 and 5.8.0 with Encode 1.97.

1. I created a test file which is a character table containing all 1-byte characters from \x20 to \xFF except \x7F.

2. I ran the following on it:

perl -Mencoding=MacArabic,STDOUT,utf8 -pe1 < testfile.txt > /tmp/result.txt

3. The terminal showed:

MacArabic "\x20" does not map to Unicode.
MacArabic "\x21" does not map to Unicode.
MacArabic "\x22" does not map to Unicode.
...

4. In result.txt, the following bytes were not converted to the appropriate characters but were written out in hexadecimal notation instead: \x20-\x2F, \x3A-\x3F, \x5B-\x5F, \x7B-\x7D, \x81, \x8C, \x93, \x98, \x9B, \xA0-\xA4, \xA6-\xAB, \xAD-\xBA, \xBC-\xBE, \xC0, \xDB-\xDF, \xFB-\xFD.
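
For anyone who wants to reproduce this, the test file from step 1 could be generated with something like the sketch below. The exact layout is an assumption on my part (a header row of hex digits, then one tab-separated row per high nibble), and build_table is just a name picked for illustration; only the byte range comes from the description above.

```perl
use strict;
use warnings;

# Build a tab-separated table of the bytes 0x20..0xFF, skipping 0x7F.
# Assumed layout: a header row of hex digits, then one row per high nibble.
sub build_table {
    my $table = join('', map { "\t" . sprintf('%X', $_) } 0 .. 15) . "\n";
    for my $hi (0x2 .. 0xF) {
        my @cells = map {
            my $byte = ($hi << 4) | $_;
            $byte == 0x7F ? '' : chr($byte);   # leave the DEL cell empty
        } 0 .. 15;
        $table .= sprintf('%X', $hi) . "\t" . join("\t", @cells) . "\n";
    }
    return $table;
}

# Write raw bytes with no encoding layer, so each cell stays a single byte.
open(my $fh, '>:raw', 'testfile.txt') or die "Cannot write testfile.txt: $!";
print {$fh} build_table();
close($fh) or die "Cannot close testfile.txt: $!";
```

The file has to be written without an encoding layer; otherwise the bytes above \x7F would be re-encoded before the conversion under test ever sees them.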

I think the characters listed in step 4 should be converted properly, since they are mapped in
<ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ARABIC.TXT>
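
To narrow this down without going through the encoding pragma, each byte can also be decoded on its own with Encode and the failures collected. This is only a rough sketch: unmapped_bytes is a helper name made up for illustration, and decoding bytes one at a time is an assumption that may not behave exactly like the one-liner above.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Return the byte values in 0x20..0xFF (skipping 0x7F) that the named
# encoding fails to decode when each byte is decoded on its own.
sub unmapped_bytes {
    my ($encoding) = @_;
    my @failed;
    for my $byte (grep { $_ != 0x7F } 0x20 .. 0xFF) {
        my $octets = chr($byte);   # fresh copy: decode() may modify its input when CHECK is set
        my $ok = eval { decode($encoding, $octets, FB_CROAK); 1 };
        push @failed, $byte unless $ok;
    }
    return @failed;
}

for my $encoding (qw(MacArabic MacFarsi MacHebrew)) {
    my @failed = unmapped_bytes($encoding);
    printf "%s: %d unmapped byte(s)%s\n", $encoding, scalar @failed,
        @failed ? ': ' . join(' ', map { sprintf '\\x%02X', $_ } @failed) : '';
}
```

With FB_CROAK, decode dies on the first byte it cannot map, so the eval turns each failure into an entry in the list. Comparing that list against the Apple mapping table should show whether the gaps are in Encode or in the table itself.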


I get similar results when converting from MacFarsi and MacHebrew.


TextEdit displays the test file normally if you specify MacArabic as the plain text encoding when opening it, except that *sometimes* the digits U+0030-U+0039 are replaced with ARABIC-INDIC DIGIT U+0660-U+0669. I don't know exactly when this occurs, but I did see the digits in the first line of my test file, "<tab>0<tab>1<tab>2...<tab>F", changed to Arabic-Indic digits.



Cyclone -- a GUI for TextEncodingConvertor -- seems to convert the same file flawlessly. <http://free.abracode.com/cyclone/>


Kino







