[Encode] How to support (Apple's) compound Unicode characters?

Dan Kogai Fri, 29 Mar 2002 23:43:37 -0800

On Saturday, March 30, 2002, at 03:24 , Dan Kogai wrote:
>   Okay.  I've checked
>
> http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/
>
>   One more time and it seems that other missing encodings are available 
> as well, such as korean. I'll look into that.


   I think I have found the reason why some of the encodings were missing 
from Tcl's *.enc files, which were later turned into *.ucm.
   Apple uses Unicode compound characters extensively: many single Mac 
code points map to a sequence of two or more Unicode characters.  That 
doesn't go well with .ucm, not to mention *.enc.

http://www.unicode.org/Public/MAPPINGS/VENDORS/APPLE/JAPANESE.TXT
> # Apple additions - vertical forms
> 0xEB41  0x3001+0xF87E   # vertical form for IDEOGRAPHIC COMMA
   ^^^^^^Mac Japanese, then Unicode Character
Encode/macJapan.ucm
> <UF8B5> \xEB\x41 |0 # Private Use

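   So the two already conflict: Apple's table maps 0xEB41 to a sequence 
of two Unicode characters, while macJapan.ucm maps it to a single Private 
Use character.  MacJapanese doesn't have many such entries, but MacKorean 
has lots of them.  No wonder it is not included in Tcl.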
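
To make the conflict concrete, here is a minimal Python sketch (not part 
of Encode or enc2xs) that parses the two lines quoted above.  The function 
names are hypothetical, and it assumes the Unicode field is a "+"-joined 
list of 0x values; any other tags that may appear in that field are simply 
skipped.

```python
import re

def parse_apple_line(line):
    """Parse one data line of an Apple mapping table.

    Returns (mac_bytes, unicode_text), or None for comment/blank lines.
    """
    fields = line.split("#", 1)[0].split()
    if len(fields) < 2:
        return None
    mac_bytes = bytes.fromhex(fields[0][2:])  # drop the leading "0x"
    # The Unicode side may be several code points joined with "+".
    text = "".join(chr(int(part, 16))
                   for part in fields[1].split("+")
                   if part.startswith("0x"))
    return mac_bytes, text

UCM_RE = re.compile(r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)")

def parse_ucm_line(line):
    """Parse one .ucm mapping line: exactly one code point per byte sequence."""
    m = UCM_RE.match(line.strip())
    if not m:
        return None
    text = chr(int(m.group(1), 16))
    raw = bytes(int(h, 16)
                for h in re.findall(r"\\x([0-9A-Fa-f]{2})", m.group(2)))
    return raw, text

apple = parse_apple_line(
    "0xEB41  0x3001+0xF87E   # vertical form for IDEOGRAPHIC COMMA")
ucm = parse_ucm_line(r"<UF8B5> \xEB\x41 |0 # Private Use")

# Same Mac bytes on both sides, but different Unicode results.
print(apple[0] == ucm[0])                        # True
print([f"U+{ord(c):04X}" for c in apple[1]])     # ['U+3001', 'U+F87E']
print([f"U+{ord(c):04X}" for c in ucm[1]])       # ['U+F8B5']
```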

   I wonder which one I should trust, but I have reasons to believe Apple 
still considers the map at unicode.org canonical.  Take HFS+, for 
example.  The word 'Hangul' consists of two syllables, which are two 
characters in KSC5601 (han-gul).  But on HFS+, it is broken up into six 
separate letters, h-a-n-g-u-l.
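
The break-up can be reproduced with a short Python sketch.  It assumes 
that standard Unicode NFD is a fair stand-in for what HFS+ does to Hangul 
syllables; the exact HFS+ decomposition rules are Apple's own and are not 
modelled here.

```python
import unicodedata

def code_points(s):
    """Return the code points of s as a list of U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in s]

word = "\ud55c\uae00"  # 'Hangul' as two precomposed syllables (han, gul)
decomposed = unicodedata.normalize("NFD", word)

print(code_points(word))
# ['U+D55C', 'U+AE00']
print(code_points(decomposed))
# ['U+1112', 'U+1161', 'U+11AB', 'U+1100', 'U+1173', 'U+11AF']
```

Two characters become six, so a one-to-one table keyed on single Unicode 
characters cannot describe the decomposed form.
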
   Though it is possible to hack enc2xs to produce such mappings (it can 
handle, in theory, any n-byte to n-byte conversion), the UCM format does 
not seem to be designed that way.
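
As an illustration of what an n-to-n mapping means, here is a toy 
table-driven decoder in Python.  It is only a sketch and says nothing 
about how enc2xs works internally; the table, the function name, and the 
longest-match strategy are all assumptions made for the example.

```python
# Hypothetical three-entry table: byte sequence -> one OR MORE code points.
TABLE = {
    b"A": "A",
    b"\x81\x41": "\u3001",          # IDEOGRAPHIC COMMA
    b"\xeb\x41": "\u3001\uf87e",    # vertical form, per Apple's JAPANESE.TXT
}

def decode(data, table=TABLE):
    """Decode bytes by trying the longest matching table key first."""
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(data):
        for n in range(max_len, 0, -1):
            chunk = data[i:i + n]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            raise ValueError(f"unmapped byte 0x{data[i]:02X} at offset {i}")
    return "".join(out)

text = decode(b"A\xeb\x41")
print([f"U+{ord(c):04X}" for c in text])
# ['U+0041', 'U+3001', 'U+F87E']
```

Three input bytes yield three code points here, but the two-byte sequence 
alone accounts for two of them, which is exactly the case a one-character- 
per-entry format cannot express.
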
   Hmm.... Let me think about it for a while...  Well, these are only 
vendor mappings, and Encode's support already matches that of the major 
browsers.  So it is already practical enough, and I believe the level of 
support is good enough for 5.8.0.  Maybe the vendor mappings that are 
missing could be moved to Encode::Vendors::(Apple|MS) or something....

Dan the Encode Maintainer
