Re: Encode: CJK-Guide

Autrijus Tang Tue, 26 Mar 2002 20:16:15 -0800

On Tue, Mar 26, 2002 at 07:16:01PM -0500, Jungshik Shin wrote:
>   BTW, I don't find any reference to Microsoft code pages
> (CP949 for Korean, CP950, CP936, and CP932), JOHAB (Korean), and
> Big5-HKSCS. Is that because they're not yet supported (well, Shift-JIS
> and Big5 are supported)?


CP949 is there in Encode::KR. CP950 is in Encode::TW. CP936 is in
Encode::CN. CP932 is in Encode::JP.

I put Big5-HKSCS into Encode::TW, and Dan later renamed it to
big5-hk.ucm. I don't think the rename is a good idea, though...
Dan, could you explain the reason?

> > As a result, something funny has happened.  For example, U+673A means "a
> > machine" in Simplified Chinese but "a desk" in Japanese.  "A machine"
> > in Japanese is U+6A5F.
> 
>   Do you really believe this is a strong case against Han Unification?
> I don't see any problem with this.  There are a number of
> Chinese characters with multiple meanings  even without Han
> Unification. Do those 'meanings' have to be assigned separate
> code points? 

Dan probably thinks that U+673A in Simplified Chinese Script and Japanese/
Traditional Chinese Script should be assigned two different code points.

Unicode does distinguish "Modifier Letter Prime" from "Prime" by their usage
(letter vs. symbol), even though they share the same appearance.
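
A minimal sketch of that distinction, written in Python rather than Perl and
assuming only its standard unicodedata module. The helper name describe() is
made up for illustration. It prints the code point, character name, and
general category of each of the two look-alike characters:

```python
import unicodedata

# Two characters that look alike but are encoded separately by usage.
MODIFIER_LETTER_PRIME = "\u02b9"
PRIME = "\u2032"

def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

for ch in (MODIFIER_LETTER_PRIME, PRIME):
    print(describe(ch))
```

The first one is reported in a letter category and the second is not, so the
usage difference is visible in the character properties even though the
glyphs match.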

> > So you can't tell what it means just by looking at the code.
>   Why does coded character set have to care about what computational
> linguists have to do? You can't tell the meaning of 
> any English word with multiple meanings by just looking at
> its computer representation without context/grammatical/linguistic/lexical
> analysis, can you? How do you know what 'fly' means without context? 

How about "So you can't tell which Script it means just by looking at the code"?
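
To make that concrete, here is a rough sketch, again in Python rather than
Perl, assuming its bundled "gb2312" and "shift_jis" codecs. The helper
legacy_round_trip() is a made-up name. The same character is stored as
different bytes in a Simplified Chinese charset and in a Japanese one, yet
both decode to the single code point U+673A, so the origin is no longer
recoverable from the code point alone:

```python
import unicodedata

# "a machine" in Simplified Chinese, "a desk" in Japanese.
UNIFIED_CHAR = "\u673a"

def legacy_round_trip(ch, encoding):
    """Encode ch in a legacy charset and decode the bytes back to Unicode."""
    raw = ch.encode(encoding)
    return raw, raw.decode(encoding)

gb_bytes, from_gb = legacy_round_trip(UNIFIED_CHAR, "gb2312")
sjis_bytes, from_sjis = legacy_round_trip(UNIFIED_CHAR, "shift_jis")

# The legacy byte sequences differ between the two charsets...
print(gb_bytes.hex(), sjis_bytes.hex())
# ...but after decoding, both are the same unified code point.
print(from_gb == from_sjis, unicodedata.name(from_gb))
```

Once the text is in Unicode, telling the script apart needs out-of-band
information such as a language tag or the original charset label.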

/Autrijus/
