On Tue, Mar 26, 2002 at 07:16:01PM -0500, Jungshik Shin wrote:
> BTW, I don't find any reference to Microsoft code pages
> (CP949 for Korean, CP950, CP936, and CP932), JOHAB (Korean), and
> Big5-HKSCS. Is that because they're not yet supported (well, Shift-JIS
> and Big5 are supported)?

CP949 is there in Encode::KR. CP950 is in Encode::TW. CP936 is in
Encode::CN. CP932 is in Encode::JP. I've put Big5-HKSCS into Encode::TW,
which was later renamed to big5-hk.ucm by Dan. I don't think it's a good
idea, though... Dan, could you explain the reason?

> > As a result, something funny has happened. For example, U+673A means "a
> > machine" in Simplified Chinese but "a desk" in Japanese. "A machine"
> > in Japanese is U+6A5F.
>
> Do you really believe this is a strong case against Han Unification?
> I don't see any problem with this. There are a number of
> Chinese characters with multiple meanings even without Han
> Unification. Do those 'meanings' have to be assigned separate
> code points?

Dan probably thinks that U+673A in Simplified Chinese script and U+673A
in Japanese/Traditional Chinese script should be assigned two different
code points. Unicode does distinguish between "Modifier Letter Prime" and
"Prime" by their usage (letter vs. symbol), even though they share the
same appearance.

> > So you can't tell what it means just by looking at the code.
>
> Why does coded character set have to care about what computational
> linguists have to do? You can't tell the meaning of
> any English word with multiple meanings by just looking at
> its computer representation without context/grammatical/linguistic/lexical
> analysis, can you? How do you know what 'fly' means without context?

How about "So you can't tell which script it means just by looking at
the code"?

/Autrijus/
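
For anyone who wants to poke at the two points above (the Prime pair, and U+673A carrying no script information), here is a rough sketch. It uses Python's standard `unicodedata` module and Python's own `cp936`/`cp932` codecs rather than the Perl Encode modules discussed in this thread, so it says nothing about how Encode behaves. The helper names `describe` and `roundtrip` are made up for illustration.

```python
import unicodedata


def describe(ch):
    """Return (code point, Unicode name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def roundtrip(ch, codec):
    """Encode ch with a legacy codec and decode it back (hypothetical helper)."""
    return ch.encode(codec).decode(codec)


# Two look-alike characters that Unicode keeps apart by usage.
MODIFIER_LETTER_PRIME = "\u02b9"
PRIME = "\u2032"

# The unified ideograph from the thread, plus its traditional counterpart.
HAN_673A = "\u673a"
HAN_6A5F = "\u6a5f"

if __name__ == "__main__":
    for ch in (MODIFIER_LETTER_PRIME, PRIME, HAN_673A, HAN_6A5F):
        print(*describe(ch))

    # The same code point survives a round trip through a Simplified Chinese
    # code page and a Japanese one; nothing in the code point itself records
    # which script the text was written in.
    for codec in ("cp936", "cp932"):
        print(codec, roundtrip(HAN_673A, codec) == HAN_673A)
```

The first loop shows that the two primes differ in name and general category, while the ideograph's name is just its code point. Telling a Chinese 673A from a Japanese one would need information from outside the character data, such as language tagging or the source encoding.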