Re: Unicode. Perl does the right thing?

Jungshik Shin Fri, 25 Oct 2002 11:58:01 -0700

On Fri, 25 Oct 2002, Autrijus Tang wrote:

> On Fri, Oct 25, 2002 at 02:53:43PM +0900, Dan Kogai wrote:
> > use charnames ":zh";
> > print "\N{sheng1}";
>
> 17 characters from the Big5 range have the 'sheng1' pronunciation;
> no doubt many more in the Unihan range.

> > use charnames ":zh";
> > print "\N{saeng}";

  Needless to say, there are many CJK characters with the Korean
pronunciation 'saeng', not to mention the Korean Hangul syllable with
that pronunciation. Besides, some characters have multiple readings.
So, this doesn't work for Korean, either.
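
  To make the ambiguity concrete, here is a minimal sketch in Python.
The table is a hand-picked sample of a few characters, not Unihan data,
and the names READINGS and chars_for_reading are hypothetical; a real
lookup would draw on a full pronunciation table.

```python
# Hand-picked sample (an assumption for illustration, not Unihan data).
# Three characters share the Mandarin reading "sheng1"; the fourth has
# two readings of its own.
READINGS = {
    "生": ["sheng1"],
    "升": ["sheng1"],
    "聲": ["sheng1"],
    "省": ["sheng3", "xing3"],
}

def chars_for_reading(reading):
    """Return every sample character that has the given reading."""
    return sorted(ch for ch, rs in READINGS.items() if reading in rs)

matches = chars_for_reading("sheng1")

# More than one match means a name like \N{sheng1} cannot pick out
# a single character.
for ch in matches:
    print(f"U+{ord(ch):04X} {ch}")
```

  Even in this tiny sample the reading alone returns three candidates,
and one character answers to two different readings.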

> This "internal code of Han characters" has been discussed in depth
> here by Mr Zhu Bang-Fu and friends; the consensus is that there's
> no way to uniquely identify one character from another depending
> only on a single 'natural' index (Cang-Jie, pinyin, etc) -- you
> will end up with fixed ordering ("\N{sheng1-0001}") instead, which
> is not more legible than "\x{751f}".

  In a sense, it's even worse than "\x{751f}" unless there's a
machine-readable mapping table (as well as a printed, human-readable
one) from the sheng1-NNNN names to Unicode code points. Otherwise, one
would have to refer to the Unicode code charts anyway.
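
  As a rough idea of what such a machine-readable table could look
like, here is a sketch in Python. The "sheng1-NNNN" labels and their
order are invented for this example (no such registry exists), and
SHENG1_TABLE and load_table are hypothetical names.

```python
# Hypothetical table: one "label code-point" pair per line. The
# numbering is arbitrary, which is exactly why a published table
# would be needed to make the labels usable.
SHENG1_TABLE = """\
sheng1-0001 751F
sheng1-0002 5347
sheng1-0003 8072
"""

def load_table(text):
    """Parse 'label hex-code-point' lines into a label -> character dict."""
    table = {}
    for line in text.splitlines():
        label, hex_cp = line.split()
        table[label] = chr(int(hex_cp, 16))
    return table

table = load_table(SHENG1_TABLE)
print(table["sheng1-0001"])  # the character at U+751F
```

  Without the table at hand, "sheng1-0002" says nothing about which
character is meant, whereas a hexadecimal code point can at least be
looked up directly.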

  How about a radical-stroke-pronunciation index? Even with this
triple index, there may be degeneracies left to resolve....
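
  A sketch of such a triple index in Python, with a check for
remaining degeneracies. The sample entries are an assumption for
illustration: the radical numbers and residual stroke counts are the
usual Kangxi values as I recall them and should be checked against
Unihan before being relied on. ENTRIES, build_index and
degenerate_keys are hypothetical names.

```python
from collections import defaultdict

# (character, Kangxi radical number, additional strokes, reading)
# Sample values only; verify against Unihan before real use.
ENTRIES = [
    ("生", 100, 0, "sheng1"),
    ("升", 24, 2, "sheng1"),
    ("聲", 128, 11, "sheng1"),
]

def build_index(entries):
    """Group characters under their (radical, strokes, reading) key."""
    index = defaultdict(list)
    for ch, radical, strokes, reading in entries:
        index[(radical, strokes, reading)].append(ch)
    return index

def degenerate_keys(index):
    """Return the keys that still map to more than one character."""
    return {key: chars for key, chars in index.items() if len(chars) > 1}

index = build_index(ENTRIES)
print(index[(100, 0, "sheng1")])
print(degenerate_keys(index))
```

  For this small sample every key is unique; whether that holds over
the whole repertoire is exactly the open question.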

  Another possibility is a 'meaning-pronunciation' index. I believe
this is one of the few ways to refer to CJK characters (say, over the
phone) in all CJK countries. However, to do this, we need much more raw
data (more or less a small dictionary) than the Unihan database
provides, because it lists the meanings of characters in English only.


  Jungshik