Re: CJK Unification

Tomohiro KUBOTA Mon, 02 Apr 2001 19:31:09 -0700
Hi,

At Mon, 2 Apr 2001 23:44:56 +0200 (CEST),
Bruno Haible <[EMAIL PROTECTED]> wrote:

> Be careful to verify each character when you do this. The Big5+
> from/to Unicode mapping tables contain mistakes and are inconsistent
> with themselves.

Sure.  I recently noticed a severe problem in the conversion tables
between Unicode and Japanese encodings supplied by the Unicode
Consortium.  I think Sakamoto has already told you about this problem.
For example, EUC-JP is an _encoding_ (in other words, a CES) which
includes ASCII and JISX0208 as _character sets_ (in other words,
CCSs).  If we define the EUC-JP <-> Unicode conversion using the
ASCII <-> Unicode conversion (i.e., no conversion) and the
Unicode Consortium's JISX0208 <-> Unicode conversion table,
round-trip compatibility is lost.  The reason is that the Unicode
Consortium's JISX0208 <-> Unicode table uses U+005C, which is of
course already covered by ASCII, so two different EUC-JP characters
map to the same Unicode code point.  I don't know what the best
solution is.  However, at least I can say that EUC-JP is an encoding
with a long history and a record of vast usage.  That record shows
that EUC-JP has no fatal problem; otherwise it would not be used so
widely.  Thus it is apparent that what is to blame is Unicode or its
conversion tables.
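
To make the round-trip problem concrete, here is a minimal sketch in
Python.  The two small tables are hand-written excerpts for
illustration only, not the real mapping files, and the helper names
(build_decoder, build_encoder, round_trip) are hypothetical.  It
assumes that the JISX0208 table maps the double-byte backslash
(0x2140, i.e. bytes A1 C0 in EUC-JP) to U+005C, as described above.

```python
# Illustrative excerpts only; the real tables have thousands of entries.
# ASCII <-> Unicode is the identity mapping, so byte 0x5C is U+005C.
ASCII_TO_UNICODE = {b"\x5c": "\u005c"}
# Assumption: the JISX0208 table maps 0x2140 (EUC-JP bytes A1 C0) to U+005C.
JISX0208_TO_UNICODE = {b"\xa1\xc0": "\u005c"}


def build_decoder(*tables):
    """Merge per-character-set tables into one EUC-JP -> Unicode table."""
    merged = {}
    for table in tables:
        merged.update(table)
    return merged


def build_encoder(decoder):
    """Invert the decoder and record every code point with several sources."""
    encoder = {}
    collisions = {}
    for raw, char in decoder.items():
        if char in encoder:
            # Two EUC-JP byte sequences share one code point: the inverse
            # mapping is ambiguous, so only the first one can be kept.
            collisions.setdefault(char, [encoder[char]]).append(raw)
        else:
            encoder[char] = raw
    return encoder, collisions


decoder = build_decoder(ASCII_TO_UNICODE, JISX0208_TO_UNICODE)
encoder, collisions = build_encoder(decoder)


def round_trip(raw):
    """Convert EUC-JP bytes to Unicode and back again."""
    return encoder[decoder[raw]]


for raw in (b"\x5c", b"\xa1\xc0"):
    status = "ok" if round_trip(raw) == raw else "LOST"
    print(raw.hex(), "->", round_trip(raw).hex(), status)
```

In this sketch the double-byte character comes back as the
single-byte one, so the original EUC-JP text cannot be recovered.
Any table that maps the two characters to two distinct code points
would avoid the collision; which code point to choose is exactly the
open question.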

Thus, the conversion tables between Unicode and other encodings must
be brushed up before Unicode is widely used in the CJK world, because
once Unicode is widespread, it will be very difficult to change the
conversion tables (a conversion table is itself a standard, and once
a standard is widespread, it is difficult to change).  At present,
Unicode is rarely used as an _external encoding_ (i.e., an encoding
for data exchange) in Japan, though Java and so on are widely
used in Japan (they use Unicode as an _internal encoding_).

Such work (brushing up the conversion tables) is too heavy for
individual volunteers.  Do you know whether someone is working in this
field, or at least whether the Unicode people are aware of this problem?
(I hope some Unicode advocates know this problem well.)

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/