Tomohiro KUBOTA wrote:
> 
> At Mon, 2 Apr 2001 23:44:56 +0200 (CEST),
> Bruno Haible <[EMAIL PROTECTED]> wrote:
> > Be careful to verify each character when you do this. The Big5+
> > from/to Unicode mapping tables contain mistakes and are inconsistent
> > with themselves.
> 
> Sure.  I recently noticed a severe problem in the conversion tables
> between Unicode and Japanese encodings supplied by the Unicode Consortium.

I think Bruno is referring to the Big5+ <-> Unicode mapping tables
supplied by CMEX (http://www.cmex.org.tw/), not to anything from the
Unicode Consortium.  The Unicode Consortium does supply some mapping
tables (unfortunately, sometimes in slightly different versions of the
same thing), but these seem to be offered only as a service and are not
normative, although some people make them effectively so by taking
their tables from there.

It is still, however, important to be careful about just swiping a
mapping table (or anything else) from somewhere, especially when
one doesn't understand the contents enough to properly evaluate the
quality of the data.  I'm sure you've seen software by Chinese
authors that purports to be usable for Japanese, but is in fact
useless because of some flaw, like omitting ISO 2022-JP support while
including Shift-JIS and EUC-JP.


> I think Sakamoto has already told you about the problem.
> For example, EUC-JP is an _encoding_ (in other words, CES) which
> includes ASCII and JISX0208 as _character sets_ (in other words,
> CCS).  If we define EUC-JP <-> Unicode conversion using
> ASCII <-> Unicode conversion (i.e., no conversion) and the
> Unicode Consortium's JISX0208 <-> Unicode conversion table,
> round-trip compatibility is lost.  That is, the Unicode
> Consortium's JISX0208 <-> Unicode table uses U+005C, which is of
> course also included in ASCII.  I don't know what the best solution is.

That's the half-width backslash vs. half-width yen sign problem,
isn't it?  Correct me if I'm wrong, but isn't EUC-JP actually
composed of 1) ISO 646-JP (has yen sign) and JIS X 0208, rather
than 2) ISO 646-IRV (has backslash) and JIS X 0208?  (Of course,
EUC-JP can also have JIS X 0212 included too.)
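
To make the collision concrete, here is a rough Python sketch of the
round-trip loss described in the quoted text.  It is only an
illustration under stated assumptions: the two tables are tiny
hand-written excerpts (not the real table files), the function names
are made up for this example, input is assumed to be well-formed, and
the choice to let ASCII win on the way back is just one possible
policy.

```python
# Toy excerpts of the two character-set tables (assumed entries only).
ASCII_TO_UCS = {b: b for b in range(0x80)}   # ASCII maps to itself
JISX0208_TO_UCS = {
    0x2121: 0x3000,  # ideographic space, no collision
    0x2140: 0x005C,  # the entry discussed above: lands on ASCII backslash
}

# Reverse table for encoding. ASCII entries are added last, so they
# win whenever two source characters share one Unicode code point.
UCS_TO_EUC = {}
for code, ucs in JISX0208_TO_UCS.items():
    UCS_TO_EUC[ucs] = bytes([(code >> 8) | 0x80, (code & 0xFF) | 0x80])
for b, ucs in ASCII_TO_UCS.items():
    UCS_TO_EUC[ucs] = bytes([b])


def eucjp_decode(data: bytes) -> str:
    """Decode a well-formed EUC-JP byte string using the toy tables."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:
            out.append(chr(ASCII_TO_UCS[b]))
            i += 1
        else:
            # Two-byte JIS X 0208 character: strip the high bits.
            code = ((b & 0x7F) << 8) | (data[i + 1] & 0x7F)
            out.append(chr(JISX0208_TO_UCS[code]))
            i += 2
    return "".join(out)


def eucjp_encode(text: str) -> bytes:
    """Encode back to EUC-JP using the reverse toy table."""
    return b"".join(UCS_TO_EUC[ord(ch)] for ch in text)


def round_trip(data: bytes) -> bytes:
    return eucjp_encode(eucjp_decode(data))


if __name__ == "__main__":
    for sample in (b"\x5c", b"\xa1\xa1", b"\xa1\xc0"):
        result = round_trip(sample)
        status = "ok" if result == sample else "LOST"
        print(sample.hex(), "->", result.hex(), status)
```

With these toy tables, the one-byte 0x5C and the two-byte 0xA1 0xC0
both decode to U+005C, so only one of them can survive the trip back.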


Thomas Chan
[EMAIL PROTECTED]
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
