Hi,

At Mon, 02 Apr 2001 23:42:58 -0400,
Thomas Chan <[EMAIL PROTECTED]> wrote:

> I think Bruno is referring to the Big5+ <-> Unicode mapping tables
> supplied by CMEX (http://www.cmex.org.tw/), and not the Unicode
> Consortium.  The Unicode Consortium does supply some mapping tables
> (unfortunately, sometimes slightly different versions of the same
> thing), but it seems to just be a service, and not normative--although
> some people do make it effectively so by downloading their tables
> from there.
> 
> It is still, however, important to be careful about just swiping a
> mapping table (or anything else) from somewhere, especially when
> one doesn't understand the contents enough to properly evaluate the
> quality of the data.  I'm sure you've seen software by Chinese
> authors that purport to be usable for Japanese, but are in fact
> useless because of some flaw, like omitting ISO 2022-JP support while
> including Shift-JIS and EUC-JP.

Ideally, we should have one common conversion table for each conversion,
for example, EUC-JP <-> Unicode.  Otherwise, round-trip conversion cannot
be guaranteed.  Imagine that Taro has an EUC-JP text and sends it
by e-mail in UTF-8.  Hanako receives the text and converts it back
into EUC-JP.  If Taro's conversion table uses a code point which
Hanako's conversion table does not use, the character will be broken.
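
To make the Taro/Hanako case concrete, here is a minimal Python sketch.
The two one-entry tables are toy assumptions of mine, not real vendor
tables; they only model the well-known disagreement over JIS X 0208
0x2141 (U+301C in JIS0208.TXT versus U+FF5E in Microsoft-style tables).
The function names and the b"?" fallback are hypothetical.

```python
# Toy tables (assumptions): EUC-JP byte sequence -> Unicode character.
TARO_TABLE = {b"\xa1\xc1": "\u301c"}    # WAVE DASH
HANAKO_TABLE = {b"\xa1\xc1": "\uff5e"}  # FULLWIDTH TILDE


def euc_to_utf8(euc_char: bytes, table: dict) -> bytes:
    """Convert one EUC-JP character to UTF-8 using the given table."""
    return table[euc_char].encode("utf-8")


def utf8_to_euc(utf8_char: bytes, table: dict) -> bytes:
    """Convert one UTF-8 character back to EUC-JP, or b'?' if unmapped."""
    reverse = {uni: euc for euc, uni in table.items()}
    return reverse.get(utf8_char.decode("utf-8"), b"?")


original = b"\xa1\xc1"
wire = euc_to_utf8(original, TARO_TABLE)         # what Taro sends
same_table = utf8_to_euc(wire, TARO_TABLE)       # receiver shares the table
other_table = utf8_to_euc(wire, HANAKO_TABLE)    # Hanako's table differs

print(same_table == original)  # True
print(other_table)             # b'?'
```

With a shared table the character survives; with Hanako's table the
code point Taro produced has no reverse mapping and the character is lost.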

However, such a bad situation is likely to become reality, because
major vendors like MS and Sun use different conversion tables.  I hope
that at least the open source community will use a consistent
conversion table.


> That's the half-width backslash vs. half-width yen sign problem,
> isn't it?

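Sure.  Well, not exactly.  EUC-JP has a "backslash" problem.
Here is a conversion table between Unicode and JIS X 0208:
  http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT
In this table, 0x815F (in Shift_JIS) (or 0x2140 in JIS X 0208) is
mapped to U+005C.  However, EUC-JP is a CES whose CCSs are ISO 646-IRV
(or US-ASCII) and JIS X 0208, and 0x5C in ISO 646-IRV is also U+005C.
So two different EUC-JP characters (0x5C and 0xA1C0) are mapped to the
same code point, and they cannot be distinguished on the way back.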

On the other hand, Shift_JIS is a CES whose CCSs are JIS X 0201
Roman (ISO 646-JP), JIS X 0201 Kana, and JIS X 0208.  Its 0x5C is
the yen sign of ISO 646-JP, so it does not collide with 0x815F.
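
The difference between the two encodings can be checked with a small
Python sketch.  It assumes the JIS0208.TXT mapping for 0x2140 and takes
0x5C as U+005C in ISO 646-IRV and U+00A5 in ISO 646-JP; the one-entry
tables and the helper names are hypothetical and only for illustration.

```python
# One-entry excerpts of the character sets discussed above (illustrative).
ISO646_IRV = {0x5C: "\u005c"}   # US-ASCII: 0x5C is REVERSE SOLIDUS
ISO646_JP = {0x5C: "\u00a5"}    # JIS X 0201 Roman: 0x5C is YEN SIGN
JIS0208 = {0x2140: "\u005c"}    # as listed in JIS0208.TXT


def euc_jp_bytes(jis: int) -> bytes:
    """EUC-JP stores a JIS X 0208 code with the high bit set on both bytes."""
    return bytes([(jis >> 8) | 0x80, (jis & 0xFF) | 0x80])


# EUC-JP = ISO 646-IRV + JIS X 0208 (0x2140 is encoded as 0xA1C0).
euc_jp = {b"\x5c": ISO646_IRV[0x5C], euc_jp_bytes(0x2140): JIS0208[0x2140]}
# Shift_JIS = ISO 646-JP + JIS X 0208 (0x2140 is encoded as 0x815F).
shift_jis = {b"\x5c": ISO646_JP[0x5C], b"\x81\x5f": JIS0208[0x2140]}


def is_reversible(table: dict) -> bool:
    """True when no two byte sequences share a Unicode code point."""
    return len(set(table.values())) == len(table)


print(is_reversible(euc_jp))     # False
print(is_reversible(shift_jis))  # True
```

Under these assumptions only the EUC-JP combination sends two byte
sequences to U+005C, which is why the round trip fails there.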


> Correct me if I'm wrong, but isn't EUC-JP actually
> composed of 1) ISO 646-JP (has yen sign) and JIS X 0208, rather
> than 2) ISO 646-IRV (has backslash) and JIS X 0208?  (Of course,
> EUC-JP can also have JIS X 0212 included too.)

EUC-JP uses ISO 646-IRV + JIS X 0208 (+ JIS X 0201 Kana + JIS X 0212).
Shift_JIS uses ISO 646-JP + JIS X 0201 Kana + JIS X 0208.

FYI: ISO-2022-JP uses both ISO 646-IRV and ISO 646-JP, though
I have never seen an implementation of ISO-2022-JP that distinguishes
ISO 646-IRV from ISO 646-JP.
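
For reference, a decoder that does make the distinction could look
roughly like the Python sketch below.  It handles only the two
single-byte designations of ISO-2022-JP (ESC ( B for ASCII and ESC ( J
for JIS X 0201 Roman) and ignores the double-byte ones; the function
name is hypothetical, and the choice of U+00A5 and U+203E for 0x5C and
0x7E in JIS X 0201 Roman is my assumption.

```python
ESC_ASCII = b"\x1b(B"      # designate ISO 646-IRV (US-ASCII)
ESC_JIS_ROMAN = b"\x1b(J"  # designate JIS X 0201 Roman (ISO 646-JP)


def decode_single_byte(data: bytes) -> str:
    """Decode the single-byte parts of an ISO-2022-JP stream (sketch only)."""
    out = []
    roman = False  # an ISO-2022-JP stream starts in ASCII
    i = 0
    while i < len(data):
        if data.startswith(ESC_ASCII, i):
            roman = False
            i += len(ESC_ASCII)
        elif data.startswith(ESC_JIS_ROMAN, i):
            roman = True
            i += len(ESC_JIS_ROMAN)
        else:
            byte = data[i]
            if roman and byte == 0x5C:
                out.append("\u00a5")   # YEN SIGN
            elif roman and byte == 0x7E:
                out.append("\u203e")   # OVERLINE
            else:
                out.append(chr(byte))
            i += 1
    return "".join(out)


sample = b"\\" + ESC_JIS_ROMAN + b"\\" + ESC_ASCII + b"\\"
decoded = decode_single_byte(sample)
print(decoded == "\\\u00a5\\")  # True
```

The same byte 0x5C becomes a backslash or a yen sign depending on which
set was designated last.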

---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
