Hi,
At Mon, 02 Apr 2001 23:42:58 -0400,
Thomas Chan <[EMAIL PROTECTED]> wrote:
> I think Bruno is referring to the Big5+ <-> Unicode mapping tables
> supplied by CMEX (http://www.cmex.org.tw/), and not the Unicode
> Consortium. The Unicode Consortium does supply some mapping tables
> (unfortunately, sometimes slightly different versions of the same
> thing), but it seems to just be a service, and not normative--although
> some people do make it effectively so by downloading their tables
> from there.
>
> It is still, however, important to be careful about just swiping a
> mapping table (or anything else) from somewhere, especially when
> one doesn't understand the contents enough to properly evaluate the
> quality of the data. I'm sure you've seen software by Chinese
> authors that purport to be usable for Japanese, but are in fact
> useless because of some flaw, like omitting ISO 2022-JP support while
> including Shift-JIS and EUC-JP.
Ideally, we should have one common conversion table for each conversion,
for example, EUC-JP <-> Unicode. Otherwise, round-trip conversion cannot
be guaranteed. Imagine that Taro has an EUC-JP text and sends it
by e-mail in UTF-8. Hanako receives the text and converts
it back into EUC-JP. If Taro's conversion table uses a code point that
Hanako's conversion table does not use, the character will be broken.
Unfortunately, this is what will happen in practice, because major
vendors like MS and Sun use different conversion tables. I hope
that at least the open source community will use a consistent conversion
table.
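
The Taro/Hanako scenario can be sketched in a few lines of Python. The
two tables are hypothetical one-entry extracts, not real vendor tables;
they use U+301C and U+FF5E, a commonly cited divergence for the JIS X 0208
character 0x2141 (EUC-JP 0xA1C1). The names TARO_DECODE, HANAKO_DECODE
and round_trip are invented for this illustration.

```python
# Hypothetical one-entry extracts of two EUC-JP -> Unicode tables.
# Both describe the same EUC-JP character but pick different code points.
TARO_DECODE = {b"\xa1\xc1": "\u301c"}    # WAVE DASH
HANAKO_DECODE = {b"\xa1\xc1": "\uff5e"}  # FULLWIDTH TILDE


def invert(decode_table):
    """Build the Unicode -> EUC-JP direction from a decode table."""
    return {char: seq for seq, char in decode_table.items()}


def round_trip(euc, sender_decode, receiver_decode):
    """EUC-JP -> Unicode (sender) -> UTF-8 mail -> Unicode -> EUC-JP (receiver).

    Returns the receiver's EUC-JP bytes, or None if the receiver's table
    has no entry for the code point the sender chose.
    """
    text = sender_decode[euc]           # Taro converts to Unicode
    mail_body = text.encode("utf-8")    # ... and sends UTF-8
    received = mail_body.decode("utf-8")
    return invert(receiver_decode).get(received)


print(round_trip(b"\xa1\xc1", TARO_DECODE, TARO_DECODE))
print(round_trip(b"\xa1\xc1", TARO_DECODE, HANAKO_DECODE))
```

With one shared table the bytes survive; with two different tables the
receiver has nothing to map the code point back to.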
> That's the half-width backslash vs. half-width yen sign problem,
> isn't it?
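Sure. Ah, not exactly. EUC-JP has a "backslash" problem.
Here is a conversion table between Unicode and JIS X 0208:
http://www.unicode.org/Public/MAPPINGS/EASTASIA/JIS/JIS0208.TXT
In it, 0x815F in Shift_JIS (or 0x2140 in JIS X 0208) is mapped to
U+005C. However, EUC-JP is a CES (character encoding scheme) whose
CCSs (coded character sets) are ISO 646-IRV (or US-ASCII) and
JIS X 0208. With this table, both 0x5C from ISO 646-IRV and 0x2140
from JIS X 0208 end up at U+005C, so the two cannot be told apart
when converting back to EUC-JP.
On the other hand, Shift_JIS is a CES whose CCSs are JIS X 0201
Roman (ISO 646-JP), JIS X 0201 Kana, and JIS X 0208. There 0x5C is
the yen sign, so the same collision does not occur.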
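
To make the collision concrete, here is a minimal Python sketch. It
converts a JIS X 0208 code to its EUC-JP and Shift_JIS byte sequences
with the usual arithmetic, then builds tiny decode tables and looks for
Unicode code points reached from more than one byte sequence. The three
dictionaries are one-entry extracts chosen for this illustration (the
JIS X 0208 entry follows JIS0208.TXT), and all function names are
invented here.

```python
def jisx0208_to_eucjp(code):
    """EUC-JP sets the high bit of both JIS X 0208 bytes."""
    return bytes([(code >> 8) | 0x80, (code & 0xFF) | 0x80])


def jisx0208_to_sjis(code):
    """Standard JIS X 0208 -> Shift_JIS byte arithmetic."""
    hi, lo = code >> 8, code & 0xFF
    if hi % 2:                 # odd row: first half of the trail-byte range
        s2 = lo + 0x1F
        if s2 >= 0x7F:         # 0x7F is skipped in Shift_JIS trail bytes
            s2 += 1
    else:                      # even row: second half
        s2 = lo + 0x7E
    s1 = (hi + 1) // 2 + 0x70
    if s1 >= 0xA0:             # jump over the half-width kana area
        s1 += 0x40
    return bytes([s1, s2])


# One-entry extracts of each CCS -> Unicode mapping (illustrative only).
ISO646_IRV = {0x5C: 0x005C}      # backslash
JISX0201_ROMAN = {0x5C: 0x00A5}  # yen sign
JISX0208 = {0x2140: 0x005C}      # as listed in JIS0208.TXT


def decode_table(single_byte_ccs, double_byte_encoder):
    """Combine a single-byte CCS and JIS X 0208 into one CES decode table."""
    table = {bytes([b]): cp for b, cp in single_byte_ccs.items()}
    for code, cp in JISX0208.items():
        table[double_byte_encoder(code)] = cp
    return table


def collisions(table):
    """Return code points that more than one byte sequence maps to."""
    by_cp = {}
    for seq, cp in table.items():
        by_cp.setdefault(cp, []).append(seq)
    return {cp: seqs for cp, seqs in by_cp.items() if len(seqs) > 1}


EUC_JP = decode_table(ISO646_IRV, jisx0208_to_eucjp)
SHIFT_JIS = decode_table(JISX0201_ROMAN, jisx0208_to_sjis)

print(jisx0208_to_sjis(0x2140).hex(), jisx0208_to_eucjp(0x2140).hex())
print(collisions(EUC_JP))
print(collisions(SHIFT_JIS))
```

In this extract only the EUC-JP table reports a collision at U+005C;
the Shift_JIS table does not, because its single-byte 0x5C goes to
U+00A5.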
> Correct me if I'm wrong, but isn't EUC-JP actually
> composed of 1) ISO 646-JP (has yen sign) and JIS X 0208, rather
> than 2) ISO 646-IRV (has backslash) and JIS X 0208? (Of course,
> EUC-JP can also have JIS X 0212 included too.)
EUC-JP uses ISO 646-IRV + JIS X 0208 (+ JIS X 0201 Kana + JIS X 0212).
Shift_JIS uses ISO 646-JP + JIS X 0208.
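
As a rough illustration of how the EUC-JP character sets share one byte
stream, the following Python sketch splits EUC-JP data into units and
labels each with the set it belongs to. It relies on the usual EUC-JP
layout (0x8E introduces a JIS X 0201 Kana byte, 0x8F introduces a
JIS X 0212 pair); the function name eucjp_units is invented here, and
the sketch does no validation of trail bytes.

```python
def eucjp_units(data):
    """Yield (character set name, raw bytes) for each EUC-JP unit."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            name, size = "ISO 646-IRV", 1
        elif lead == 0x8E:                 # single shift 2
            name, size = "JIS X 0201 Kana", 2
        elif lead == 0x8F:                 # single shift 3
            name, size = "JIS X 0212", 3
        elif 0xA1 <= lead <= 0xFE:
            name, size = "JIS X 0208", 2
        else:
            raise ValueError(f"unexpected lead byte 0x{lead:02X} at {i}")
        yield name, data[i:i + size]
        i += size


# ASCII backslash, JIS X 0208 0x2140, and one half-width kana.
for name, raw in eucjp_units(b"\\\xa1\xc0\x8e\xb1"):
    print(name, raw.hex())
```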
FYI: ISO-2022-JP uses both ISO 646-IRV and ISO 646-JP, though
I have never met an implementation of ISO-2022-JP that distinguishes
ISO 646-IRV from ISO 646-JP.
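
For illustration, a decoder that does make the distinction could look
like the Python sketch below. It tracks the escape sequences defined for
ISO-2022-JP (ESC ( B, ESC ( J, ESC $ @, ESC $ B) and treats 0x5C and
0x7E differently depending on which single-byte set is designated. The
function name is invented here, JIS X 0208 pairs are simply handed to
Python's euc_jp codec as an assumption of convenience, and error
handling is omitted.

```python
ESC = 0x1B

# Escape sequences (after ESC) and the character set they designate.
DESIGNATIONS = {
    b"(B": "ascii",           # ISO 646-IRV / US-ASCII
    b"(J": "jisx0201-roman",  # ISO 646-JP
    b"$@": "jisx0208",        # JIS X 0208-1978
    b"$B": "jisx0208",        # JIS X 0208-1983
}

# The two positions where ISO 646-JP differs from ISO 646-IRV.
ROMAN_OVERRIDES = {0x5C: "\u00a5", 0x7E: "\u203e"}  # yen sign, overline


def decode_iso2022jp_distinguishing(data):
    """Decode ISO-2022-JP, keeping ISO 646-IRV and ISO 646-JP apart."""
    charset = "ascii"
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == ESC:
            charset = DESIGNATIONS[data[i + 1:i + 3]]
            i += 3
        elif charset == "jisx0208":
            # Reuse the EUC-JP codec: set the high bit of both bytes.
            pair = bytes([data[i] | 0x80, data[i + 1] | 0x80])
            out.append(pair.decode("euc_jp"))
            i += 2
        else:
            if charset == "jisx0201-roman" and byte in ROMAN_OVERRIDES:
                out.append(ROMAN_OVERRIDES[byte])
            else:
                out.append(chr(byte))
            i += 1
    return "".join(out)


# The same byte 0x5C, first under ESC ( B, then under ESC ( J.
sample = b"\x1b(B\\\x1b(J\\\x1b(B"
print([hex(ord(c)) for c in decode_iso2022jp_distinguishing(sample)])
```

An implementation that ignores the difference would return U+005C for
both bytes.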
---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/