Re: ASCII and JIS X 0201 Roman - the backslash problem

2002-05-09 Thread Tomohiro KUBOTA

Hi,

At Wed, 8 May 2002 21:41:08 +0200 (CEST),
Bruno Haible wrote:

> Tomohiro Kubota, in
> http://www.debian.or.jp/~kubota/unicode-symbols-yen.html, explains
> the YEN SIGN versus REVERSE SOLIDUS problem.  He writes:

I think there is no solution for this problem.  At least, we should
not try to solve it until a large number of Japanese people start
using Unicode.  Once they do, and once they have established a
convention for handling this character code, we will be able to
think about a solution.

Though the Yen Sign Problem did exist in the past (because Shift_JIS's
0x5c is YEN SIGN and EUC-JP's 0x5c is REVERSE SOLIDUS), it was not
regarded as a severe problem, because Shift_JIS, which is adopted by
Windows and Macintosh, is the only popular encoding among general
users in Japan.  EUC-JP is used only on UNIX-like systems, which are
not yet common even now.

I think the "Unicode" used in the Japanese version of Windows is
*not* Unicode.  It is a Unicode-like encoding in which 0x005c is YEN
SIGN.  However, I don't think this observation can solve our problem,
because Microsoft will call the "Unicode-like encoding" Unicode and
there is no reason for Japanese people not to believe it.



> 1) Admit that YEN SIGN and REVERSE SOLIDUS are different things.

Yes, of course.  Regarding YEN SIGN and REVERSE SOLIDUS as the same
character was possible only because Shift_JIS has YEN SIGN but no
REVERSE SOLIDUS, while EUC-JP has REVERSE SOLIDUS but no YEN SIGN.
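
To make the difference concrete, here is a minimal Python sketch of how
the same 7-bit bytes read under ASCII and under JIS X 0201 Roman.  The
function name decode_7bit and the charset labels are made up for this
illustration; the sketch deliberately uses its own small table rather
than any library codec, since real Shift_JIS converters differ in how
they map 0x5c.

```python
# JIS X 0201 Roman differs from ASCII at exactly two code points.
JIS_ROMAN_DIFF = {
    0x5C: "\u00a5",  # YEN SIGN instead of REVERSE SOLIDUS
    0x7E: "\u203e",  # OVERLINE instead of TILDE
}


def decode_7bit(data: bytes, charset: str) -> str:
    """Decode 7-bit bytes as 'ascii' or 'jisx0201-roman' (hypothetical labels)."""
    if charset not in ("ascii", "jisx0201-roman"):
        raise ValueError("unknown charset: " + charset)
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError("not a 7-bit byte: 0x%02x" % b)
        if charset == "jisx0201-roman" and b in JIS_ROMAN_DIFF:
            chars.append(JIS_ROMAN_DIFF[b])
        else:
            chars.append(chr(b))
    return "".join(chars)


raw = b"C:\\tmp"  # six bytes; the third one is 0x5c
as_ascii = decode_7bit(raw, "ascii")
as_jis = decode_7bit(raw, "jisx0201-roman")
```

Here as_ascii contains U+005C and as_jis contains U+00A5, although the
input bytes are identical.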


> 2) Never use backslash as a directory separator.

Since Shift_JIS doesn't have backslash, this is not a problem.


> 3) For programs that interpret backslash as some kind of escape character
>    and use Unicode internally but should work with text in Shift_JIS
>    encoding, consider the multibyte character 0x5C as being the escape
>    trigger, not [only] the Unicode character U+005C. This is already done
>    in bash and gettext. For example, in GNU gettext, we have the code

Though I could not understand the code, I think interpreting U+00A5
as an additional escape character doesn't always work, because
Unicode text carries no information about its origin (whether or not
it was converted from Shift_JIS).  If U+00A5 were always treated as
an escape character, it would harm a lot of software.
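
As an illustration of the harm, here is a small Python sketch.  It is
not the gettext or bash code (which I have not reproduced); unescape()
and its escape_chars parameter are hypothetical, and the only escapes
it knows are \n and \t.

```python
def unescape(text: str, escape_chars: str = "\\") -> str:
    """Process escapes; any character in escape_chars acts as the trigger."""
    known = {"n": "\n", "t": "\t"}
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in escape_chars and i + 1 < len(text):
            nxt = text[i + 1]
            # Known escapes are translated; otherwise the trigger is dropped
            # and the following character is kept literally.
            out.append(known.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


price = "\u00a5100"  # a price written with a real YEN SIGN
backslash_only = unescape(price)             # U+005C is the only trigger
yen_too = unescape(price, "\\\u00a5")        # U+00A5 is a trigger as well
```

With U+00A5 added as a trigger, the YEN SIGN in the price is silently
eaten, because the program cannot know that this text never was
Shift_JIS source code.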


> 4) When people convert files from Shift_JIS to Unicode, they need to
>    disambiguate the two uses of the character that Tomohiro mentions:
>    "When a Japanese person is a writer, it means YEN SIGN in most cases.
>     When a non-Japanese person is a writer, it always means REVERSE SOLIDUS."
>    These "most cases" need to be distinguished - in a financial text the
>    use is likely different than in a shell script. It can not be done
>    by the iconv program.

The problem is that the distinction cannot be automated and is costly.
Most Japanese people have no reason to pay such costs.
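
What a program can do is only list the places a human would have to
look at.  The following Python sketch (single_byte_5c_offsets is a
name invented here) finds every single-byte 0x5c in Shift_JIS data,
skipping the cases where 0x5c is merely the second byte of a
double-byte character.  It assumes the usual lead byte ranges
0x81-0x9f and 0xe0-0xfc, and it does not try to decide which meaning
each occurrence has.

```python
def single_byte_5c_offsets(data: bytes) -> list:
    """Return the offsets of single-byte 0x5c in Shift_JIS encoded data."""
    offsets = []
    i = 0
    while i < len(data):
        b = data[i]
        if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
            # Lead byte of a double-byte character: skip its trail byte,
            # which may itself be 0x5c (for example 0x95 0x5c).
            i += 2
            continue
        if b == 0x5C:
            offsets.append(i)
        i += 1
    return offsets


# 0x95 0x5c is one kanji; the following 0x5c is a real single-byte one.
sample = b"\x95\x5c\x5c100"
candidates = single_byte_5c_offsets(sample)
```

Each offset in candidates still needs a person to decide between
U+00A5 and U+005C, which is exactly the cost mentioned above.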


I am interested in how Europeans managed to migrate from the ISO 646
variants to ISO 8859.  The Yen Sign Problem is exactly an ISO 646
problem, because "0x5c = YEN SIGN" comes from JIS X 0201 Roman, which
is the Japanese variant of ISO 646.


---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://www.debian.or.jp/~kubota/
"Introduction to I18N"  http://www.debian.org/doc/manuals/intro-i18n/
--
Linux-UTF8:   i18n of Linux on all levels
Archive:  http://mail.nl.linux.org/linux-utf8/




Switching to UTF-8 and Gnome 1.2.x

2002-05-09 Thread Jungshik Shin


Hi,

In my transition to UTF-8, I found that Gnome 1.2.x has a lot of files
in mixed encodings. All *.desktop files and .directory files are in
mixed encodings: entries for [ja] are in EUC-JP, entries for [de] are
in ISO-8859-1/15, entries for [ru] are in KOI8-R, and so on. The
corresponding KDE files, on the other hand, are all in UTF-8, so I
didn't need to change anything there. Anyway, thanks to the Encode
module (to be included in the upcoming Perl 5.8 by default), I was
able to write a simple script that adds ko_KR.UTF-8 entries for all
[ko] entries in EUC-KR in *.desktop files and .directory files. Below
is the list of directories I had to run my script on:

/usr/share/apps
/usr/share/applets
/usr/share/applnk
/etc/X11/applnk
/usr/share/mc
$HOME/.gnome
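
For those who want to do the same for another language, the idea can
be sketched as follows. This is not my Perl script but a rough Python
rendering of the same approach; the function name
add_utf8_ko_entries and the exact key pattern are assumptions made
for this example. It works on raw bytes because the file as a whole
has no single encoding.

```python
import re

# Matches lines such as  Name[ko]=<EUC-KR bytes>
KO_ENTRY = re.compile(rb"^([A-Za-z][A-Za-z0-9-]*)\[ko\]=(.*)$")
# Matches entries that a previous run has already added.
UTF8_ENTRY = re.compile(rb"^([A-Za-z][A-Za-z0-9-]*)\[ko_KR\.UTF-8\]=")


def add_utf8_ko_entries(data: bytes) -> bytes:
    """After each Key[ko]= line in EUC-KR, add Key[ko_KR.UTF-8]= in UTF-8."""
    lines = data.split(b"\n")
    already = set()
    for line in lines:
        m = UTF8_ENTRY.match(line)
        if m:
            already.add(m.group(1))
    out = []
    for line in lines:
        out.append(line)
        m = KO_ENTRY.match(line)
        if not m or m.group(1) in already:
            continue
        key, value = m.groups()
        try:
            text = value.decode("euc_kr")
        except UnicodeDecodeError:
            continue  # not valid EUC-KR; leave the file as it is here
        out.append(key + b"[ko_KR.UTF-8]=" + text.encode("utf-8"))
    return b"\n".join(out)


hangul = "\ud55c\uae00"  # the word "Hangul" written in Hangul
src = (b"[Desktop Entry]\nName=Editor\nName[ko]="
       + hangul.encode("euc_kr") + b"\nName[de]=Editor\n")
converted = add_utf8_ko_entries(src)
```

The original [ko] line is kept, so nothing changes for people who are
still using the EUC-KR locale, and running the function a second time
does not add the entry twice.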

Still, I got gibberish in the Gnome tip of the day. It turned out that
the Gnome hint files (usually installed in /usr/share/gnome/hints) are
XML files in mixed encodings. I don't think they comply with the XML
standard, because I've never heard of XML files in mixed encodings. So
I also had to add ko_KR.UTF-8 entries for all [ko] entries there. Even
with this, for some reason unknown to me, whenever I cross the
'boundary' (i.e. go from the last tip to the first or the other way
around), I get gibberish.

Two other places where languages are tied to encodings are
Gnome help (usually in /usr/share/gnome/help) and Gimp tips
(/usr/(local/)share/gimp/$version/tips/gimp_tips.[lang].txt). I also
had to make UTF-8 versions of them.

I believe all these problems have been addressed in Gnome 2.0(RC?/beta),
but Gnome 1.x is still widely used. I thought my experience would
help others who want to move on to UTF-8, as well as distribution
builders.

  Jungshik Shin




