Re: Switching to UTF-8

Jungshik Shin Wed, 01 May 2002 22:57:54 -0700


On Thu, 2 May 2002, Tomohiro KUBOTA wrote:

> At Wed, 01 May 2002 20:02:57 +0100,
> Markus Kuhn wrote:
>
> > I have for some time now been using UTF-8 more frequently than
> > ISO 8859-1. The three critical milestones that still keep me from
> > moving entirely to UTF-8 are

> How about bash?  Do you know any improvement?

> Please note that tcsh have already supported east Asian EUC-like
> multibyte encodings.  I don't know it also supports UTF-8.

  It doesn't seem to support UTF-8 locales as of tcsh 6.10.0
(2000-11-19), and I can't find anything about UTF-8 at http://www.tcsh.org.
The newest release is 6.11.0. The same is true of zsh
(http://www.zsh.org).

>     combining characters?  bidi?  Arab shaping?  Indic scripts?
       and Hangul :-)
>     Mongol (which needs vertical direction)?  How about wcwidth()?

  Pango and ST should certainly help here.

>  * input methods
>     Any way to input complex languages which cannot be supported
>     by xkb mechanism (i.e., CJK) ?  XIM? IIIMP? (How about Gnome2?)

  You mean IIIMF, don't you? If there's an actual implementation,
I'd love to try it out. We need a Windows 2k/XP or MacOS 9/X style
keyboard/IM switching mechanism and UI so that keyboard/IM modules
targeted at or customized for each language can coexist and be brought
up as necessary. IIIMF appears to be the only way unless somebody
writes a gigantic one-size-fits-all XIM server for UTF-8 locale(s).

  How about just running your favorite XIM under ja_JP.EUC-JP while
all other applications are launched under ja_JP.UTF-8? As you know well,
it works fine, although the character repertoire you can enter
is limited to that of EUC-JP. Of course, this is not full-blown UTF-8
support, but at least it should give you the same degree of Japanese
input support under ja_JP.UTF-8 as under ja_JP.EUC-JP. Well, then
you might ask what the point of moving to UTF-8 is. You can at least
display more characters under UTF-8 than under EUC-JP, can't you? :-)
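
  To make the repertoire limit concrete, here is a rough Python sketch.
It is not part of any XIM; the helper name `enterable` is made up for
illustration. It only asks whether a string survives conversion to EUC-JP,
which approximates what an XIM server running under ja_JP.EUC-JP could
hand back to a client running under ja_JP.UTF-8.

```python
def enterable(text: str, encoding: str = "euc_jp") -> bool:
    """Return True if every character of `text` exists in `encoding`.

    Hypothetical helper: it approximates "an input method running in
    this legacy locale could produce the text". A real XIM server may
    be more restrictive than the codec.
    """
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Kanji are in the EUC-JP repertoire; Hangul syllables are not.
    for sample in ("日本語", "한글"):
        print(sample, enterable(sample))
```

  Text that fails this check can still be displayed under ja_JP.UTF-8;
it just cannot come from an input method confined to EUC-JP.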

  In the Korean case, as I wrote a couple of days ago, I had to
modify Ami (a popular Korean XIM) to make it run under ko_KR.UTF-8.
Otherwise, even though my applications run under and are fully aware
of UTF-8 (e.g. vim in a UTF-8 xterm), I couldn't enter the more than
8,000 Hangul syllables that are in UTF-8 but not in EUC-KR. Moreover,
under ko_KR.UTF-8, Xterm-16x and Vim 6.1 with a single-line patch work
almost flawlessly with U+1100 Hangul Jamos. Markus, can you update your
UTF-8 FAQ on this issue? Xterm has been supporting the Thai script, and
that brought in Middle Korean support almost automagically as
a by-product.
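
  The "over 8,000" figure can be checked with a short Python sketch.
The function names are made up for illustration, and I am assuming that
only a plain two-byte EUC-KR code counts as "in EUC-KR" (some codecs
may fall back to a longer make-up sequence for the other syllables).
Unicode has 11,172 precomposed Hangul syllables (U+AC00..U+D7A3) and
KS X 1001 defines 2,350 of them, which leaves 8,822.

```python
from typing import Tuple

FIRST, LAST = 0xAC00, 0xD7A3  # precomposed Hangul syllables in Unicode


def in_ksx1001(ch: str) -> bool:
    """True if `ch` has a plain two-byte EUC-KR (KS X 1001) code.

    Assumption: anything the codec rejects, or encodes as more than
    two bytes, is treated as outside the EUC-KR repertoire.
    """
    try:
        return len(ch.encode("euc_kr")) == 2
    except UnicodeEncodeError:
        return False


def count_syllables() -> Tuple[int, int]:
    """Return (syllables in Unicode, syllables also in KS X 1001)."""
    total = LAST - FIRST + 1
    covered = sum(in_ksx1001(chr(cp)) for cp in range(FIRST, LAST + 1))
    return total, covered


if __name__ == "__main__":
    total, covered = count_syllables()
    print("Unicode syllables:", total)
    print("also in EUC-KR:   ", covered)
    print("UTF-8 only:       ", total - covered)
```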

  BTW, Xkb may work for Korean Hangul, too, and we don't need
XIM if we use a 'three-set keyboard' instead of a 'two-set keyboard' and can
live without Hanjas. I have to learn more about Xkb to be certain, though.
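
  A rough Python sketch of why a three-set layout might get by without
an input method: every key stands for exactly one leading consonant,
vowel, or trailing consonant, so a stateless key-to-codepoint table can
emit U+1100 conjoining jamo directly. The key assignments below are
invented for illustration and are not an actual three-set layout, and
nothing here has been checked against Xkb itself. The arithmetic is the
standard Unicode Hangul composition formula.

```python
import unicodedata

# Illustrative key assignments only; NOT a real three-set layout.
KEYMAP = {
    "h": "\u1112",  # HANGUL CHOSEONG HIEUH (leading consonant)
    "a": "\u1161",  # HANGUL JUNGSEONG A (vowel)
    "n": "\u11ab",  # HANGUL JONGSEONG NIEUN (trailing consonant)
}

S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28


def keys_to_jamo(keys: str) -> str:
    """Map each key to one conjoining jamo; no state is needed."""
    return "".join(KEYMAP[k] for k in keys)


def compose_syllable(jamo: str) -> str:
    """Compose 2 or 3 modern conjoining jamo into one syllable."""
    l_index = ord(jamo[0]) - L_BASE
    v_index = ord(jamo[1]) - V_BASE
    t_index = ord(jamo[2]) - T_BASE if len(jamo) == 3 else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


if __name__ == "__main__":
    jamo = keys_to_jamo("han")
    # Both the formula and NFC normalization yield the same syllable.
    print(compose_syllable(jamo), unicodedata.normalize("NFC", jamo))
```

  With a two-set layout the same consonant key serves as both leading and
trailing consonant, so something has to track state to pick the right
jamo; that is the part a plain key table cannot do.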

>     Or, any software-specific input methods (like Emacs or Yudit)?

  Yudit supports Indic, Thai, and Arabic pretty well as far as I know.
And, judging from what Gaspar wrote to me, Middle Korean support with
U+1100 jamo is not far away. Most of what's necessary is firmly in
place because Gaspar has written very generic complex script support
routines, which hopefully can be used for Middle Korean without much
effort.

  Jungshik Shin

--
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/linux-utf8/