Hi,

At Fri, 26 Jan 2001 12:18:03 +0100,
Bram Moolenaar <[EMAIL PROTECTED]> wrote:

> > Sure.  Use setlocale(LC_CTYPE,""), nl_langinfo(CODESET), and iconv().
> > (You may want to setlocale(LC_ALL,"") and then you don't need
> > setlocale(LC_CTYPE,"") .)
> 
> I'm already doing setlocale(LC_CTYPE, "").  I don't have nl_langinfo(), I
> suspect others also will not have it (I'm using FreeBSD, don't forget that Vim
> has to be very portable). 

Hmm, that is too bad...  I hope *BSD will soon have nl_langinfo().
Since it is open source, somebody can add it.  That work is much more
important than implementing UTF-8 support in XIM software.
However, I agree that someone may want to use Vim on some unpleasant
proprietary OS.  How about implementing nl_langinfo(CODESET) yourself,
like Markus does for wcwidth()?  It would be something like the following:


#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

#define DEFAULT_ENCODING "ISO-8859-1"

char *my_nl_langinfo_charset(void)
{
  static char *encodings[] = {
    /* this may be constructed by the ./configure script. */
    "UTF-8", "ISO-8859-1", "ISO-8859-2", "EUC-JP", "EUC-KR",
    "GB2312", "GB18030", "KOI8-R", "TIS620", /* ... */ NULL
  };
  static struct {char *locale; char *encoding;} locales[] = {
    /* this can include all possible locale names as far as we know. */
    {"en_US",       "ISO-8859-1"},
    {"ja_JP.eucJP", "EUC-JP"},
    {"ja_JP.ujis",  "EUC-JP"},
    {"ja_JP.EUC",   "EUC-JP"},
    {"ja_JP.SJIS",  "Shift_JIS"},
    {"ko_KR.eucKR", "EUC-KR"},
    /* ... */
    {NULL, NULL}
  };
  char *nowlocale = setlocale(LC_CTYPE, NULL);
  int j;

  if (nowlocale == NULL)
    if ((nowlocale = getenv("LC_ALL")) == NULL)
      if ((nowlocale = getenv("LC_CTYPE")) == NULL)
        if ((nowlocale = getenv("LANG")) == NULL)
          return DEFAULT_ENCODING;

  /* If the locale name itself contains a known encoding name, use it. */
  for (j = 0; encodings[j] != NULL; j++) {
    if (strstr(nowlocale, encodings[j])) {
      return encodings[j];
    }
  }

  /* Otherwise look the locale name up in the table. */
  for (j = 0; locales[j].locale != NULL; j++) {
    if (!strcasecmp(nowlocale, locales[j].locale)) {
      return locales[j].encoding;
    }
  }
  return DEFAULT_ENCODING;
}


Use such a function if the system doesn't support nl_langinfo().
(I wrote this function directly in this mail; I haven't tested
it at all.)
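
For example, such a codeset name can be fed straight into iconv_open()
to convert XIM input into UTF-8, along the lines of the following
(another untested sketch; to_utf8() and its buffer handling are only
illustrative):

#include <stdlib.h>
#include <string.h>
#include <iconv.h>

/* Convert a string in the locale encoding (e.g. the result of
 * XmbLookupString()) to UTF-8.  Returns a malloc()ed string, or
 * NULL on failure.  Untested sketch. */
char *to_utf8(const char *input)
{
  iconv_t cd;
  size_t inleft = strlen(input);
  size_t outsize = inleft * 4 + 1;   /* rough worst case for UTF-8 */
  size_t outleft = outsize;
  char *out = malloc(outsize);
  char *inptr = (char *)input;
  char *outptr = out;

  if (out == NULL)
    return NULL;
  cd = iconv_open("UTF-8", my_nl_langinfo_charset());
  if (cd == (iconv_t)-1) {
    free(out);
    return NULL;
  }
  if (iconv(cd, &inptr, &inleft, &outptr, &outleft) == (size_t)-1) {
    iconv_close(cd);
    free(out);
    return NULL;
  }
  *outptr = '\0';
  iconv_close(cd);
  return out;
}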


> You mean that they are using software that's not maintainable?  Then you have
> a problem anyway.

Some people dare to do that.  However, well, I agree with you.


> My point was that it should be possible to make a UTF-8 XIM which uses
> exactly the same grammar analyzer as the euc-jp one.  If you have the source
> code of the XIM that might not be so difficult, since you can leave the
> grammar analyzer unmodified.  If you don't have the source code, it's about
> time someone makes an open-source XIM!

There are a few open-source XIM servers.  However, as I said, a
Japanese input engine is very complex software, and shrink-wrapped
Japanese input engines are sold commercially.  Even many Linux users
buy them.

Yes, I think we can modify open-source XIM software to support UTF-8
in UTF-8 locales.  However, that is something for the XIM developers
to think about, not for application developers.  What application
developers should think about is making their software follow the
LC_CTYPE locale (both UTF-8 and non-UTF-8), so that users can choose
their preferred encoding.
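
On the application side that does not require much code; something
like this untested sketch (the function name and error handling are
mine, only for illustration):

#include <locale.h>
#include <X11/Xlib.h>

/* Hypothetical sketch: let the user's LC_CTYPE locale drive both the
 * application and the XIM it connects to. */
int setup_locale_and_im(Display *dpy, XIM *im)
{
  if (setlocale(LC_CTYPE, "") == NULL)   /* honour $LC_ALL/$LC_CTYPE/$LANG */
    return 0;
  if (!XSupportsLocale())                /* Xlib must support this locale */
    return 0;
  XSetLocaleModifiers("");               /* pick up @im=... from $XMODIFIERS */
  *im = XOpenIM(dpy, NULL, NULL, NULL);  /* the XIM then uses the same locale */
  return *im != NULL;
}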


> > I recommend to use setlocale() and nl_langinfo(), as I wrote above.
> > If your software directly connects with XIM, XmbLookupString() will
> > give you input string in locale encoding (i.e., encoding specified
> > by LC_CTYPE locale by $LC_ALL, $LC_CTYPE, and $LANG variables; the
> > encoding can be obtained by softwares by using nl_langinfo(CODESET)).
> 
> That's a problem.  It's very well possible that $LANG has been set to use
> UTF-8.  Then what will the XIM do?  Does it still produce the same euc-jp
> encoding, will it produce UTF-8 or will it fail?  All items you mention are
> global to the process.  I need to find out what the XIM is using, not what my
> process is using.

I think the user knows that a UTF-8 locale is not available yet.

An XIM _should_ use the locale encoding (which of course means UTF-8
in a UTF-8 locale).  If a particular XIM does not obey the locale
encoding, that is a bug in the XIM, and Vim should not have to work
around it.  Thus, your need to find out what the XIM is using can be
translated into finding out what the locale encoding is.  The most
standard way is to use nl_langinfo(CODESET).  For an OS which doesn't
support it, the nl_langinfo() emulator I wrote above could be used.
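
In Vim this could be hidden behind one small wrapper, something like
the following (HAVE_NL_LANGINFO is a hypothetical symbol that
./configure would define; the wrapper name is mine):

#ifdef HAVE_NL_LANGINFO
# include <langinfo.h>
#endif

/* Return the codeset of the current LC_CTYPE locale, which is also the
 * encoding that XmbLookupString() hands back.  Untested sketch. */
char *locale_codeset(void)
{
#ifdef HAVE_NL_LANGINFO
  return nl_langinfo(CODESET);
#else
  return my_nl_langinfo_charset();   /* emulator from earlier in this mail */
#endif
}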


> Also note the difference between what a XIM and associated functions might be
> able to do in the future, and what is possible now.  Preferably Vim will work
> well on a system that doesn't have the latest version of X windows and
> libraries.

XIM is available in X11R6 and later.  Please don't worry about earlier
versions of the X Window System.

If you really need to run Vim on a very old system, you should use the
configure script and #ifdef to disable the I18N features.
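
For instance (just a sketch; the macro and function names are made up,
not Vim's real ones):

/* HAVE_X11R6_XIM would be defined by ./configure when X11R6 XIM is found. */
#ifdef HAVE_X11R6_XIM
  xim_init();               /* full XIM-based input, needs X11R6 or later */
#else
  plain_input_init();       /* very old X: plain XLookupString(), no XIM */
#endif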

However, I want to say the following:

> You mean that they are using software that's not maintainable?  Then you have
> a problem anyway.

This is just what you said about using old XIM software. :-p


---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8:   i18n of Linux on all levels
Archive:      http://mail.nl.linux.org/lists/
