Hi,
At Fri, 26 Jan 2001 12:18:03 +0100,
Bram Moolenaar <[EMAIL PROTECTED]> wrote:
> > Sure. Use setlocale(LC_CTYPE,""), nl_langinfo(CODESET), and iconv().
> > (You may want to setlocale(LC_ALL,"") and then you don't need
> > setlocale(LC_CTYPE,"") .)
>
> I'm already doing setlocale(LC_CTYPE, ""). I don't have nl_langinfo(), I
> suspect others also will not have it (I'm using FreeBSD, don't forget that Vim
> has to be very portable).
Hmm, that is too bad... I hope *BSD will have nl_langinfo() soon.
Since it is open source, you could add it yourself. That work is much more
important than implementing UTF-8 support in XIM software.
However, I agree that someone may want to use Vim under some sucking
proprietary OS. How about implementing a replacement for
nl_langinfo(CODESET), like Markus does for wcwidth()? It would look like
the following:
#include <locale.h>   /* setlocale() */
#include <stdlib.h>   /* getenv() */
#include <string.h>   /* strstr() */
#include <strings.h>  /* strcasecmp() */

#define DEFAULT_ENCODING "ISO-8859-1"

const char *my_nl_langinfo_charset(void)
{
    static const char *encodings[] = {
        /* this may be constructed by ./configure script. */
        "UTF-8", "ISO-8859-1", "ISO-8859-2", "EUC-JP", "EUC-KR",
        "GB2312", "GB18030", "KOI8-R", "TIS620", /* ..., */ NULL
    };
    static const struct {const char *locale; const char *encoding;} locales[] = {
        /* this can include all possible locale names as far as we know. */
        {"en_US", "ISO-8859-1"},
        {"ja_JP.eucJP", "EUC-JP"},
        {"ja_JP.ujis", "EUC-JP"},
        {"ja_JP.EUC", "EUC-JP"},
        {"ja_JP.SJIS", "Shift_JIS"},
        {"ko_KR.eucKR", "EUC-KR"},
        /* ... */
        {NULL, NULL}
    };
    const char *nowlocale = setlocale(LC_CTYPE, NULL);
    int j;

    /* if the locale cannot be queried, fall back to the environment */
    if (nowlocale == NULL)
        if ((nowlocale = getenv("LC_ALL")) == NULL)
            if ((nowlocale = getenv("LC_CTYPE")) == NULL)
                if ((nowlocale = getenv("LANG")) == NULL)
                    return DEFAULT_ENCODING;

    /* first, look for an encoding name inside the locale name */
    for (j = 0; encodings[j] != NULL; j++) {
        if (strstr(nowlocale, encodings[j])) {
            return encodings[j];
        }
    }
    /* otherwise, look up the whole locale name in the table */
    for (j = 0; locales[j].locale != NULL; j++) {
        if (!strcasecmp(nowlocale, locales[j].locale)) {
            return locales[j].encoding;
        }
    }
    return DEFAULT_ENCODING;
}
Use such a function if the system doesn't support nl_langinfo().
(I wrote this function directly in this mail just now. I haven't
tested it at all.)
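The choice between the real nl_langinfo() and such an emulation could be
hidden behind one wrapper. The following is only a sketch: the macro
HAVE_NL_LANGINFO_CODESET, the function locale_charset_name() and the
stand-in fallback_charset() are names made up for illustration, not
anything Vim or ./configure is known to provide.

```c
#include <locale.h>
#include <string.h>
#ifdef HAVE_NL_LANGINFO_CODESET   /* hypothetical macro set by ./configure */
# include <langinfo.h>
#endif

/* Stand-in for the emulation function from this mail. Here it only
 * returns the default so that the sketch compiles on its own. */
static const char *fallback_charset(void)
{
    return "ISO-8859-1";
}

/* Return the name of the locale encoding; never NULL or empty. */
const char *locale_charset_name(void)
{
#ifdef HAVE_NL_LANGINFO_CODESET
    const char *cs = nl_langinfo(CODESET);

    if (cs != NULL && *cs != '\0')
        return cs;
#endif
    return fallback_charset();
}
```

The rest of the program would then call only locale_charset_name(),
after setlocale(LC_CTYPE, ""), and never need its own #ifdef.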
> You mean that they are using software that's not maintainable? Then you have
> a problem anyway.
Some people dare to do that. However, well, I agree with you.
> My point was that it should be possible to make an UTF-8 XIM which uses
> exactly the same grammar-analyzer as the euc-jp one. If you have the source
> code of the XIM that might not be so difficult, since you can leave the
> grammar-analyzer unmodified. If you don't have the source code, it's about
> time someone makes an open-source XIM!
There are a few open-source XIM servers. However, as I said, a Japanese
input engine is a very complex piece of software, and shrink-wrapped
Japanese input engines are sold. Even many Linux users buy them.
Yes, I think open-source XIM software can be modified to support UTF-8
in a UTF-8 locale. However, that is the XIM developers' concern, not the
application developers'. What application developers should care about
is making their software follow the LC_CTYPE locale (both UTF-8 and
non-UTF-8), so that users can choose the encoding they prefer.
> > I recommend to use setlocale() and nl_langinfo(), as I wrote above.
> > If your software directly connects with XIM, XmbLookupString() will
> > give you input string in locale encoding (i.e., encoding specified
> > by LC_CTYPE locale by $LC_ALL, $LC_CTYPE, and $LANG variables; the
> > encoding can be obtained by softwares by using nl_langinfo(CODESET)).
>
> That's a problem. It's very well possible that $LANG has been set to use
> UTF-8. Then what will the XIM do? Does it still produce the same euc-jp
> encoding, will it produce UTF-8 or will it fail? All items you mention are
> global to the process. I need to find out what the XIM is using, not what my
> process is using.
I think the user knows that a UTF-8 locale is not available yet.
XIM _should_ use the locale encoding (which of course means UTF-8 in a
UTF-8 locale). (If a particular XIM does not obey the locale encoding,
that is a bug in the XIM. Vim should not deal with such a bug.) Thus,
your need to find out what the XIM is using translates into finding
out what the locale encoding is. The most standard way is to use
nl_langinfo(CODESET). However, on OSes which don't support it, the
nl_langinfo emulator which I wrote above can be used.
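Once the locale encoding is known, a string received from the XIM can
be converted with iconv(). The following is a minimal sketch, assuming
the program wants UTF-8 internally and the system has both iconv() and
nl_langinfo(); the names to_utf8() and locale_to_utf8() are made up for
illustration.

```c
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Convert the NUL-terminated string `in` from encoding `from` to UTF-8.
 * Returns the number of bytes written to `out` (not counting the NUL),
 * or (size_t)-1 on failure, e.g. when `out` is too small. */
size_t to_utf8(const char *from, const char *in, char *out, size_t outsize)
{
    iconv_t cd;
    char *inp = (char *)in;   /* iconv() takes char **, but only reads */
    char *outp = out;
    size_t inleft = strlen(in);
    size_t outleft;
    size_t rc;

    if (outsize == 0)
        return (size_t)-1;
    outleft = outsize - 1;    /* keep room for the terminating NUL */

    cd = iconv_open("UTF-8", from);
    if (cd == (iconv_t)-1)
        return (size_t)-1;
    rc = iconv(cd, &inp, &inleft, &outp, &outleft);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    *outp = '\0';
    return (size_t)(outp - out);
}

/* Convert from the locale encoding; call setlocale(LC_CTYPE, "") first. */
size_t locale_to_utf8(const char *in, char *out, size_t outsize)
{
    return to_utf8(nl_langinfo(CODESET), in, out, outsize);
}
```

For example, to_utf8("ISO-8859-1", "\xe9", buf, sizeof buf) should store
the two bytes 0xC3 0xA9 in buf. On a system without nl_langinfo(), the
emulator would take the place of nl_langinfo(CODESET) in
locale_to_utf8().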
> Also note the difference between what a XIM and associated functions might be
> able to do in the future, and what is possible now. Preferably Vim will work
> well on a system that doesn't have the latest version of X windows and
> libraries.
XIM is available in X11R6 and later. Please don't worry about earlier
versions of the X Window System.
If you really need to run Vim on a very old system, you should use the
configure script and #ifdef to disable the I18N functions.
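A rough sketch of what I mean, assuming the configure script defines a
macro when I18N support is wanted. The macro name ENABLE_I18N and the
function init_i18n() are made up for this example; they are not existing
Vim names.

```c
#include <locale.h>

/* Returns 1 if the locale was set from the environment, 0 otherwise. */
int init_i18n(void)
{
#ifdef ENABLE_I18N   /* hypothetical macro defined by ./configure */
    return setlocale(LC_CTYPE, "") != NULL;
#else
    return 0;        /* I18N compiled out: stay in the "C" locale */
#endif
}
```

The XIM code would be wrapped in the same #ifdef, so a build for an old
system simply contains none of it.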
However, I want to say the following:
> You mean that they are using software that's not maintainable? Then you have
> a problem anyway.
This is just what you said about using old XIM software. :-p
---
Tomohiro KUBOTA <[EMAIL PROTECTED]>
http://surfchem0.riken.go.jp/~kubota/
"Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/