Marcin 'Qrczak' Kowalczyk writes:
> > For the conversions you can use iconv() and a normalizing wrapper
> > around nl_langinfo(CODESET).
>
> I don't like the idea of finding and interpreting locale.aliases by
> applications themselves...
It's not the locale aliases, it's the charset aliases. That makes a
difference, because there are far fewer of them, and because many
OSes actually do support the standardized MIME names, with few
exceptions.
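Such a normalizing wrapper can be quite small. Something like this
(an untested sketch; the alias entries below are only examples, a
real table needs a handful of entries per OS):

    #include <langinfo.h>
    #include <locale.h>
    #include <string.h>

    /* Map a few OS-specific charset names to their MIME equivalents.
       These entries are illustrative; most systems already return the
       standard names, so the table stays short. */
    static const char * const aliases[][2] = {
      { "646",            "ASCII"      },  /* e.g. Solaris "C" locale */
      { "ANSI_X3.4-1968", "ASCII"      },  /* e.g. glibc "C" locale */
      { "ISO8859-1",      "ISO-8859-1" },
      { "eucJP",          "EUC-JP"     },
    };

    /* Call setlocale (LC_CTYPE, "") once at program start so that
       nl_langinfo reflects the user's locale. */
    const char *
    locale_charset (void)
    {
      const char *codeset = nl_langinfo (CODESET);
      size_t i;

      for (i = 0; i < sizeof aliases / sizeof aliases[0]; i++)
        if (strcmp (codeset, aliases[i][0]) == 0)
          return aliases[i][1];
      return codeset;
    }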
> > In glibc 2.1.93 it does: use iconv with "wchar_t" argument. It also
> > knows about "UCS-4" and "UCS-4LE" encodings.
>
> Good, so there is a chance that future iconv will be more usable?
It will. glibc now has a testsuite for iconv.
> (it's a development version of glibc, isn't it? so it's still future
> for me).
You can install it anyway, either by building it yourself (instructions
at http://clisp.cons.org/~haible/glibc22-HOWTO.html) or by getting the
beta of the next RedHat distribution.
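Once you have a glibc 2.2 style iconv, the wchar_t conversion looks
roughly like this (an untested sketch; the "wchar_t" encoding name is
the glibc-specific feature mentioned above, not a portable one):

    #include <iconv.h>
    #include <stddef.h>
    #include <wchar.h>

    /* Convert a wide string to UTF-8 through the "wchar_t"
       pseudo-encoding.  Returns the number of bytes written,
       or (size_t)(-1) on error. */
    size_t
    wcs_to_utf8 (const wchar_t *src, char *dst, size_t dstlen)
    {
      iconv_t cd = iconv_open ("UTF-8", "wchar_t");
      if (cd == (iconv_t)(-1))
        return (size_t)(-1);

      char *inptr = (char *) src;
      size_t inleft = (wcslen (src) + 1) * sizeof (wchar_t);
      char *outptr = dst;
      size_t outleft = dstlen;

      size_t r = iconv (cd, &inptr, &inleft, &outptr, &outleft);
      iconv_close (cd);
      return (r == (size_t)(-1)) ? (size_t)(-1) : dstlen - outleft;
    }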
> > Which limitations does the portable iconv substitute (libiconv) have?
>
> That it must be carried along with a package - it's not a tiny wrapper
> around what the OS+std.libc provide but the whole implementation
> from scratch.
You can distribute it as a separate file and let people on systems
with an insufficient iconv() install it before installing your package.
> And that there is no nice way to determine either the name of the
> default local encoding or a known encoding of Unicode (for iconv in
> general). It all looks like kludges and guessing...
Nice or not - nl_langinfo(CODESET) plus a bit of postprocessing works
on most modern systems. And the encoding name "UTF-8" is known
everywhere.
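In code that boils down to something like this (a minimal sketch,
using the locale_charset() wrapper sketched above; error handling
mostly omitted):

    #include <iconv.h>
    #include <locale.h>
    #include <stdio.h>

    extern const char *locale_charset (void);  /* the wrapper above */

    int
    main (void)
    {
      setlocale (LC_CTYPE, "");

      /* "UTF-8" as the target name is understood by every iconv I
         know of; the source name is the normalized result of
         nl_langinfo (CODESET). */
      iconv_t cd = iconv_open ("UTF-8", locale_charset ());
      if (cd == (iconv_t)(-1))
        {
          perror ("iconv_open");
          return 1;
        }
      /* ... convert with iconv (cd, ...) ... */
      iconv_close (cd);
      return 0;
    }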
> How to determine the quality of a locally installed iconv? For example
> I don't consider that in glibc-2.1.3 usable - recently I've seen
> "../iconv/skeleton.c:324: __gconv_transform_utf8_internal: Assertion
> `nstatus == GCONV_FULL_OUTPUT' failed.", there are several other
> errors, checking for illegal UTF-8 is poor.
These bugs are fixed in current glibc.
Solaris iconv is also quite usable, but its checking for invalid input
is poor.
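If you want to probe the installed iconv, e.g. from a configure test,
one crude smoke test is to feed it an invalid UTF-8 sequence and check
for EILSEQ (a sketch only, far from a complete validation; it assumes
the "UCS-4" name mentioned above is available):

    #include <errno.h>
    #include <iconv.h>
    #include <stddef.h>

    /* Returns 1 if iconv rejects an overlong UTF-8 sequence with
       EILSEQ, 0 if it silently accepts it, -1 if the conversion
       cannot even be opened. */
    int
    check_utf8_validation (void)
    {
      iconv_t cd = iconv_open ("UCS-4", "UTF-8");
      if (cd == (iconv_t)(-1))
        return -1;

      char inbuf[] = { (char) 0xC0, (char) 0x80 };  /* overlong NUL */
      char outbuf[16];
      char *inptr = inbuf;
      size_t inleft = sizeof inbuf;
      char *outptr = outbuf;
      size_t outleft = sizeof outbuf;

      size_t r = iconv (cd, &inptr, &inleft, &outptr, &outleft);
      iconv_close (cd);

      return (r == (size_t)(-1) && errno == EILSEQ) ? 1 : 0;
    }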
> How can an iconv implementation be portable if it has to know all
> charsets that are used on all OSes? What worries me is that it must
> do everything itself. If an OS provides an unusual charset, libiconv
> will not see it.
Then someone will hopefully report it to me, and I will add that
unusual charset. Btw, can someone provide conversion tables for
HP-UX's "ccdc" or Solaris' "sun_eu_greek" encodings?
> How do Java implementations find this locale dependent default value?
> Do they use e.g. iconv for the actual conversion? Or determine only the
> name of the encoding somehow and implement the conversion themselves?
The Sun JDK has a documented set of encoding names
(http://www.javasoft.com:80/products/jdk/1.1/docs/guide/intl/encoding.doc.html)
and implements the conversion in Java.
> What about Perl and Python?
Python implements the conversions in Python. I don't know about Perl.
Bruno
-
Linux-UTF8: i18n of Linux on all levels
Archive: http://mail.nl.linux.org/lists/