Haxe wrote:
> On Monday 03 October 2005 23:33, Christian Biere wrote:
> > > iconv_open() understands "WCHAR_T" as encoding name.
> > That's specific to GNU iconv
> Has anyone tried that on other systems?
It doesn't work here. Actually, it isn't even documented on
not-so-recent Linux systems, although "iconv --list" shows WCHAR_T.
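Since the name is neither standardized nor reliably documented, the
only dependable check is to try it at run time. A minimal sketch,
assuming a POSIX <iconv.h>; the helper name iconv_supports() is made
up for illustration:

```c
#include <iconv.h>

/* Returns 1 if iconv_open() accepts this conversion pair, 0 otherwise.
 * Note the argument order of iconv_open(): target first, source second. */
int iconv_supports(const char *tocode, const char *fromcode)
{
    iconv_t cd = iconv_open(tocode, fromcode);

    if (cd == (iconv_t)-1)
        return 0;           /* typically EINVAL: pair not supported */
    iconv_close(cd);
    return 1;
}
```

Calling iconv_supports("UTF-8", "WCHAR_T") then tells you whether the
local iconv knows the name at all; it says nothing about which
encoding wchar_t actually uses.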
> To my taste, the whole iconv
> extension to the C standard can only be called well-integrated if it
> supports the already-used abstract wchar_t.
I don't know whether wchar_t and the wide-char functions are used by a
lot of software; so far I haven't come across them. iconv() is
certainly under-specified in POSIX for practical use. At the very
least, a minimum set of standard character encoding names should be
defined.
> If it doesn't, it's
> isolated crap. I think every system designer should see that and would
> thus implement supporting wchar_t directly in iconv.
The idea itself, i.e. a portable way to convert from the current
locale's encoding to Unicode, is certainly good. I'm not so sure about
this iconv() extension, though. iconv() is specified as a "char *" to
"char *" conversion routine, so converting to "wchar_t *" is asking
for alignment trouble. Furthermore, wchar_t simply is not an encoding.
Thus, if you want conversion to UCS-4, you should simply use that as
the encoding name. It doesn't clash with any standard, i.e. it
doesn't require wchar_t to have any special characteristics. So unless
a newer revision of POSIX or ISO C changes and simplifies the
wide-char features, I doubt that all vendors and developers will
happily jump on the iconv() "wchar_t" train.
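To illustrate naming the Unicode encoding explicitly, here is a rough
sketch. It assumes the local iconv knows the name "UTF-32BE" (encoding
names are not standardized, so this may differ elsewhere), and
to_code_points() is a hypothetical helper. The output goes through a
byte buffer and is assembled with shifts, which avoids both the
alignment and the byte-order question:

```c
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Convert the NUL-terminated string `in`, encoded as `fromcode`, into
 * at most `max` code points. Returns the number of code points stored,
 * or (size_t)-1 on failure. */
size_t to_code_points(const char *fromcode, const char *in,
                      uint32_t *out, size_t max)
{
    iconv_t cd = iconv_open("UTF-32BE", fromcode);  /* assumed name */
    char *src = (char *)in;     /* iconv() takes a non-const char ** */
    size_t src_left = strlen(in);
    size_t count = 0;
    int failed = 0;

    if (cd == (iconv_t)-1)
        return (size_t)-1;

    while (src_left > 0 && count < max) {
        unsigned char buf[4];   /* room for exactly one code point */
        char *dst = (char *)buf;
        size_t dst_left = sizeof buf;
        size_t r = iconv(cd, &src, &src_left, &dst, &dst_left);

        if (dst_left == 0) {
            /* One full big-endian code point was written. */
            out[count++] = ((uint32_t)buf[0] << 24) |
                           ((uint32_t)buf[1] << 16) |
                           ((uint32_t)buf[2] << 8)  |
                            (uint32_t)buf[3];
        } else if (r == (size_t)-1) {
            failed = 1;         /* invalid or truncated input */
            break;
        }
    }
    iconv_close(cd);
    return failed ? (size_t)-1 : count;
}
```

Converting one character per call is slow but keeps the sketch short;
real code would use a larger byte buffer and decode it in one pass.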
Actually, the standard interface for converting from the current
locale to wchar_t should be mbrtowc(). The inverse is handled by
wctomb(). From what I've read, these seem to be horribly broken on
many operating systems, at least in older versions. However, after
this step you still need a guarantee or a tool to convert the wchar_t
string to UCS-4 or UTF-8. So
iconv_open("char", "wchar_t")
is a red herring in my opinion. What we really want and need is
iconv_open("locale", SOME_UNICODE_ENCODING)
and further a way to detect or select the encoding used for wchar_t
as long as POSIX and C don't guarantee it's UCS-4.
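For what it's worth, the pieces that exist today can be sketched as
follows. C99 defines the macro __STDC_ISO_10646__ when wchar_t values
are ISO 10646 code points, and POSIX nl_langinfo(CODESET) names the
locale's encoding, although nothing guarantees that iconv_open()
accepts that name. The three helper names below are made up for
illustration:

```c
#include <langinfo.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* 1 if the implementation promises that wchar_t holds ISO 10646
 * (Unicode) code points, 0 if it makes no such promise. */
int wchar_is_iso10646(void)
{
#ifdef __STDC_ISO_10646__
    return 1;
#else
    return 0;
#endif
}

/* Name of the current locale's character encoding. Whether
 * iconv_open() understands this name is system-dependent. */
const char *locale_codeset(void)
{
    return nl_langinfo(CODESET);
}

/* Convert a NUL-terminated multibyte string in the current locale's
 * encoding to at most `max` wide characters using mbrtowc().
 * Returns the number stored, or (size_t)-1 on invalid input. */
size_t locale_to_wide(const char *in, wchar_t *out, size_t max)
{
    mbstate_t st;
    size_t left = strlen(in);
    size_t count = 0;

    memset(&st, 0, sizeof st);  /* initial conversion state */
    while (left > 0 && count < max) {
        wchar_t wc;
        size_t n = mbrtowc(&wc, in, left, &st);

        if (n == (size_t)-1 || n == (size_t)-2)
            return (size_t)-1;  /* invalid or incomplete sequence */
        if (n == 0)
            break;              /* decoded a NUL; stop */
        out[count++] = wc;
        in += n;
        left -= n;
    }
    return count;
}
```

A program would call setlocale(LC_ALL, "") first so that the user's
locale is in effect. Only when wchar_is_iso10646() returns 1 can the
result of locale_to_wide() be treated as UCS-4 without a further
conversion step.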
--
Christian
