On 14/12/2006, at 1:15 AM, Andre Pang wrote:
> I have a C string (char*) that's encoded in UTF-8. I'd like to
> convert this to a wide string (wchar_t*). I've done plenty of
> reading about mbstowcs(3), iconv(3) and friends, and from what I
> understand, I have two options:
> ...
> So far, I've tried (2) -- the iconv() method -- and it doesn't work
> for me. It seems to work fine if the characters are ASCII, but the
> moment it actually hits any non-ASCII characters, iconv() throws a
> return code of -1 and errno's set to EILSEQ. I'm assuming there
> are some bugs in my code, which is no surprise considering how
> annoying iconv() is to use.

Well, this was slightly bizarre. I changed the destination encoding
from:
    const iconv_t utf8ToWCharTIconvDescriptor = iconv_open("WCHAR_T", "UTF-8");
to
    const iconv_t utf8ToWCharTIconvDescriptor = iconv_open("UCS-4-INTERNAL", "UTF-8");
And then iconv() did its job merrily. (I had even put in a
setlocale(LC_CTYPE, "") call before trying to use "WCHAR_T".)
I realise this means that I'm relying on wchar_t being UCS-4, but
I've got #ifdefs around it so that the code is only built on
architectures I've explicitly defined.
(Note that the destination is "UCS-4-INTERNAL" rather than simply
"UCS-4", since "UCS-4" seems to be synonymous with "UCS-4BE", which
gives the wrong byte order for wchar_t on little-endian platforms.)
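
For the archives, here is a minimal sketch of the whole conversion as
described above. It is not the exact code from my project: the function
names utf8_to_wchar and open_utf8_to_ucs4 are made up for illustration,
the static assertion stands in for the #ifdefs, and the fallback to
"UCS-4LE"/"UCS-4BE" is my own assumption for iconv implementations that
do not know the "UCS-4-INTERNAL" name.

```c
#include <errno.h>
#include <iconv.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Assumption: wchar_t is a 32-bit type holding UCS-4 code points. */
_Static_assert(sizeof(wchar_t) == 4, "this sketch assumes a 32-bit wchar_t");

/* Open a UTF-8 -> native-endian UCS-4 converter.
 * "UCS-4-INTERNAL" is tried first; if the local iconv does not know that
 * name, fall back to an explicit-endian name chosen at run time
 * (the fallback is an assumption, not part of the original fix). */
static iconv_t open_utf8_to_ucs4(void)
{
    iconv_t cd = iconv_open("UCS-4-INTERNAL", "UTF-8");
    if (cd == (iconv_t)-1) {
        const uint16_t probe = 1;
        const int little = *(const unsigned char *)&probe == 1;
        cd = iconv_open(little ? "UCS-4LE" : "UCS-4BE", "UTF-8");
    }
    return cd;
}

/* Convert a NUL-terminated UTF-8 string to a newly allocated,
 * NUL-terminated wide string. Returns NULL on failure with errno set
 * (EILSEQ for invalid input, EINVAL for a truncated sequence).
 * The caller frees the result with free(). */
wchar_t *utf8_to_wchar(const char *utf8)
{
    iconv_t cd = open_utf8_to_ucs4();
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(utf8);
    /* Each UTF-8 byte yields at most one code point, so in_left wide
     * characters plus a terminator is always enough room. */
    wchar_t *result = calloc(in_left + 1, sizeof *result);
    if (result == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *)utf8;          /* iconv() takes a non-const pointer */
    char *out = (char *)result;
    size_t out_left = in_left * sizeof *result;

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    int saved_errno = errno;
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(result);
        errno = saved_errno;
        return NULL;
    }
    /* calloc() zeroed the buffer, so the string is already terminated. */
    return result;
}
```

Calling utf8_to_wchar("h\xc3\xa9") should give back a three-element
buffer holding L'h', 0xE9 and the terminating zero, and malformed input
such as "\xff" should return NULL.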
--
% Andre Pang : trust.in.love.to.save <http://www.algorithm.com.au/>
_______________________________________________
coders mailing list
http://lists.slug.org.au/listinfo/coders