[coders] [solved] Re: Converting a UTF-8 string to a wchar_t (in C)

Andre Pang Wed, 13 Dec 2006 21:52:11 -0800

On 14/12/2006, at 1:15 AM, Andre Pang wrote:

I have a C string (char*) that's encoded in UTF-8. I'd like to convert this to a wide string (wchar_t*). I've done plenty of reading about mbstowcs(3), iconv(3) and friends, and from what I understand, I have two options:
...
So far, I've tried (2) -- the iconv() method -- and it doesn't work for me. It seems to work fine if the characters are ASCII, but the moment it actually hits any non-ASCII characters, iconv() throws a return code of -1 and errno's set to EILSEQ. I'm assuming there are some bugs in my code, which is no surprise considering how annoying iconv() is to use.

Well, this was slightly bizarre. I changed the destination encoding from:

const iconv_t utf8ToWCharTIconvDescriptor = iconv_open("WCHAR_T", "UTF-8");

to

const iconv_t utf8ToWCharTIconvDescriptor = iconv_open("UCS-4-INTERNAL", "UTF-8");

And then iconv() did its job merrily. (I even put in a setlocale(LC_CTYPE, "") before trying to use "WCHAR_T".)
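
For anyone wanting to try this, here is a minimal sketch of the whole conversion, assuming a POSIX iconv(3). The helper name utf8_to_wcs and its to_encoding parameter are made up for illustration and are not from my actual code; pass whichever destination name your iconv accepts ("UCS-4-INTERNAL" is a GNU libiconv name and may not exist elsewhere). On some systems you may also need to link with -liconv.

```c
#include <iconv.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts a NUL-terminated UTF-8 string into a newly
 * allocated, NUL-terminated wchar_t string using the given destination
 * encoding name. Returns NULL on any failure; the caller frees the result. */
wchar_t *utf8_to_wcs(const char *utf8, const char *to_encoding)
{
    iconv_t cd = iconv_open(to_encoding, "UTF-8");
    if (cd == (iconv_t)-1)
        return NULL;                 /* encoding name not supported */

    size_t in_left = strlen(utf8);
    /* Each UTF-8 byte produces at most one wchar_t; one extra for the NUL. */
    size_t out_count = in_left + 1;
    wchar_t *out = calloc(out_count, sizeof(wchar_t));
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in_ptr = (char *)utf8;     /* iconv() wants char **, not const */
    char *out_ptr = (char *)out;
    size_t out_left = (out_count - 1) * sizeof(wchar_t);

    size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);

    if (rc == (size_t)-1) {          /* e.g. errno == EILSEQ */
        free(out);
        return NULL;
    }
    return out;                      /* calloc already zeroed the terminator */
}
```

Calling utf8_to_wcs(str, "UCS-4-INTERNAL") corresponds to the descriptor that worked for me; utf8_to_wcs(str, "WCHAR_T") corresponds to the one that didn't.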

I realise this means that I'm relying on wchar_t being UCS-4, but I've got #ifdefs around it so that it will only work for defined architectures.
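
As an illustration only (this is not the guard from my code, which checks for specific architectures), one way to express "wchar_t is UCS-4" at compile time is to test the C99 macro __STDC_ISO_10646__ together with WCHAR_MAX. The names WCHAR_T_IS_UCS4 and wchar_t_is_ucs4 are hypothetical. Note that some platforms with a 32-bit Unicode wchar_t don't define __STDC_ISO_10646__, so this check is conservative.

```c
#include <stdint.h>
#include <wchar.h>

/* Hypothetical guard: 1 only if the implementation promises that wchar_t
 * values are ISO 10646 code points and wchar_t can hold all of them. */
#if defined(__STDC_ISO_10646__) && WCHAR_MAX >= 0x10FFFF
#  define WCHAR_T_IS_UCS4 1
#else
#  define WCHAR_T_IS_UCS4 0
#endif

/* Run-time view of the same compile-time decision. */
int wchar_t_is_ucs4(void)
{
    return WCHAR_T_IS_UCS4;
}
```

Code that depends on the UCS-4 assumption can then be wrapped in #if WCHAR_T_IS_UCS4 ... #endif.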

(Note that the destination's "UCS-4-INTERNAL" rather than simply "UCS-4", since "UCS-4" seems to be synonymous with "UCS-4BE"; this is obviously incorrect on little-endian platforms.)
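
If your iconv doesn't know "UCS-4-INTERNAL", a rough alternative is to pick the explicit byte order yourself. The sketch below assumes the names "UCS-4LE" and "UCS-4BE" are accepted by your iconv; the function name ucs4_native_name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: returns the explicit-endian UCS-4 encoding name
 * that matches the byte order of the machine this code runs on. */
const char *ucs4_native_name(void)
{
    const uint32_t probe = 1;
    /* On a little-endian host the lowest-addressed byte holds the 1. */
    return (*(const unsigned char *)&probe == 1) ? "UCS-4LE" : "UCS-4BE";
}
```

The returned string can be passed to iconv_open() as the destination encoding in place of "UCS-4-INTERNAL".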



--
% Andre Pang : trust.in.love.to.save  <http://www.algorithm.com.au/>



