[coders] Converting a UTF-8 string to a wchar_t (in C)

Andre Pang Wed, 13 Dec 2006 06:15:28 -0800

Herro all,

I have a C string (char*) that's encoded in UTF-8. I'd like to convert this to a wide string (wchar_t*). I've done plenty of reading about mbstowcs(3), iconv(3) and friends, and from what I understand, I have two options:

1. First, setlocale() to some bogus UTF-8 locale (such as "en_US.UTF-8"), and then use mbstowcs() to perform the conversion.

2. Use the stupendously painful iconv() interface with an iconv_t from "UTF-8" to "WCHAR_T".
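A minimal sketch of option 1, assuming a POSIX system where at least one of the locale names "C.UTF-8" or "en_US.UTF-8" is installed (both names are assumptions; which ones exist varies between systems). The helper name utf8_to_wcs_mbstowcs is hypothetical. Because setlocale() changes process-wide state, the sketch saves and restores the previous LC_CTYPE, which also means it is not safe to call while other threads are using locale-dependent functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Option 1: switch LC_CTYPE to a UTF-8 locale, convert with mbstowcs(),
 * then restore the previous LC_CTYPE. Returns a malloc'd wide string, or
 * NULL if no candidate locale is available or the input is invalid. */
wchar_t *utf8_to_wcs_mbstowcs(const char *src)
{
    /* Assumed locale names; which ones are installed varies by system. */
    static const char *const candidates[] = { "C.UTF-8", "en_US.UTF-8", NULL };

    /* setlocale(..., NULL) returns a string that later calls may overwrite,
     * so keep a private copy for restoring. */
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *old = cur ? strdup(cur) : NULL;

    const char *ok = NULL;
    for (size_t i = 0; candidates[i] != NULL && ok == NULL; i++)
        ok = setlocale(LC_CTYPE, candidates[i]);

    wchar_t *dst = NULL;
    if (ok != NULL) {
        /* POSIX: with a NULL destination, mbstowcs() only counts the wide
         * characters needed (excluding the terminator), or returns -1 on
         * an invalid sequence. */
        size_t n = mbstowcs(NULL, src, 0);
        if (n != (size_t)-1) {
            dst = malloc((n + 1) * sizeof *dst);
            if (dst != NULL)
                mbstowcs(dst, src, n + 1);   /* n chars plus L'\0' */
        }
    }

    if (old != NULL) {
        setlocale(LC_CTYPE, old);   /* process-wide: not thread-safe */
        free(old);
    }
    return dst;
}
```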

So far, I've tried (2) -- the iconv() method -- and it doesn't work for me. It seems to work fine if the characters are ASCII, but the moment it actually hits any non-ASCII characters, iconv() returns -1 with errno set to EILSEQ. I'm assuming there are some bugs in my code, which is no surprise considering how annoying iconv() is to use.
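For comparison, here is a sketch of option 2 with the bookkeeping iconv() needs: iconv_open() takes the target encoding first and the source encoding second, and the "left" counts are in bytes, not characters. It assumes glibc, where "WCHAR_T" is accepted as an encoding name and wchar_t holds UCS-4; on other iconv implementations the name may behave differently. The function name is hypothetical. Note that EILSEQ is also exactly what iconv() reports for genuinely invalid UTF-8 input, so it is worth checking the input bytes as well as the code.

```c
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Option 2: UTF-8 -> wchar_t via iconv(). Returns a malloc'd,
 * NUL-terminated wide string, or NULL on error (errno is preserved). */
wchar_t *utf8_to_wcs_iconv(const char *src)
{
    iconv_t cd = iconv_open("WCHAR_T", "UTF-8");   /* (tocode, fromcode) */
    if (cd == (iconv_t)-1)
        return NULL;

    size_t in_left = strlen(src);
    /* Each UTF-8 byte yields at most one wchar_t; +1 for the terminator. */
    wchar_t *dst = malloc((in_left + 1) * sizeof *dst);
    if (dst == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *)src;   /* glibc's iconv() takes char **, not const */
    char *out = (char *)dst;
    size_t out_left = in_left * sizeof *dst;   /* bytes, terminator excluded */

    size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
    int saved_errno = errno;
    iconv_close(cd);

    if (rc == (size_t)-1) {
        free(dst);
        errno = saved_errno;   /* EILSEQ, EINVAL (truncated input) or E2BIG */
        return NULL;
    }

    *(wchar_t *)out = L'\0';   /* out points just past the last wchar_t */
    return dst;
}
```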

So instead of actually trying to fix the bugs, I figure that using mbstowcs() is probably easier than trying to work around iconv()'s brain damage. The thing is, surely there _must_ be some way to tell mbstowcs() that the source string to convert is in UTF-8, besides using setlocale() with a dummy UTF-8 locale. I'm only concerned about the encoding type after all, not what language it's in, and I feel quite yucky doing something like setlocale(LC_CTYPE, "en_US.UTF-8"), because I'm not in the USA.
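One POSIX.1-2008 way to avoid touching the process-wide locale is newlocale()/uselocale(): the sketch installs a UTF-8 LC_CTYPE for the calling thread only, runs mbstowcs(), and then restores the thread's previous locale. It still needs the name of some installed UTF-8 locale; "C.UTF-8" below is an assumption, and the helper name utf8_to_wcs_scoped is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdlib.h>
#include <wchar.h>

/* Convert UTF-8 with mbstowcs() under a thread-local UTF-8 LC_CTYPE.
 * Returns a malloc'd wide string, or NULL if the locale is unavailable
 * or the input is invalid. The global locale is never modified. */
wchar_t *utf8_to_wcs_scoped(const char *src)
{
    /* Assumed locale name; any installed UTF-8 locale would do. */
    locale_t utf8 = newlocale(LC_CTYPE_MASK, "C.UTF-8", (locale_t)0);
    if (utf8 == (locale_t)0)
        return NULL;

    locale_t prev = uselocale(utf8);   /* affects only this thread */

    wchar_t *dst = NULL;
    size_t n = mbstowcs(NULL, src, 0);   /* POSIX: count only */
    if (n != (size_t)-1) {
        dst = malloc((n + 1) * sizeof *dst);
        if (dst != NULL)
            mbstowcs(dst, src, n + 1);
    }

    uselocale(prev);    /* restore the thread's previous locale */
    freelocale(utf8);
    return dst;
}
```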

Is there something I'm missing, or is this the way that everybody really does it? I'm thinking that converting between UTF-8 and wchar_t must be somewhat common these days, but Googling for "convert utf-8 to wchar_t" really isn't being all that helpful.
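If depending on any named locale is unacceptable, UTF-8 is simple enough to decode by hand. This sketch assumes wchar_t is 32 bits wide and holds Unicode code points, as on glibc; with a 16-bit wchar_t (as on Windows) you would need to emit UTF-16 surrogate pairs instead. It rejects overlong forms, surrogates, and values above U+10FFFF. The name utf8_to_wcs_manual is hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Locale-independent UTF-8 -> wchar_t decoder (assumes 32-bit wchar_t
 * holding Unicode code points). Returns a malloc'd string, or NULL on
 * malformed input or allocation failure. */
wchar_t *utf8_to_wcs_manual(const char *src)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t len = strlen(src);
    /* Never more wide characters than input bytes, plus the terminator. */
    wchar_t *dst = malloc((len + 1) * sizeof *dst);
    if (dst == NULL)
        return NULL;

    size_t i = 0, j = 0;
    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t extra;

        if (c < 0x80)                    { cp = c;        extra = 0; }
        else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; extra = 1; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; extra = 2; }
        else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; extra = 3; }
        else goto fail;   /* stray continuation byte, 0xC0/0xC1, 0xF5..0xFF */

        for (size_t k = 1; k <= extra; k++) {
            /* A truncated sequence hits the terminating NUL, which fails
             * this test, so we never read past the end of the string. */
            if ((s[i + k] & 0xC0) != 0x80)
                goto fail;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }

        /* Reject overlong forms, UTF-16 surrogates and values > U+10FFFF. */
        if ((extra == 2 && cp < 0x800) ||
            (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)) ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            goto fail;

        dst[j++] = (wchar_t)cp;
        i += extra + 1;
    }
    dst[j] = L'\0';
    return dst;

fail:
    free(dst);
    return NULL;
}
```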

(I'm also quite happy to use C++'s locale/facet/codecvt stuff, but the documentation I've found about that so far appears to be equally terse.)


Cheers,


--
% Andre Pang : trust.in.love.to.save  <http://www.algorithm.com.au/>



_______________________________________________
coders mailing list
[email protected]
http://lists.slug.org.au/listinfo/coders
