Bits is bits; storing UTF-16 into a 4-byte wchar_t string can work.  But
the unwary might make the mistake of passing a wchar_t string containing
UTF-16 with surrogate pairs to a runtime library function that expects
UTF-32.  Such a function will see each surrogate as a separate, invalid
code point, so it seems to me that all bets are off at that point.
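
To make the hazard concrete, here is a minimal sketch of a guard one could run before handing a wchar_t string to a function that expects UTF-32.  The name has_surrogates is hypothetical, not part of any library discussed here.  It only reports whether any unit falls in the surrogate range 0xD800..0xDFFF, which is never valid as a UTF-32 code point.

```cpp
#include <string>

// True if the unit is a UTF-16 surrogate (0xD800..0xDFFF).
inline bool is_surrogate(wchar_t c) {
    const unsigned long u = static_cast<unsigned long>(c);
    return u >= 0xD800UL && u <= 0xDFFFUL;
}

// Hypothetical helper: true if the string holds any surrogate code unit.
// On a platform where wide functions expect UTF-32, a true result means
// the string really holds UTF-16 and should not be passed along as-is.
bool has_surrogates(const std::wstring& s) {
    for (wchar_t c : s) {
        if (is_surrogate(c)) {
            return true;
        }
    }
    return false;
}
```

On a platform with a 2-byte wchar_t, surrogates are perfectly legitimate, so a check like this only makes sense where wchar_t is wider.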

The central point, I think, is that you have to understand the different
types and what gets stored in them to use them safely with various
libraries.  You do, and can make robust decisions based on your
understanding, but many people don't understand that functions that
process wchar_t strings expect different sequences of bytes on different
platforms.
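
As a rough illustration of that platform dependence, the sketch below computes how many wchar_t units the native encoding needs for one code point.  It assumes (and this is only an assumption, though it matches the common platforms) that a 2-byte wchar_t means UTF-16 and anything wider means UTF-32.  The names kWideIsUtf16 and native_units are made up for this example.

```cpp
#include <cstddef>

// Assumption: a 2-byte wchar_t implies the platform's wide-character
// functions expect UTF-16; a wider wchar_t implies UTF-32.
constexpr bool kWideIsUtf16 = sizeof(wchar_t) == 2;

// Number of wchar_t units the native encoding needs for one code point.
// Code points above U+FFFF take a surrogate pair in UTF-16 but a single
// unit in UTF-32.
constexpr std::size_t native_units(char32_t cp) {
    return (kWideIsUtf16 && cp > 0xFFFF) ? 2 : 1;
}
```

The same code point, U+1F600 for instance, therefore occupies two units on one platform and one unit on another, which is exactly why a string built for one is not safe to hand to the other's library functions.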

-----Original Message-----
From: Boris Kolpackov [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 19, 2007 3:12 PM
To: [email protected]
Subject: Re: WCHAR to XmlCh

Jesse Pelton <[EMAIL PROTECTED]> writes:

> That assumes wchar_t holds UTF-16 (as XMLCh does).  It might not.  See
> http://www.losingfight.com/blog/2006/07/28/wchar_t-unsafe-at-any-size/
> for a wchar_t story that would be amusing if it were fiction.

What most people fail to realize is that wchar_t holds whatever you
put into it. If you want portable UTF-16 in wchar_t then put UTF-16
into it, even on platforms where wchar_t is 4 bytes long and can
hold UTF-32.

Alternatively, it is possible to use UTF-16 on platforms where
wchar_t is 2 bytes long and UTF-32 on the rest. The only parts
that will need to know about this arrangement are those that
are responsible for converting to/from wchar_t strings (e.g.,
XMLCh to/from wchar_t). If the application does not need to do
anything special with (e.g., search for) characters that are
outside the BMP (Basic Multilingual Plane), then it can use
wchar_t that contains either UTF-16 or UTF-32 without actually
caring which one it is. And I am pretty sure this is 99.9% of
applications. We use this approach in our XML data binding tool
when the user requests the underlying character type to be wchar_t.
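
A minimal sketch of such a conversion layer, going in the UTF-16 to wchar_t direction, might look like the following.  It is not the code from the data binding tool.  It uses std::u16string as a stand-in for an XMLCh string, and the function name utf16_to_wide is hypothetical.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Xerces-C++: copies a UTF-16 string into
// a wchar_t string, keeping UTF-16 where wchar_t is 2 bytes long and
// producing UTF-32 elsewhere.
std::wstring utf16_to_wide(const std::u16string& in) {
    std::wstring out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t unit = in[i];
        if (sizeof(wchar_t) > 2 &&
            unit >= 0xD800 && unit <= 0xDBFF && i + 1 < in.size()) {
            const char32_t low = in[i + 1];
            if (low >= 0xDC00 && low <= 0xDFFF) {
                // Combine the surrogate pair into one UTF-32 code point.
                const char32_t cp =
                    0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00);
                out.push_back(static_cast<wchar_t>(cp));
                ++i;  // the low surrogate has been consumed
                continue;
            }
        }
        // BMP character, 2-byte wchar_t, or an unpaired surrogate:
        // copy the code unit unchanged (no validation in this sketch).
        out.push_back(static_cast<wchar_t>(unit));
    }
    return out;
}
```

Code that only passes the resulting strings around, or works with BMP characters, never needs to know which branch was taken.  Only this function and its inverse do.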


Boris


-- 
Boris Kolpackov
Code Synthesis Tools CC
http://www.codesynthesis.com
Open-Source, Cross-Platform C++ XML Data Binding
