On Wednesday 08 March 2006 02:18, Scott Cantor wrote:
> > IIRC, there /are/ different UTF encodings, even within UTF-16.
> > There is something called UCS-4, and also something called UCS-2 (I
> > believe). I do not know the difference between these and their related
> > UTF-32 and UTF-16.
>
> Nor I, but that's what I had in mind when I expressed caution.
To my mind, the failure to specify a UTF-16 string class is one of the worst aspects of C++. After reading the applicable sections of ISO/IEC 14882:2003, I have come to the conclusion that the Xerces XMLCh type is not defined in such a way as to conform to the definition of a C++ implementation's extended character set.

In order to implement the C++ extended character set, members of the C++ basic character set (the ASCII characters) should be defined as wchar_t using their wide-character literals. That is, for example:

    typedef wchar_t XMLCh;

    const XMLCh chLatin_A = L'A';
    const XMLCh chLatin_B = L'B';
    const XMLCh chLatin_C = L'C';
    const XMLCh chLatin_D = L'D';

Rather than:

    typedef unsigned short XMLCh;

    const XMLCh chLatin_A = 0x41;
    const XMLCh chLatin_B = 0x42;
    const XMLCh chLatin_C = 0x43;
    const XMLCh chLatin_D = 0x44;

There may be reasons the Xerces developers chose to implement UTF-16 without conforming to the requirements for implementing the C++ extended character set. Technically speaking, the UTF-16 encoding and the extended character set will not, in general, coincide: there is no requirement that the basic character set be encoded using ASCII values, in which case the numerical value of chLatin_A would not be the same in all implementations. Nonetheless (IMO), properly written code should not rely on such implementation details.

Steven
