From: "Ben Griffin" <[email protected]>
For those with the L operator, then
const XMLCh XMLUni::fgAnyString[] = { L'A', L'N', L'Y', L'\0' };
const XMLCh XMLUni::fgAnyString[] = L"ANY";
As I understand it, the two things above may not generate the same
results, as wchar_t (the element type of an L literal) is sometimes wider
than two bytes, hence the need (in my OP) for the compile-time flag
-fshort-wchar.
You mean L might define UTF-32?? Ok. (I'm so ensconced in Microsoft's
UTF-16 (LE) that I hadn't thought of that.)
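To make the width difference concrete, here is a minimal sketch. Unit16 is a hypothetical stand-in for a 16-bit XMLCh, not the Xerces-C typedef itself, and the sizes are computed on whatever compiler builds it rather than assumed.

```cpp
#include <cstddef>

// Hypothetical stand-in for a 16-bit XMLCh (an assumption for this sketch).
typedef unsigned short Unit16;

// Brace form: four Unit16 elements, whatever the width of wchar_t.
const Unit16 kAnyBraced[] = { L'A', L'N', L'Y', L'\0' };

// Literal form: four wchar_t elements, so the byte count follows wchar_t.
const wchar_t kAnyWide[] = L"ANY";

const std::size_t kBracedBytes = sizeof(kAnyBraced);
const std::size_t kWideBytes   = sizeof(kAnyWide);
```

With a 2-byte wchar_t (Visual Studio, or GCC with -fshort-wchar) the two byte counts agree; with a 4-byte wchar_t kWideBytes is twice kBracedBytes, assuming unsigned short is 2 bytes.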
Just some observations in case they might be helpful:
Likewise on GCC,
std::basic_string<XMLCh> my_string = L"the string that I wish to declare";
(i.e., without a cast) will generate an error message: invalid
conversion from 'const wchar_t*' to 'const short unsigned int*'
I now see that XMLCh is actually defined as wchar_t when used with Visual
Studio 2008, so maybe I'm at an advantage here. Does GCC not have wchar_t?
And the example above
const XMLCh XMLUni::fgAnyString[] = L"ANY";
generates the error "array must be initialized with a brace-enclosed
initializer" - which is understandable.
Well maybe again if XMLCh is defined as a 2-byte integer, but character
array initialisation in the form
const wchar_t szSysName[] = L"System font";
has, I believe, always been legal with wchar_t in the C++ spec (and is
explicitly allowed with char in my 1991 copy of Stroustrup).
Not using a basic_string construct still generates the same invalid
conversion error.
const XMLCh* XMLUni::fgAnyString = L"ANY";
Produces the same error (invalid conversion).
This is why I need to use a C-style cast as follows:
std::basic_string<XMLCh> my_string = (const XMLCh*)(L"the string that I wish to declare");
Using preprocessor macros (yechh) I can tidy that up somewhat of course.
It would be neater to derive a class from
std::basic_string<XMLCh>
and give it appropriate constructors and assignment operators.
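A different route from both the cast and a derived class is a free function that copies the literal one element at a time, so no pointer conversion is needed and the width of wchar_t stops mattering. The sketch below is only an illustration: it assumes C++11, uses char16_t as a stand-in for a 16-bit XMLCh so that the standard std::u16string is available, and fromWide is a hypothetical name. It also assumes BMP-only text and rejects anything else.

```cpp
#include <stdexcept>
#include <string>

// Assumption: char16_t stands in for a 16-bit XMLCh in this sketch.
typedef char16_t Unit16;
typedef std::basic_string<Unit16> String16;

// Hypothetical helper: element-wise copy of a wide string, no pointer cast.
String16 fromWide(const wchar_t* text)
{
    String16 out;
    for (; *text != L'\0'; ++text) {
        const unsigned long value = static_cast<unsigned long>(*text);
        // Only reachable when wchar_t is wider than 16 bits.
        if (value > 0xFFFFul) {
            throw std::range_error("character outside the BMP");
        }
        out.push_back(static_cast<Unit16>(value));
    }
    return out;
}
```

Usage would then be something like String16 s = fromWide(L"the string that I wish to declare"); at the cost of a run-time copy per string.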
Dave (Bertoni), your question as to whether -fshort-wchar guarantees UTF-16
code points is a good one, even though we are already using that flag.
I was not aware that the Xerces-C XMLCh implementation was UTF-16; I guess I
erroneously thought that it was UCS-2.
(The UCS-2 encoding form is identical to that of UTF-16, except that it
does not support surrogate pairs and therefore can only encode characters
in the BMP range U+0000 through U+FFFF. As a consequence it is a
fixed-length encoding that always encodes characters into a single 16-bit
value.)
[The only case where I have found that I personally would have to worry
about the difference is in the collection of Unicode musical symbols. But
as fonts don't usually have them, even that is a bit academic.]
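For reference, the surrogate-pair arithmetic that separates UTF-16 from UCS-2 can be sketched as follows. encodeUtf16 and Unit16 are hypothetical names for this illustration, and the function does no validation (it assumes a code point no greater than U+10FFFF that is not itself a surrogate).

```cpp
#include <vector>

// Hypothetical 16-bit code unit type for this sketch.
typedef unsigned short Unit16;

// Encode one Unicode code point as one or two UTF-16 code units.
std::vector<Unit16> encodeUtf16(unsigned long codePoint)
{
    std::vector<Unit16> units;
    if (codePoint < 0x10000ul) {
        // BMP: a single unit, identical to UCS-2.
        units.push_back(static_cast<Unit16>(codePoint));
    } else {
        // Supplementary planes: split the 20-bit offset into two 10-bit halves.
        const unsigned long offset = codePoint - 0x10000ul;
        units.push_back(static_cast<Unit16>(0xD800ul + (offset >> 10)));
        units.push_back(static_cast<Unit16>(0xDC00ul + (offset & 0x3FFul)));
    }
    return units;
}
```

For example, U+1D11E (MUSICAL SYMBOL G CLEF) comes out as the pair 0xD834, 0xDD1E, whereas UCS-2 has no way to represent it.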
My string declarations only use characters that are in the UCS-2 / BMP
range, so I am not so concerned about the need to encode surrogate pairs
as constants. Regardless, the proposed method from
src/xercesc/util/XMLUni.cpp does not support non-BMP characters.
That's what I found curious.
More to the point of your question, though: for the GCC C++ flag
-fshort-wchar, the documentation at
http://gcc.gnu.org/onlinedocs/gcc-3.4.0/gcc/Code-Gen-Options.html#Code%20Gen%20Options
tells us this flag "overrides the underlying type for wchar_t to be short
unsigned int instead of the default for the target. This option is useful
for building programs to run under WINE."
My software runs well under wine, just using Microsoft's in-built wchar_t.
I don't know if that is a useful observation though.
What is salient to us is that IIRC (by default) XMLCh is defined to be a
short unsigned int also.
Therefore XMLCh == short unsigned int == wchar_t (when the -fshort-wchar
flag is used in GCC).
If this is the case then, as I understand it, using the C-style cast (const
XMLCh*)(L"the string that I wish to declare") should be perfectly fine.
In my version I have
#ifdef _NATIVE_WCHAR_T_DEFINED
#define XERCES_XMLCH_T wchar_t
#else
#define XERCES_XMLCH_T unsigned short
#endif
typedef XERCES_XMLCH_T XMLCh;
and somewhere
_NATIVE_WCHAR_T_DEFINED
is indeed defined.
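One way to see which branch a given compiler takes is to let it report the result. The following is a sketch only, assuming C++11: the SKETCH_ and Sketch names are hypothetical so as not to collide with the real Xerces-C headers, and the macro is defined before the typedef that uses it.

```cpp
#include <type_traits>

// Same selection logic as above, under hypothetical names.
#ifdef _NATIVE_WCHAR_T_DEFINED
#define SKETCH_XMLCH_T wchar_t
#else
#define SKETCH_XMLCH_T unsigned short
#endif
typedef SKETCH_XMLCH_T SketchXMLCh;

// True only when the typedef really is wchar_t (no cast needed for L"...").
const bool kIsWchar = std::is_same<SketchXMLCh, wchar_t>::value;

// True when the two types at least have the same width, as with -fshort-wchar.
const bool kSameWidth = sizeof(SketchXMLCh) == sizeof(wchar_t);
```

Note that in C++ wchar_t is a distinct type even when its width matches unsigned short, so kSameWidth can be true while kIsWchar is false; that is the situation in which GCC still reports the invalid conversion without a cast.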
Dave
David Webber
Mozart Music Software
http://www.mozart.co.uk
For discussion and support see
http://www.mozart.co.uk/mozartists/mailinglist.htm