On 1/21/2010 4:18 AM, Ben Griffin wrote:
Hi Dave(s),
First of all, regarding the point about changing the configuration of the
xercesc headers; unfortunately we do not control that part of the environment.
But you'll need to rebuild Xerces-C with the -fshort-wchar, so why can't
you have the header file changed at the same time?
For those with the L operator, then
const XMLCh XMLUni::fgAnyString[] = { L'A', L'N', L'Y', L'\0' };
const XMLCh XMLUni::fgAnyString[] = L"ANY";
As I understand it, the two things above may not generate the same results, as
the width of an L-prefixed character (wchar_t) is sometimes more than two bytes,
hence the need (in my OP) for the compile-time flag -fshort-wchar
I never suggested you use the "L" prefix, as it's not very portable.
Neither of these snippets will work unless XMLCh is a typedef for
wchar_t, and that is the case only on Windows.
Likewise on GCC,
std::basic_string<XMLCh> my_string = L"the string that I wish to declare";
(i.e., without a static cast) will generate an error message: invalid
conversion from 'const wchar_t*' to 'const short unsigned int*'
Yes, because wchar_t is a distinct type. That's why I suggested you
change the definition of XMLCh to wchar_t if you plan to use the
-fshort-wchar switch.
And the example above
const XMLCh XMLUni::fgAnyString[] = L"ANY";
generates the error "array must be initialized with a brace-enclosed
initializer" - which is understandable.
Again, unless you plan to make XMLCh a typedef for wchar_t, you can stop
using the "L" prefix and save us all a great deal of time.
This will work:
const wchar_t foo[] = { L"ANY" };
However:
const XMLCh foo[] = { L"ANY" };
won't work because wchar_t and XMLCh are distinct types.
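To illustrate the point, here is a minimal sketch that builds without Xerces-C. The XMLCh typedef below is only a stand-in matching the "short unsigned int" definition described in this thread, and unitLength is a hypothetical helper. It assumes a compiler that treats wchar_t as a built-in type, which is standard C++ behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Stand-in for the Xerces-C typedef as described in this thread (assumption:
// the real definition comes from the Xerces-C headers, not from here).
typedef unsigned short XMLCh;

// In standard C++, wchar_t is a distinct built-in type, whatever its size,
// so it is never the same type as unsigned short.
static_assert(!std::is_same<wchar_t, XMLCh>::value,
              "wchar_t and XMLCh are distinct types");

// Portable initialization: one integer constant per UTF-16 code unit,
// with no "L" prefix involved.
const XMLCh kAny[] = { 0x41 /* A */, 0x4E /* N */, 0x59 /* Y */, 0 };

// Hypothetical helper: counts code units up to the terminating zero.
inline std::size_t unitLength(const XMLCh* s) {
    assert(s != 0);
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

The brace-enclosed form compiles the same way with or without -fshort-wchar, because it never mentions wchar_t.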
Not using a basic_string construct still generates the same invalid conversion
error.
const XMLCh* XMLUni::fgAnyString = L"ANY";
Produces the same effect (invalid conversion)
This is why I need to use a static cast as follows:
std::basic_string<XMLCh> my_string = (const XMLCh*)(L"the string that I wish to declare");
Using preprocessor macros (yechh) I can tidy that up somewhat of course.
That's not a static cast, it's a C-style cast, which is effectively a
reinterpret_cast in C++. Again, that's why I suggested you change the
typedef for XMLCh.
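If changing the typedef is not an option, one alternative that avoids the cast is to copy the literal one unit at a time. This is only a sketch: XMLCh is a stand-in typedef matching the definition described in this thread, and copyBmpLiteral is a hypothetical helper, not part of Xerces-C. It assumes every character in the literal is in the BMP, and it does not depend on sizeof(wchar_t).

```cpp
#include <cassert>
#include <cstddef>

// Stand-in for the Xerces-C typedef as described in this thread (assumption).
typedef unsigned short XMLCh;

// Hypothetical helper: copies a wide string literal, including its
// terminating zero, into an XMLCh array of the same length. Returns false
// if any character does not fit in a single 16-bit unit.
template <std::size_t N>
bool copyBmpLiteral(const wchar_t (&src)[N], XMLCh (&dst)[N]) {
    for (std::size_t i = 0; i < N; ++i) {
        const unsigned long c = static_cast<unsigned long>(src[i]);
        if (c > 0xFFFFUL) {
            return false;  // outside the BMP, would need a surrogate pair
        }
        dst[i] = static_cast<XMLCh>(c);
    }
    return true;
}
```

Each element is converted by value, so no pointer is reinterpreted. The cost is a runtime copy instead of a compile-time constant.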
Dave (Bertoni), your question about whether -fshort-wchar guarantees UTF-16 code
points is a good one, even though we are already using the flag.
I was not aware that the Xerces-C XMLCh implementation was UTF-16; I guess I
erroneously thought that it was UCS-2.
(The UCS-2 encoding form is identical to that of UTF-16, except that it does
not support surrogate pairs and therefore can only encode characters in the BMP
range U+0000 through U+FFFF. As a consequence it is a fixed-length encoding
that always encodes characters into a single 16-bit value.)
Yes, I'm well aware of the differences between UTF-16 and UCS-2.
Xerces-C has always supported UTF-16, since the XML recommendation
requires support for characters outside of the BMP.
My string declarations only use characters that are in the UCS-2 / BMP range,
so I am not so concerned about the need to encode surrogate pairs as constants.
Regardless, the proposal of using the method in src/xercesc/util/XMLUni.cpp
does not support non-BMP characters.
You can always initialize such surrogate pairs with integer constants.
There just aren't any mnemonics for them, since they're not needed in
the parser.
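As a rough sketch of what that looks like, the array below spells U+1D11E (MUSICAL SYMBOL G CLEF) as two UTF-16 code units. The XMLCh typedef is a stand-in matching the definition described in this thread, and SurrogatePair and toSurrogates are hypothetical names used only to show where the two constants come from.

```cpp
#include <cassert>

// Stand-in for the Xerces-C typedef as described in this thread (assumption).
typedef unsigned short XMLCh;

// U+1D11E written directly as a high and a low surrogate, zero-terminated.
const XMLCh kGClef[] = { 0xD834, 0xDD1E, 0 };

// Hypothetical helper type holding the two code units of one character.
struct SurrogatePair {
    XMLCh high;
    XMLCh low;
};

// Standard UTF-16 encoding of a code point in the range U+10000..U+10FFFF.
inline SurrogatePair toSurrogates(unsigned long codePoint) {
    assert(codePoint >= 0x10000UL && codePoint <= 0x10FFFFUL);
    const unsigned long v = codePoint - 0x10000UL;  // 20-bit value
    SurrogatePair p;
    p.high = static_cast<XMLCh>(0xD800UL + (v >> 10));     // top 10 bits
    p.low  = static_cast<XMLCh>(0xDC00UL + (v & 0x3FFUL)); // bottom 10 bits
    return p;
}
```

For constants you would normally compute the pair once by hand and write the two integers into the array, as kGClef does.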
More to the point of your question though; regarding the GCC C++ flag
-fshort-wchar
http://gcc.gnu.org/onlinedocs/gcc-3.4.0/gcc/Code-Gen-Options.html#Code%20Gen%20Options
tells us this flag "overrides the underlying type for wchar_t to be short unsigned
int instead of the default for the target. This option is useful for building programs to
run under WINE."
Yes, I read that. However, it's important to understand that, even
though wchar_t will be the same size as short unsigned int, it will be a
distinct type, so this comment is misleading. This is a difference
between C and C++, because C considers wchar_t to be a typedef.
What is salient to us is that IIRC (by default) XMLCh is defined to be a short
unsigned int also.
Yes, short unsigned int, which is not the same type as wchar_t.
Therefore XMLCh == short unsigned int == wchar_t (when the -fshort-wchar flag
is used in GCC).
If this is the case then, as I understand it, using the static cast (const
XMLCh*)(L"the string that I wish to declare") should be perfectly fine.
I think you meant to say "XMLCh == short unsigned int" and
"sizeof(XMLCh) == sizeof(wchar_t)" when the -fshort-wchar flag is used.
Dave