On 1/20/2010 5:45 AM, Ben Griffin wrote:
> We use many literal XMLCh* string declarations in our codebase.
> I am still not sure what the safest, yet most efficient, way of declaring
> these is WITHOUT RELYING UPON A TRANSCODE.
Take a look at src/xercesc/util/XMLUni.cpp:
const XMLCh XMLUni::fgAnyString[] =
{
chLatin_A, chLatin_N, chLatin_Y, chNull
};
You could make this more readable by adding a literal string as a comment:
// "ANY"
const XMLCh XMLUni::fgAnyString[] =
{
chLatin_A, chLatin_N, chLatin_Y, chNull
};
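For anyone who wants to try the pattern outside the Xerces-C source tree, here is a minimal sketch. The typedef and the ch* constants below are hypothetical stand-ins so that it compiles on its own; in a real build they come from the Xerces-C headers. The names kAnyString and xmlchLength are made up for illustration.

```cpp
#include <cstddef>

// Hypothetical stand-ins so the sketch compiles without the Xerces-C headers.
// In a real build, XMLCh and the ch* constants come from Xerces-C itself.
typedef unsigned short XMLCh;
const XMLCh chNull    = 0x00;
const XMLCh chLatin_A = 0x41;
const XMLCh chLatin_N = 0x4E;
const XMLCh chLatin_Y = 0x59;

// "ANY"
const XMLCh kAnyString[] =
{
    chLatin_A, chLatin_N, chLatin_Y, chNull
};

// Length of a null-terminated XMLCh string, not counting the terminator.
std::size_t xmlchLength(const XMLCh* s)
{
    std::size_t n = 0;
    while (s[n] != chNull)
        ++n;
    return n;
}
```

The array is plain static data, so no transcoding happens at run time.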
> With the compilation setting of -fshort-wchar (we are only interested in gcc),
> are there any problems or caveats with using:
> std::basic_string<XMLCh> my_string = (const XMLCh*)(L"the string that I wish to declare");
I'm not sure whether -fshort-wchar guarantees UTF-16 code units in wide
literals, although it seems to be implied, since the manual mentions it as
useful for creating programs that run under WINE. If you do this, I would
also recommend you modify src/xercesc/util/Xerces_autoconf_config.hpp so
that wchar_t is the type used for XMLCh:
#define XERCES_XMLCH_T wchar_t
You should make this change after you configure Xerces-C, but before you
build it. If you do this, you won't need to cast between wchar_t and XMLCh.
Note that if you use -fshort-wchar, then any other object code you link with
must be built with it, and anyone who uses your code will need it as well.
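A rough sketch of what the result could look like, assuming XMLCh has been made the same type as wchar_t. The typedef below is a hypothetical stand-in for the real Xerces-C definition, and kWcharIs16Bit and kMyString are illustrative names. __SIZEOF_WCHAR_T__ is a macro that gcc predefines; it only reports the size of wchar_t and says nothing about the encoding of the literal.

```cpp
#include <string>

// Hypothetical stand-in; the real type would come from the Xerces-C headers
// after XERCES_XMLCH_T has been set to wchar_t.
typedef wchar_t XMLCh;

// __SIZEOF_WCHAR_T__ is predefined by gcc and is 2 under -fshort-wchar.
// A real project would probably turn the #else branch into an #error so
// that a build without the flag fails early.
#if defined(__SIZEOF_WCHAR_T__) && (__SIZEOF_WCHAR_T__ == 2)
const bool kWcharIs16Bit = true;
#else
const bool kWcharIs16Bit = false;
#endif

// No cast is needed because XMLCh and wchar_t are the same type here.
const std::basic_string<XMLCh> kMyString = L"the string that I wish to declare";
```

The literal is only usable as 16-bit data when kWcharIs16Bit is true, which is why every translation unit has to be compiled with the same flag.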
> Are there neater ways of doing the same?
> I know of the alternative of using the chXX characters, etc:
> std::basic_string<XMLCh> my_string = {'t','h','e',' ','s','t','r','i','n', ... ,chNull};
> I don't really find this acceptable - the code is more or less unreadable. It
> seems crazy to have to use a transcoder because there isn't a tidy way to
> define a string literal.
The C++ standard doesn't define a portable way to indicate UTF-16 string
constants, so it's not surprising it's a problem. This should change in
the next version of the standard, but it will be a long time before
compilers that support it are widely available.
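For illustration only, here is a sketch of how this could look with a compiler that implements the newer standard (C++11 or later), which adds a char16_t type and a u"..." literal prefix. It assumes char16_t can stand in for a 16-bit XMLCh; whether a given Xerces-C build actually defines XMLCh that way is an assumption, and XMLCh16, kAnyString and kMyString are made-up names.

```cpp
#include <string>

// Assumption: a C++11 (or later) compiler. char16_t stands in for a 16-bit
// XMLCh; this sketch does not promise that any particular Xerces-C build
// defines XMLCh this way.
typedef char16_t XMLCh16;

// A u"..." literal is an array of char16_t, terminator included.
const XMLCh16 kAnyString[] = u"ANY";

// The same literal form initializes a basic_string without a cast.
const std::basic_string<XMLCh16> kMyString = u"the string that I wish to declare";
```

If XMLCh were a different 16-bit type, such as unsigned short, a cast would still be needed, so this only removes the problem when the types match.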
Dave