Hi,
I've googled around and found this question asked in quite a few
places, but never with an answer.
What is the best way of handling strings, and particularly string
literals, in portable code?
Specifically, I'm interested in building code with VC++ 2013 on
Windows and G++ 4.8 on Linux. On Windows, the Xerces binary build
uses wchar_t as its character type (XMLCh), and so, naturally enough,
people on Windows write code that passes wchar_t strings around.
Unfortunately, G++ has a 4-byte wchar_t, so on Linux Xerces uses
unsigned short int (or uint16_t) as XMLCh instead. Code written this
way for Windows breaks in fairly horrible ways on Linux, and fixing it
requires wide-ranging changes.
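To make the breakage concrete, here is a small sketch (not Xerces code) that reads the bytes of a wide string back as 16-bit code units, which is effectively what happens when a wchar_t* is cast to a 16-bit XMLCh*. It assumes a little-endian machine; units_seen_as_utf16 is a hypothetical helper. With a 4-byte wchar_t, every other unit is zero, so a 16-bit reader sees the string end after its first character.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <cwchar>

// Hypothetical helper: count the 16-bit units a null-terminated XMLCh
// reader would see if handed the bytes of a wchar_t string.
// memcpy is used instead of reinterpret_cast to avoid aliasing problems.
std::size_t units_seen_as_utf16(const wchar_t* ws)
{
    const std::size_t bytes = (std::wcslen(ws) + 1) * sizeof(wchar_t);
    std::size_t n = 0;
    for (std::size_t off = 0; off + sizeof(std::uint16_t) <= bytes;
         off += sizeof(std::uint16_t)) {
        std::uint16_t unit;
        std::memcpy(&unit, reinterpret_cast<const unsigned char*>(ws) + off,
                    sizeof unit);
        if (unit == 0) break;  // a 16-bit reader stops at the first zero unit
        ++n;
    }
    return n;
}
```

With VC++ (2-byte wchar_t), L"Hi" is seen as 2 units; with G++ on x86 Linux (4-byte wchar_t), it is seen as 1.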
So far, the best solution I've come up with looks something like this:
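#include <sstream>
#include <string>
#include <xercesc/util/XercesDefs.hpp>  // defines XMLCh

#if defined _MSC_VER
// Windows: XMLCh is wchar_t, so wide literals and strings pass straight through.
#define U16S(x) L##x
typedef wchar_t my_u16_char_t;
typedef std::wstring my_u16_str_t;
typedef std::wstringstream my_u16_stream_t;
inline const XMLCh* XmlString(const my_u16_char_t* s) { return s; }
inline const XMLCh* XmlString(const my_u16_str_t& s) { return s.c_str(); }
#elif defined __linux__
// Linux: wchar_t is 4 bytes, so use UTF-16 char16_t and cast to the
// 16-bit XMLCh at the Xerces boundary.
#define U16S(x) u##x
typedef char16_t my_u16_char_t;
typedef std::basic_string<char16_t> my_u16_str_t;
typedef std::basic_stringstream<char16_t> my_u16_stream_t;
inline const XMLCh* XmlString(const my_u16_char_t* s)
{ return reinterpret_cast<const XMLCh*>(s); }
inline const XMLCh* XmlString(const my_u16_str_t& s) { return XmlString(s.c_str()); }
#else
#error "Unsupported platform"
#endif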
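A usage sketch of this kind of layer that runs without Xerces installed: XMLCh is replaced by a local stand-in typedef (an assumption mirroring wchar_t on VC++ and a 16-bit unsigned type on G++), and count_units and element_name_length are hypothetical stand-ins for a Xerces call taking const XMLCh* and for application code.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Stand-in for the Xerces XMLCh typedef (assumption, not the real header).
#if defined _MSC_VER
typedef wchar_t XMLCh;
#define U16S(x) L##x
typedef std::wstring my_u16_str_t;
inline const XMLCh* XmlString(const my_u16_str_t& s) { return s.c_str(); }
#else
typedef std::uint16_t XMLCh;
#define U16S(x) u##x
typedef std::u16string my_u16_str_t;
inline const XMLCh* XmlString(const my_u16_str_t& s)
{ return reinterpret_cast<const XMLCh*>(s.c_str()); }
#endif

// Hypothetical stand-in for a Xerces function that takes const XMLCh*.
std::size_t count_units(const XMLCh* p)
{
    std::size_t n = 0;
    while (p[n] != 0) ++n;
    return n;
}

// Application code is written once, against the macro and typedefs only.
std::size_t element_name_length()
{
    my_u16_str_t name = U16S("book");
    name += U16S("shelf");
    return count_units(XmlString(name));  // 9 code units on either platform
}
```

The point of the design is that only the #if block knows which character type is in use; everything below it compiles unchanged on both platforms.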
But of course this still requires major changes to any existing code
that uses wchar_t.
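If the goal is to leave existing wchar_t code alone, one alternative is to convert only where strings are handed to Xerces. A minimal sketch, assuming wide strings hold UTF-16 where wchar_t is 2 bytes (VC++) and UTF-32 where it is 4 bytes (G++ on Linux); to_utf16 is a hypothetical name, and invalid values are not checked.

```cpp
#include <string>

// Hypothetical helper: convert a wide string to UTF-16 code units.
// 2-byte wchar_t: the data is already UTF-16 and is copied unit by unit.
// 4-byte wchar_t: each UTF-32 code point is encoded, using a surrogate
// pair for code points above U+FFFF.
std::u16string to_utf16(const std::wstring& ws)
{
    std::u16string out;
    out.reserve(ws.size());
    for (wchar_t wc : ws) {
        const unsigned long cp = static_cast<unsigned long>(wc);
        if (sizeof(wchar_t) == 2 || cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            const unsigned long v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));   // high surrogate
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF))); // low surrogate
        }
    }
    return out;
}
```

On Linux, the result's c_str() can then be passed to Xerces via a reinterpret_cast to const XMLCh*. The trade-off is a copy per call instead of a rewrite of every wchar_t signature.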
Is there a better way of sorting this out? C++11 now has a distinct,
UTF-16-encoded character type, char16_t. Is there any plan to make
Xerces use it?
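For reference, a small sketch of the two char16_t properties mentioned here, assuming a compiler with native char16_t support such as G++ 4.8 (older VC++ releases are not assumed to qualify): the type is distinct from the 16-bit integer types, and u"" literals are UTF-16 encoded, so a character outside the Basic Multilingual Plane takes two code units. utf16_units is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <type_traits>

// char16_t is its own built-in type, not a typedef for a 16-bit integer,
// so const char16_t* does not implicitly convert to const std::uint16_t*.
static_assert(!std::is_same<char16_t, std::uint16_t>::value,
              "char16_t is a distinct type");

// Number of UTF-16 code units in a string (not the number of characters).
std::size_t utf16_units(const std::u16string& s) { return s.size(); }
```

For example, u"\u00e9" is one code unit, while u"\U0001F600" is the surrogate pair 0xD83D 0xDE00.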
Thanks,
Tom