From: "Ben Griffin" <[email protected]>
* About the only common platform we don't compile under is Microsoft - our
code base is for unix-flavours. :D.
* AFAIK the UTF-* formats are not fixed-length encodings, whereas
wchar_t is always fixed length.
- (Are you sure that Microsoft wchar_t IS UTF-16 (LE) and not UCS-2
(LE)? I do not know - just curious).
It's definitely UTF-16. That said, how many of the Windows APIs actually
handle surrogate pairs (rather than leaving it to the programmer) is not a
question I'd like to comment on :-) I suspect we'll find out when fonts
containing symbols at code points requiring surrogate pairs start to become
commonplace.
* Under a default gcc compile on Linux, L defines a four byte character,
not a two byte one. It isn't UTF-32.
Not UTF-32? If you have 4-byte characters then I'd have thought UTF-32
would be the reason. Many people think Microsoft's choice of UTF-16 should
have been UTF-32 (because every character is a single unit - no
surrogate pairs). [Don't know what will happen if we discover that humans
have invented more than 2^32 different symbols and we need a font with all
of them :-) ]
* XMLCh is not always defined as wchar_t - as you discovered. Eg. on Mac
OS X it's uint16_t by default. I need to allow for that.
I am starting to appreciate that.
* Yes, const wchar_t szSysName[] = L"System font"; is legal - AFAIK,
difficulties arise when you need a static cast over the declaration.
Regarding your suggestion of a class derivation for the STL template
instance std::basic_string<XMLCh> , at some point it may well be
worthwhile for us to define an internal string class for dealing with all
these issues, but currently, I am quite happy to continue to use
std::basic_string<XMLCh> or a typedef. My main issue is with the way of
declaring literals.
The preprocessor directive I am using at the moment is
#define UCS2(x) (const XMLCh*)(x)
So that I can declare literals as follows:
const XMLCh* myAnyString = UCS2(L"ANY"); // not perfect, but better than:
const XMLCh myAnyString[] = { chLatin_A, chLatin_N, chLatin_Y, chNull }; // "ANY"
I am happy enough with a static cast over L - especially as it seems the
two XMLCh options will work - it will either be redundant or it will be a
reliable cast.
That looks good. In fact you could include the 'L' in the definition of
the macro, and it would be very similar to Microsoft's _T("xyz") which
evaluates to L"xyz" or "xyz" depending on whether _UNICODE is defined for
the project. (I am not a big fan of everything Microsoft does, but this
one was immensely helpful in converting a very big old project to Unicode.)
I appreciate this discussion. I'm starting to feel more confident about
mixing XMLCh and wchar_t with Microsoft Visual Studio. I'm not proposing to
abandon the Microsoft compiler as I have too many shares in the MFC library
for that. But I'm starting to get a better overview of the portability
issues too.
Dave
David Webber
Mozart Music Software
http://www.mozart.co.uk
For discussion and support see
http://www.mozart.co.uk/mozartists/mailinglist.htm