From: "Ben Griffin" <[email protected]>
* About the only common platform we don't compile under is Microsoft - our
code base is for unix-flavours. :D.
* AFAIK the UTF-* formats are not fixed-length encodings, whereas
wchar_t is always fixed length.
- (Are you sure that Microsoft wchar_t IS UTF-16 (LE) and not UCS-2
(LE)? I do not know - just curious).
It's definitely UTF-16. That said, how many of the Windows APIs actually
handle surrogate pairs (rather than leaving it to the programmer) is not a
question I'd like to comment on :-) I suspect we'll find out when fonts
containing symbols at code points requiring surrogate pairs start to become
commonplace.
* Under a default gcc compile on Linux, L defines a four byte character,
not a two byte one. It isn't UTF-32.
Not UTF-32? If you have 4-byte characters then I'd have thought UTF-32
would be the reason. Many people think Microsoft's choice of UTF-16 should
have been UTF-32 (because every character is a single unit - no
surrogate pairs). [Don't know what will happen if we discover that humans
have invented more than 2^32 different symbols and we need a font with all
of them :-) ]
* XMLCh is not always defined as wchar_t - as you discovered. Eg. on Mac
OS X it's uint16_t by default. I need to allow for that.
I am starting to appreciate that.
* Yes, const wchar_t szSysName[] = L"System font"; is legal - AFAIK,
difficulties arise when you need a static cast over the declaration.
Regarding your suggestion of a class derivation for the STL template
instance std::basic_string<XMLCh> , at some point it may well be
worthwhile for us to define an internal string class for dealing with all
these issues, but currently, I am quite happy to continue to use
std::basic_string<XMLCh> or a typedef. My main issue is with the way of
declaring literals.
The preprocessor directive I am using at the moment is
#define UCS2(x) (const XMLCh*)(x)
So that I can declare literals as follows:
const XMLCh* myAnyString = UCS2(L"ANY"); // not perfect, but better than:
const XMLCh myAnyString[] = { chLatin_A, chLatin_N, chLatin_Y, chNull }; // "ANY"
I am happy enough with a static cast over L - especially as it seems the
two XMLCh options will work - it will either be redundant or it will be a
reliable cast.
That looks good. In fact you could include the 'L' in the definition of
the macro, and it would be very similar to Microsoft's _T("xyz") which
evaluates to L"xyz" or "xyz" depending on whether _UNICODE is defined for
the project. (I am not a big fan of everything Microsoft does, but this
one was immensely helpful in converting a very big old project to Unicode.)
I appreciate this discussion. I'm starting to feel more confident about
mixing XMLCh and wchar_t with Microsoft Visual Studio. I'm not proposing to
abandon the Microsoft compiler as I have too many shares in the MFC library
for that. But I'm starting to get a better overview of the portability
issues too.
Dave
David Webber
Mozart Music Software
http://www.mozart.co.uk
For discussion and support see
http://www.mozart.co.uk/mozartists/mailinglist.htm