From: "Ben Griffin" <[email protected]>

* About the only common platform we don't compile under is Microsoft - our code base is for unix flavours. :D
* AFAIK UTF-* formats are not fixed-length encodings and, AFAIK, wchar_t is always fixed length. - (Are you sure that Microsoft's wchar_t IS UTF-16 (LE) and not UCS-2 (LE)? I do not know - just curious.)

It's definitely UTF-16. That said, how many of the Windows APIs actually handle surrogate pairs (rather than leaving it to the programmer) is not a question I'd like to comment on :-) I suspect we'll find out when fonts containing symbols at code points requiring surrogate pairs start to become commonplace.

* Under a default gcc compile on Linux, L defines a four-byte character, not a two-byte one. It isn't UTF-32.

Not UTF-32? If you have 4-byte characters then I'd have thought UTF-32 would be the reason. Many people think Microsoft's choice of UTF-16 should have been UTF-32 (because every character is a single unit - no surrogate pairs). [Don't know what will happen if we discover that humans have invented more than 2^32 different symbols and we need a font with all of them :-) ]

* XMLCh is not always defined as wchar_t - as you discovered. E.g. on Mac OS X it's uint16_t by default. I need to allow for that.

I am starting to appreciate that.

* Yes, const wchar_t szSysName[] = L"System font"; is legal - AFAIK, difficulties arise when you need a static cast over the declaration.

Regarding your suggestion of a class derivation for the STL template instance std::basic_string<XMLCh>, at some point it may well be worthwhile for us to define an internal string class for dealing with all these issues, but currently I am quite happy to continue to use std::basic_string<XMLCh> or a typedef. My main issue is with the way of declaring literals.

The preprocessor directive I am using at the moment is

#define UCS2(x) (const XMLCh*)(x)

So that I can declare literals as follows:

const XMLCh* myAnyString = UCS2(L"ANY");
// not perfect, but better than:
// const XMLCh myAnyString[] = { chLatin_A, chLatin_N, chLatin_Y, chNull }; // "ANY"

I am happy enough with a static cast over L - especially as it seems either of the two XMLCh options will work: the cast will either be redundant or it will be a reliable one.

That looks good. In fact you could include the 'L' in the definition of the macro, and it would be very similar to Microsoft's _T("xyz") which evaluates to L"xyz" or "xyz" according to a definition or otherwise in the project. (I am not a big fan of everything Microsoft does, but this one was immensely helpful in converting a very big old project to Unicode.)

I appreciate this discussion. I'm starting to feel more confident about mixing XMLCh and wchar_t with Microsoft Visual Studio. I'm not proposing to abandon the Microsoft compiler as I have too many shares in the MFC library for that. But I'm starting to get a better overview of the portability issues too.

Dave
David Webber
Mozart Music Software
http://www.mozart.co.uk
For discussion and support see
http://www.mozart.co.uk/mozartists/mailinglist.htm

