Hi,
this has probably been asked already, but I did not find a reference.
According to the Xerces documentation, the internal representation of strings
is UTF-16. I tried Xerces on OS X, where XMLCh is defined as a uint16_t, and
attempted to convert from the current locale ("de-DE") to an XMLCh string
representation by:
    char const* inputSource = "\U0001F600";
    XMLCh* outputCharacter = XMLString::transcode(inputSource);
This obviously does not work: the output is a single character with the value 62976.
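For what it's worth, the narrowing alone already explains that value. Here is a
minimal standalone reproduction (my own sketch, assuming a UTF-8 "de_DE" locale
is available; this is not Xerces code):

    #include <clocale>
    #include <cstdint>
    #include <cstdio>
    #include <cwchar>

    int main()
    {
        std::setlocale(LC_ALL, "de_DE.UTF-8");   // assumed locale name on OS X
        char const* src = "\U0001F600";          // four UTF-8 bytes: F0 9F 98 80
        wchar_t wide[4];
        std::mbstate_t st = std::mbstate_t();
        size_t len = std::mbsrtowcs(wide, &src, 4, &st);
        // On OS X wchar_t is 32 bits, so this prints len=1, wide[0]=0x1f600:
        std::printf("len=%zu wide[0]=%#lx\n", len, (unsigned long)wide[0]);
        // Narrowing to a 16-bit XMLCh keeps only the low half of the code point:
        std::uint16_t narrowed = (std::uint16_t)wide[0];
        std::printf("narrowed=%u\n", narrowed);  // prints 62976 == 0xF600
        return 0;
    }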
Looking at the XMLString::transcode method (based on
IconvLCPTranscoder::transcode), it is obvious why it does not work on OS X:
- the size of wchar_t is 32 bits;
- XMLString::transcode copies the output of mbsrtowcs (32-bit wide
characters) into a 16-bit XMLCh buffer:
    while (true)
    {
        size_t len = ::mbsrtowcs(tmpString + dstCursor, &src,
                                 resultSize - dstCursor, &st); // len is based on 32 bit
        if (len == TRANSCODING_ERROR)
        {
            dstCursor = 0;
            break;
        }
        dstCursor += len;
        if (src == 0) // conversion finished
            break;
        if (dstCursor >= resultSize - 1)
            reallocString<wchar_t>(tmpString, resultSize, manager,
                                   tmpString != localBuffer);
    }
    // make a final copy, converting from wchar_t to XMLCh:
    XMLCh* resultString = (XMLCh*)manager->allocate(
        (dstCursor + 1) * sizeof(XMLCh)); // result string is based on 16 bit
    size_t i;
    for (i = 0; i < dstCursor; ++i)
        resultString[i] = tmpString[i];  // 32 bit number is stored in 16 bit number
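As far as I can see, a lossless copy would instead have to emit a surrogate
pair for every code point above the BMP. A minimal sketch of that math (a
hypothetical helper of mine, not anything in Xerces):

    // Append one 32-bit wchar_t to a 16-bit XMLCh buffer without losing bits.
    static void appendAsUTF16(wchar_t wc, XMLCh*& out)
    {
        unsigned long cp = (unsigned long)wc;
        if (cp <= 0xFFFF) {
            *out++ = (XMLCh)cp;                      // BMP: one code unit
        } else {
            cp -= 0x10000;
            *out++ = (XMLCh)(0xD800 + (cp >> 10));   // high surrogate
            *out++ = (XMLCh)(0xDC00 + (cp & 0x3FF)); // low surrogate
        }
    }

For U+1F600 this yields the pair 0xD83D 0xDE00. It also means the result buffer
may need up to two XMLCh units per wchar_t, which the allocation above does not
account for.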
Therefore, I have two questions:
1) Is "configure" wrong to define XMLCh as a uint16_t on this platform?
2) How can the result ever be converted correctly to a UTF-16 encoding?
Best regards,
Hartwig