>-----Original Message-----
>From: David Bertoni [mailto:[email protected]]
>Sent: Friday, March 26, 2010 2:03 AM
>To: [email protected]
>Subject: Re: transcodeTo results in non-printable chars
>
>On 3/25/2010 12:03 AM, Swatilekha Doloi wrote:
>> Hi,
>>
>> Sorry for the delay in responding. My usage of the word 'non-printable' is
>> probably incorrect. It displays something that looks like this: æc¾Òw×s %#
>> S1# ÔwÔi�...@õ OQ S # õ”3
>>
>> Using XMLString::transcode before giving the buffer to the UTF-8 Transcoder
>> helped. This is my code now:
>OK, this is a very bad idea if your data is not in the machine's local
>code page. You need to provide more information about what the encoding
>of the data in szCSTABuffer is.
I actually don't know - it comes from a different program running on a
different computer. And yes, you're right, there's no guarantee that it would
match my machine's locale settings.
I would have to create a UTF16 transcoder and use 'transcodeTo' to convert the
buffer to UTF-16.
But I'm a bit confused by the various encoding-name constants:
  fgUTF16BEncodingString    'UTF-16 (BE)'
  fgUTF16BEncodingString2   'UTF-16BE'
  fgUTF16EncodingString     'UTF-16'
  fgUTF16EncodingString2    'UCS2'
  fgUTF16EncodingString3    'IBM1200'
  fgUTF16EncodingString4    'IBM-1200'
  fgUTF16EncodingString5    'UTF16'
  fgUTF16EncodingString6    'UCS-2'
  fgUTF16EncodingString7    'ISO-10646-UCS-2'
  fgUTF16LEncodingString    'UTF-16 (LE)'
  fgUTF16LEncodingString2   'UTF-16LE'
My target system is big-endian (BE).
Should I use the ones for BE (fgUTF16BEncodingString / fgUTF16BEncodingString2)?
Or would these be fine (fgUTF16EncodingString / fgUTF16EncodingString5)?
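
For reference, the BE and LE names differ only in the order in which the two
bytes of each 16-bit code unit are written out. Below is a minimal sketch in
plain C++ (standard library only, not Xerces-C; the function name utf16ToBytes
is made up for illustration):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: serialize UTF-16 code units to bytes in the
// requested byte order. This is an illustration, not a Xerces-C API.
std::vector<std::uint8_t> utf16ToBytes(const std::u16string& text, bool bigEndian)
{
    std::vector<std::uint8_t> out;
    out.reserve(text.size() * 2);
    for (char16_t unit : text)
    {
        const std::uint8_t hi = static_cast<std::uint8_t>(unit >> 8);
        const std::uint8_t lo = static_cast<std::uint8_t>(unit & 0xFF);
        if (bigEndian)
        {
            // Big-endian: most significant byte first.
            out.push_back(hi);
            out.push_back(lo);
        }
        else
        {
            // Little-endian: least significant byte first.
            out.push_back(lo);
            out.push_back(hi);
        }
    }
    return out;
}
```

For the letter 'A' (U+0041) the big-endian form is the bytes 00 41 and the
little-endian form is 41 00. This suggests the choice follows the byte order of
the data being converted; how Xerces-C treats the plain 'UTF-16' name is not
covered by this sketch.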
>
>> /*******************************************************************/
>> if(szCSTABuffer)
>> {
>>     /** Transcode the CSTA Buffer into XMLCh* */
>>     xmlInput = XMLString::transcode(szCSTABuffer);
>>
>>     uiInLength = XMLString::stringLen(xmlInput);
>>     uiOutLength = uiInLength * UTF8_BYTES_PER_CHARACTER;
>>     //UTF8_BYTES_PER_CHARACTER is set to 4
>>
>>     /** Allocate memory for the output of the transcode operation */
>>     xmlTranscodedOutput = new XMLByte[uiOutLength + 1];
>>
>>     if(xmlTranscodedOutput)
>>     {
>>         /** Transcode */
>>         // m_pUTF8Transcoder is of type XMLTranscoder*
>>         uiTotalChars = m_pUTF8Transcoder->transcodeTo(
>>             (const XMLCh* const)xmlInput,
>This cast is not necessary.
>
Thanks, I will remove it.
>>             uiInLength,
>>             xmlTranscodedOutput,
>>             uiOutLength,
>>             uiCharsTranscoded,
>>             XMLTranscoder::UnRep_RepChar);
>>
>>         xmlTranscodedOutput[uiTotalChars] = '\0';
>>         XMLString::release(&xmlInput);
>>     }
>> }
>>
>> Variables are defined as follows:
>> char* szCSTABuffer = NULL;
>> XMLCh* xmlInput = NULL;
>> XMLByte* xmlTranscodedOutput = NULL;
>> unsigned int uiInLength = 0;
>> unsigned int uiOutLength = 0;
>> unsigned int uiCharsTranscoded = 0;
>> unsigned int uiTotalChars = 0;
>> /*******************************************************************/
>>
>> I was wondering, is there a way to optimise this?
>Without more information, it's hard to say what you should be doing.
>However, my guess is you're trying to transcode from a single byte, or
>variable byte encoding to UTF-8.
>
>The proper way to do this in Xerces-C is by pivoting through UTF-16.
>You're almost there, but you're transcoding to UTF-16 through
>XMLString::transcode(), which is only correct if your szCSTABuffer is
>encoded in the local code page. You may need to create an explicit
>transcoder for the encoding in szCSTABuffer, use that transcoder to get
>to UTF-16, then use a UTF-8 transcoder to get from UTF-16 to UTF-8.
>
>Dave
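
The pivot described above (source encoding to UTF-16, then UTF-16 to UTF-8) can
be sketched in plain C++ without Xerces-C. This is only an illustration under a
loud assumption: the source bytes are taken to be ISO-8859-1, which is NOT known
to be the encoding of szCSTABuffer. The helper names latin1ToUtf16 and
utf16ToUtf8 are hypothetical. In Xerces-C the first step would instead go
through a transcoder created for the real source encoding (presumably via
XMLTransService::makeNewTranscoderFor and XMLTranscoder::transcodeFrom), and the
second step through the UTF-8 transcoder already in use.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Step 1: decode the source bytes to UTF-16.
// ASSUMPTION: the input is ISO-8859-1, where each byte value equals its
// Unicode code point. A different source encoding needs a different decoder.
std::u16string latin1ToUtf16(const std::string& bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes)
        out.push_back(static_cast<char16_t>(b));
    return out;
}

// Step 2: encode UTF-16 code units as UTF-8.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        std::uint32_t cp = in[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        const bool isLow = cp >= 0xDC00 && cp <= 0xDFFF;
        if (isHigh && i + 1 < in.size() && in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            // Combine a surrogate pair into one code point above U+FFFF.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (isHigh || isLow)
        {
            // Unpaired surrogate: substitute U+FFFD (replacement character).
            cp = 0xFFFD;
        }

        if (cp < 0x80)
        {
            out.push_back(static_cast<char>(cp));
        }
        else if (cp < 0x800)
        {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else
        {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

For example, utf16ToUtf8(latin1ToUtf16("\xE6")) yields the two bytes C3 A6, the
UTF-8 form of 'æ'. The point of the two-step shape is that the decoder in step 1
is the only part that depends on the source encoding.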
I am not sure whether top-posting is ok or not. Apologies if it's unreadable.