RE: transcodeTo results in non-printable chars

Swatilekha Doloi Fri, 26 Mar 2010 03:05:30 -0700


>-----Original Message-----
>From: David Bertoni [mailto:[email protected]] 
>Sent: Friday, March 26, 2010 2:03 AM
>To: [email protected]
>Subject: Re: transcodeTo results in non-printable chars
>
>On 3/25/2010 12:03 AM, Swatilekha Doloi wrote:
>> Hi,
>>
>> Sorry for the delay in responding. My usage of the word 'non-printable' is 
>> probably incorrect. It displays something that looks like this:  æc¾Òw×s %# 
>> S1# ÔwÔi�...@õ OQ  S # õ”3
>>
>> Using XMLString::transcode before giving the buffer to the UTF-8 Transcoder 
>> helped. This is my code now:
>OK, this is a very bad idea if your data is not in the machine's local 
>code page. You need to provide more information about what the encoding 
>of the data in szCSTABuffer is.


I actually don't know - it comes from a different program running on a 
different computer. And yes, you're right, there's no guarantee that it would 
match my machine's locale settings.
I would have to create a UTF16 transcoder and use 'transcodeTo' to convert the 
buffer to UTF-16.
But I'm a bit confused by the various options:
fgUTF16BEncodingString                           'UTF-16 (BE)'
fgUTF16BEncodingString2                          'UTF-16BE'
fgUTF16EncodingString                            'UTF-16'
fgUTF16EncodingString2                           'UCS2'
fgUTF16EncodingString3                           'IBM1200'
fgUTF16EncodingString4                           'IBM-1200'
fgUTF16EncodingString5                           'UTF16'
fgUTF16EncodingString6                           'UCS-2'
fgUTF16EncodingString7                           'ISO-10646-UCS-2'
fgUTF16LEncodingString                           'UTF-16 (LE)'
fgUTF16LEncodingString2                          'UTF-16LE'

My target system is BE.
Should I use the ones for BE (fgUTF16BEncodingString/ fgUTF16BEncodingString2)?
Or would these be fine (fgUTF16EncodingString/ fgUTF16EncodingString5)? 
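
For reference, the BE and LE names differ only in the order in which the two bytes of each 16-bit code unit appear in the serialized byte stream. The sketch below illustrates that difference using only the C++ standard library. It is not Xerces-C code: `ByteOrder` and `serializeUtf16` are hypothetical names made up for this illustration, and it does not say which of the constants above is the right one for a given setup.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper type, not part of Xerces-C.
enum class ByteOrder { Big, Little };

// Write each UTF-16 code unit as two bytes in the requested byte order.
std::vector<std::uint8_t> serializeUtf16(const std::u16string& units,
                                         ByteOrder order)
{
    std::vector<std::uint8_t> out;
    out.reserve(units.size() * 2);
    for (char16_t u : units) {
        const std::uint8_t hi = static_cast<std::uint8_t>(u >> 8);
        const std::uint8_t lo = static_cast<std::uint8_t>(u & 0xFF);
        if (order == ByteOrder::Big) {
            out.push_back(hi);  // most significant byte first
            out.push_back(lo);
        } else {
            out.push_back(lo);  // least significant byte first
            out.push_back(hi);
        }
    }
    return out;
}
```

With this helper, the single code unit U+0041 ('A') serializes as the bytes 00 41 in big-endian order and 41 00 in little-endian order.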

>
>> /*******************************************************************/
>> if(szCSTABuffer)
>> {
>>
>>      /** Transcode the CSTA Buffer into XMLCh* */
>>      xmlInput = XMLString::transcode(szCSTABuffer);
>>
>>      uiInLength      =       XMLString::stringLen(xmlInput);
>>      uiOutLength     =       uiInLength * UTF8_BYTES_PER_CHARACTER;
>>      //UTF8_BYTES_PER_CHARACTER is set to 4
>>
>>      /** Allocate memory for the output of the transcode operation*/
>>      xmlTranscodedOutput = new XMLByte[uiOutLength + 1];
>>
>>
>>      if(xmlTranscodedOutput)
>>      {
>>              /** Transcode */
>>              // m_pUTF8Transcoder is of type  XMLTranscoder*
>>              uiTotalChars = m_pUTF8Transcoder->transcodeTo(
>>                                                 (const XMLCh* const)xmlInput,
>This cast is not necessary.
>
Thanks, I will remove it.
>>                                                  uiInLength,
>>                                                  xmlTranscodedOutput,
>>                                                  uiOutLength,
>>                                                  uiCharsTranscoded,
>>                                                  XMLTranscoder::UnRep_RepChar);
>>
>>              xmlTranscodedOutput[uiTotalChars] = '\0';
>>              XMLString::release(&xmlInput);
>>      }
>> }
>>              
>> Variables are defined as follows:
>> char*                        szCSTABuffer            =       NULL;
>> XMLCh*               xmlInput                        =       NULL;
>> XMLByte*             xmlTranscodedOutput     =       NULL;
>> unsigned int uiInLength                      =       0;
>> unsigned int uiOutLength                     =       0;
>> unsigned int uiCharsTranscoded               =       0;
>> unsigned int uiTotalChars            =       0;
>> /*******************************************************************/
>>
>> I was wondering, is there a way to optimise this?
>Without more information, it's hard to say what you should be doing. 
>However, my guess is you're trying to transcode from a single byte, or 
>variable byte encoding to UTF-8.
>
>The proper way to do this in Xerces-C is by pivoting through UTF-16. 
>You're almost there, but you're transcoding to UTF-16 through 
>XMLString::transcode(), which is only correct if your szCSTABuffer is 
>encoded in the local code page. You may need to create an explicit 
>transcoder for the encoding in szCSTABuffer, use that transcoder to get 
>to UTF-16, then use a UTF8 transcoder to get from UTF-16 to UTF-8.
>
>Dave
I am not sure whether top-posting is ok or not. Apologies if it's unreadable.
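
The pivot Dave describes (source encoding to UTF-16, then UTF-16 to UTF-8) can be sketched with the standard library alone. This is only an illustration of the two-step idea, not the Xerces-C API. It assumes the input is ISO-8859-1, which nothing in this thread establishes. The function names `latin1ToUtf16`, `utf16ToUtf8` and `pivotToUtf8` are hypothetical and stand in for the explicit source-encoding transcoder and the UTF-8 transcoder.

```cpp
#include <cstddef>
#include <string>

// Step 1 (assumption: input is ISO-8859-1). Each Latin-1 byte maps to the
// Unicode code point with the same numeric value.
std::u16string latin1ToUtf16(const std::string& bytes)
{
    std::u16string utf16;
    utf16.reserve(bytes.size());
    for (unsigned char b : bytes)
        utf16.push_back(static_cast<char16_t>(b));
    return utf16;
}

// Step 2: encode UTF-16 code units as UTF-8 bytes.
std::string utf16ToUtf8(const std::u16string& utf16)
{
    std::string utf8;
    for (std::size_t i = 0; i < utf16.size(); ++i) {
        char32_t cp = utf16[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        const bool isLow  = cp >= 0xDC00 && cp <= 0xDFFF;
        if (isHigh && i + 1 < utf16.size()
            && utf16[i + 1] >= 0xDC00 && utf16[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (utf16[i + 1] - 0xDC00);
            ++i;
        } else if (isHigh || isLow) {
            cp = 0xFFFD;  // unpaired surrogate: use the replacement character
        }
        if (cp < 0x80) {
            utf8.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            utf8.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            utf8.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            utf8.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            utf8.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            utf8.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            utf8.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            utf8.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            utf8.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            utf8.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return utf8;
}

// The pivot: source encoding -> UTF-16 -> UTF-8.
std::string pivotToUtf8(const std::string& latin1Bytes)
{
    return utf16ToUtf8(latin1ToUtf16(latin1Bytes));
}
```

For example, `pivotToUtf8("\xE6")` turns the Latin-1 byte E6 ('æ') into the two UTF-8 bytes C3 A6. If the first step decodes with the wrong source encoding, the second step will faithfully encode the wrong characters, which is why the encoding of szCSTABuffer has to be known first.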
