-----Original Message-----
From: Swatilekha Doloi 
Sent: Friday, March 26, 2010 5:12 PM
To: '[email protected]'
Subject: RE: transcodeTo results in non-printable chars


-----Original Message-----
From: Swatilekha Doloi [mailto:[email protected]] 
Sent: Friday, March 26, 2010 3:35 PM
To: [email protected]
Subject: RE: transcodeTo results in non-printable chars


>-----Original Message-----
>From: David Bertoni [mailto:[email protected]] 
>Sent: Friday, March 26, 2010 2:03 AM
>To: [email protected]
>Subject: Re: transcodeTo results in non-printable chars
>
>On 3/25/2010 12:03 AM, Swatilekha Doloi wrote:
>> Hi,
>>
>> Sorry for the delay in responding. My usage of the word 'non-printable' is 
>> probably incorrect. It displays something that looks like this:  æc¾Òw×s %# 
>> S1# ÔwÔi�...@õ OQ  S # õ”3
>>
>> Using XMLString::transcode before giving the buffer to the UTF-8 Transcoder 
>> helped. This is my code now:
>OK, this is a very bad idea if your data is not in the machine's local 
>code page. You need to provide more information about what the encoding 
>of the data in szCSTABuffer is.

I actually don't know; it comes from a different program running on a
different computer. And yes, you're right, there's no guarantee that it
matches my machine's locale settings.
I would have to create a UTF-16 transcoder and use transcodeTo() to convert
the buffer to UTF-16.
But I'm a bit confused by the various options:
fgUTF16BEncodingString                           'UTF-16 (BE)'
fgUTF16BEncodingString2                          'UTF-16BE'
fgUTF16EncodingString                            'UTF-16'
fgUTF16EncodingString2                           'UCS2'
fgUTF16EncodingString3                           'IBM1200'
fgUTF16EncodingString4                           'IBM-1200'
fgUTF16EncodingString5                           'UTF16'
fgUTF16EncodingString6                           'UCS-2'
fgUTF16EncodingString7                           'ISO-10646-UCS-2'
fgUTF16LEncodingString                           'UTF-16 (LE)'
fgUTF16LEncodingString2                          'UTF-16LE'

My target system is big-endian (BE).
Should I use the BE names (fgUTF16BEncodingString / fgUTF16BEncodingString2),
or would the unsuffixed ones (fgUTF16EncodingString / fgUTF16EncodingString5) be fine?
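
On the byte-order question, it may help to see what the two orders actually look like. The BE/LE names pin the byte order down explicitly, while the unsuffixed names such as 'UTF-16' or 'UTF16' leave it to the transcoder's default. I would not rely on that default without checking the Xerces sources for your version, so if the receiver expects big-endian, naming a BE variant is the safer choice. The sketch below is plain C++11, not Xerces API: char16_t stands in for XMLCh, and toUtf16BE/toUtf16LE are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Serialize UTF-16 code units high byte first, i.e. the layout a
// "UTF-16BE" byte stream has.
std::vector<std::uint8_t> toUtf16BE(const std::u16string& units)
{
    std::vector<std::uint8_t> out;
    out.reserve(units.size() * 2);
    for (char16_t u : units) {
        out.push_back(static_cast<std::uint8_t>(u >> 8));    // high byte
        out.push_back(static_cast<std::uint8_t>(u & 0xFF));  // low byte
    }
    return out;
}

// Same code units low byte first, i.e. the layout of "UTF-16LE".
std::vector<std::uint8_t> toUtf16LE(const std::u16string& units)
{
    std::vector<std::uint8_t> out;
    out.reserve(units.size() * 2);
    for (char16_t u : units) {
        out.push_back(static_cast<std::uint8_t>(u & 0xFF));  // low byte
        out.push_back(static_cast<std::uint8_t>(u >> 8));    // high byte
    }
    return out;
}
```

For example, 'A' (U+0041) becomes the bytes 00 41 in big-endian order and 41 00 in little-endian order. The characters are the same either way; only the serialized bytes differ.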

Addendum: how do I do this cascading transcode?
  some-encoding --> UTF-16BE --> UTF-8
After I transcode the buffer to UTF-16, the output is of type XMLByte*,
but the UTF-8 transcoder expects XMLCh*, not XMLByte*, as its input.
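
A likely way out of the XMLByte*/XMLCh* mismatch is that no intermediate UTF-16BE byte buffer is needed. XMLCh already holds UTF-16 code units in the host's byte order, so the cascade can be: source bytes --> XMLCh* (via transcodeFrom() on a transcoder created for the source encoding) --> UTF-8 bytes (via transcodeTo() on the UTF-8 transcoder). The sketch below shows the same two-step pivot in plain C++11 without Xerces. It assumes, hypothetically, that the source is ISO-8859-1, where every byte maps to the code point of the same value. The function names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Step 1 (into the pivot): decode ISO-8859-1 bytes into UTF-16 code units.
// Assumption for this sketch: the source really is ISO-8859-1 (byte == code point).
std::u16string latin1ToUtf16(const std::string& src)
{
    std::u16string out;
    out.reserve(src.size());
    for (unsigned char b : src)
        out.push_back(static_cast<char16_t>(b));
    return out;
}

// Step 2 (out of the pivot): encode UTF-16 code units as UTF-8 bytes.
// Lone surrogates become U+FFFD (a choice made for this sketch).
std::string utf16ToUtf8(const std::u16string& src)
{
    std::string out;
    out.reserve(src.size() * 3);  // at most 3 UTF-8 bytes per UTF-16 code unit
    for (std::size_t i = 0; i < src.size(); ++i) {
        char32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src.size()
            && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For example, the Latin-1 bytes 41 E9 ("Aé") pivot to the UTF-16 units 0041 00E9 and come out as the UTF-8 bytes 41 C3 A9. With Xerces, step 1 would be whichever transcoder matches the real source encoding, which is the piece of information that is still missing here.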

One last thing: transcodeTo() for UTF-16 sometimes crashes, and I don't know
why. The call stack shows a memcpy crashing somewhere inside
xercesc_2_8::XMLUTF16Transcoder::transcodeTo(). It does not happen every
time, though.

/** Transcode to UTF-16 */
uiInLength  = strlen(szCSTABuffer);
uiOutLength = uiInLength * UTF16_BYTES_PER_CHARACTER;

/** Allocate memory for the output of the transcode operation */
xmlInput = new XMLByte[uiOutLength + 1];

if(xmlInput)
{
    /** Transcode */
    uiTotalChars = m_pUTF16Transcoder->transcodeTo((const XMLCh* const)szCSTABuffer,
                                                   uiInLength,
                                                   xmlInput,
                                                   uiOutLength,
                                                   uiCharsTranscoded,
                                                   XMLTranscoder::UnRep_RepChar);

    xmlInput[uiTotalChars] = '\0';
}
What am I doing wrong? Is it the cast from char* to XMLCh* when calling
transcodeTo()? Is there another way to convert char* to XMLCh*? Please help!
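
On the cast: (const XMLCh*)szCSTABuffer does not convert anything. It tells the transcoder to read the char buffer as 2-byte code units, pairing adjacent bytes into one unit. Also, uiInLength comes from strlen(), which is a byte count, but transcodeTo() treats srcCount as a number of XMLCh. The call can therefore read up to 2 x uiInLength bytes from a buffer that, assuming it holds just the string plus its terminator, is only uiInLength + 1 bytes long. That over-read would plausibly explain both the garbage and the intermittent memcpy crash. Below is a small plain-C++ illustration. The helper names are hypothetical, char16_t stands in for XMLCh, and memcpy replaces the cast so the demonstration itself stays well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// What the (const XMLCh*) cast effectively does: treat each pair of bytes as
// one 16-bit unit (the resulting value depends on the host's byte order).
// Only complete pairs inside the buffer are read here.
std::u16string bytesAsUtf16Units(const char* buf, std::size_t byteLen)
{
    std::u16string out;
    for (std::size_t i = 0; i + 1 < byteLen; i += 2) {
        char16_t unit;
        std::memcpy(&unit, buf + i, sizeof unit);
        out.push_back(unit);
    }
    return out;
}

// Bytes read past the end if the transcoder is told there are `charCount`
// 16-bit units. Assumption: the buffer holds exactly strlen() bytes plus '\0'.
std::size_t bytesOverRead(std::size_t charCount)
{
    std::size_t requested = charCount * sizeof(char16_t);
    std::size_t available = charCount + 1;
    return requested > available ? requested - available : 0;
}
```

For a 4-byte string, for instance, the transcoder is asked for 8 bytes while only 5 exist. Whether that crashes depends on what happens to lie past the end of the buffer, which would fit the "not every time" symptom. The fix is not a different cast but a real decoding step from the source encoding into XMLCh.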
>
>> /*******************************************************************/
>> if(szCSTABuffer)
>> {
>>
>>      /** Transcode the CSTA Buffer into XMLCh* */
>>      xmlInput = XMLString::transcode(szCSTABuffer);
>>
>>      uiInLength      =       XMLString::stringLen(xmlInput);
>>      uiOutLength     =       uiInLength * UTF8_BYTES_PER_CHARACTER;
>>      //UTF8_BYTES_PER_CHARACTER is set to 4
>>
>>      /** Allocate memory for the output of the transcode operation*/
>>      xmlTranscodedOutput = new XMLByte[uiOutLength + 1];
>>
>>
>>      if(xmlTranscodedOutput)
>>      {
>>              /** Transcode */
>>              // m_pUTF8Transcoder is of type  XMLTranscoder*
>>              uiTotalChars = m_pUTF8Transcoder->transcodeTo(
>>                                                 (const XMLCh* const)xmlInput,
>This cast is not necessary.
>
Thanks, I will remove it.
>>                                                  uiInLength,
>>                                                  xmlTranscodedOutput,
>>                                                  uiOutLength,
>>                                                  uiCharsTranscoded,
>>                                                  XMLTranscoder::UnRep_RepChar);
>>
>>              xmlTranscodedOutput[uiTotalChars] = '\0';
>>              XMLString::release(&xmlInput);
>>      }
>> }
>>              
>> Variables are defined as follows:
>> char*                szCSTABuffer            =       NULL;
>> XMLCh*               xmlInput                        =       NULL;
>> XMLByte*             xmlTranscodedOutput     =       NULL;
>> unsigned int uiInLength                      =       0;
>> unsigned int uiOutLength                     =       0;
>> unsigned int uiCharsTranscoded               =       0;
>> unsigned int uiTotalChars            =       0;
>> /*******************************************************************/
>>
>> I was wondering, is there a way to optimise this?
>Without more information, it's hard to say what you should be doing. 
>However, my guess is you're trying to transcode from a single-byte or
>variable-byte encoding to UTF-8.
>
>The proper way to do this in Xerces-C is by pivoting through UTF-16. 
>You're almost there, but you're transcoding to UTF-16 through 
>XMLString::transcode(), which is only correct if your szCSTABuffer is 
>encoded in the local code page. You may need to create an explicit 
>transcoder for the encoding in szCSTABuffer, use that transcoder to get 
>to UTF-16, then use a UTF-8 transcoder to get from UTF-16 to UTF-8.
>
>Dave
I am not sure whether top-posting is ok or not. Apologies if it's unreadable. 
