On 3/25/2010 12:03 AM, Swatilekha Doloi wrote:
Hi,

Sorry for the delay in responding. My usage of the word 'non-printable' is 
probably incorrect. It displays something that looks like this: 
æc¾Òw×s%#s1#ÔwÔi�...@õoqs#õ”3

Using XMLString::transcode before giving the buffer to the UTF-8 Transcoder 
helped. This is my code now:
OK, this is a very bad idea if your data is not in the machine's local code page. You need to tell us which encoding the data in szCSTABuffer actually uses.

/*******************************************************************/
if(szCSTABuffer)
{

        /** Transcode the CSTA Buffer into XMLCh* */
        xmlInput = XMLString::transcode(szCSTABuffer);

        uiInLength      =       XMLString::stringLen(xmlInput);
        uiOutLength     =       uiInLength * UTF8_BYTES_PER_CHARACTER;
        //UTF8_BYTES_PER_CHARACTER is set to 4

        /** Allocate memory for the output of the transcode operation*/
        xmlTranscodedOutput = new XMLByte[uiOutLength + 1];


        if(xmlTranscodedOutput)
        {
                /** Transcode */
                // m_pUTF8Transcoder is of type  XMLTranscoder*
                uiTotalChars = m_pUTF8Transcoder->transcodeTo(
                                                   (const XMLCh* const)xmlInput,
This cast is not necessary.

                                                    uiInLength,
                                                    xmlTranscodedOutput,
                                                    uiOutLength,
                                                    uiCharsTranscoded,
                                                    XMLTranscoder::UnRep_RepChar);

                xmlTranscodedOutput[uiTotalChars] = '\0';
                XMLString::release(&xmlInput);
        }
}
                
Variables are defined as follows:
char*                   szCSTABuffer            =       NULL;
XMLCh*          xmlInput                        =       NULL;
XMLByte*                xmlTranscodedOutput     =       NULL;
unsigned int    uiInLength                      =       0;
unsigned int    uiOutLength                     =       0;
unsigned int    uiCharsTranscoded               =       0;
unsigned int    uiTotalChars            =       0;
/*******************************************************************/

I was wondering, is there a way to optimise this?
Without more information, it's hard to say what you should be doing. However, my guess is you're trying to transcode from a single-byte or variable-width encoding to UTF-8.

The proper way to do this in Xerces-C is by pivoting through UTF-16. You're almost there, but you're transcoding to UTF-16 through XMLString::transcode(), which is only correct if your szCSTABuffer is encoded in the local code page. You may need to create an explicit transcoder for the encoding in szCSTABuffer, use that transcoder to get to UTF-16, then use a UTF-8 transcoder to get from UTF-16 to UTF-8.
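
To make the pivot idea concrete, here is a minimal standard-library sketch. It is not Xerces-C code and does not use the Xerces-C transcoder classes. It assumes, purely for illustration, that the input buffer is ISO-8859-1; the real encoding of szCSTABuffer is still unknown. The function names latin1ToUtf16, utf16ToUtf8 and latin1ToUtf8ViaUtf16 are hypothetical names made up for this example.

```cpp
#include <cstddef>
#include <string>

// Step 1 (assumption: the source is ISO-8859-1): decode bytes to UTF-16.
// Every Latin-1 byte maps to the Unicode code point with the same value.
std::u16string latin1ToUtf16(const std::string& in)
{
    std::u16string out;
    out.reserve(in.size());
    for (unsigned char c : in)
        out.push_back(static_cast<char16_t>(c));
    return out;
}

// Step 2: encode UTF-16 code units as UTF-8 bytes.
std::string utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF)
        {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        else if (cp >= 0xD800 && cp <= 0xDFFF)
        {
            // Unpaired surrogate: substitute U+FFFD.
            cp = 0xFFFD;
        }

        if (cp < 0x80)
        {
            out.push_back(static_cast<char>(cp));
        }
        else if (cp < 0x800)
        {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else if (cp < 0x10000)
        {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
        else
        {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// The pivot: source encoding -> UTF-16 -> UTF-8.
std::string latin1ToUtf8ViaUtf16(const std::string& in)
{
    return utf16ToUtf8(latin1ToUtf16(in));
}
```

Only step 1 depends on the source encoding. In Xerces-C that is the step where a transcoder created explicitly for the buffer's real encoding would replace XMLString::transcode(), while the UTF-16 to UTF-8 step stays the same.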

Dave
