-----Original Message-----
From: Swatilekha Doloi
Sent: Friday, March 26, 2010 5:12 PM
To: '[email protected]'
Subject: RE: transcodeTo results in non-printable chars
-----Original Message-----
From: Swatilekha Doloi [mailto:[email protected]]
Sent: Friday, March 26, 2010 3:35 PM
To: [email protected]
Subject: RE: transcodeTo results in non-printable chars

>-----Original Message-----
>From: David Bertoni [mailto:[email protected]]
>Sent: Friday, March 26, 2010 2:03 AM
>To: [email protected]
>Subject: Re: transcodeTo results in non-printable chars
>
>On 3/25/2010 12:03 AM, Swatilekha Doloi wrote:
>> Hi,
>>
>> Sorry for the delay in responding. My usage of the word 'non-printable' is
>> probably incorrect. It displays something that looks like this: æc¾Òw×s %#
>> S1# ÔwÔi�...@õ OQ S # õ”3
>>
>> Using XMLString::transcode before giving the buffer to the UTF-8 Transcoder
>> helped. This is my code now:
>OK, this is a very bad idea if your data is not in the machine's local
>code page. You need to provide more information about what the encoding
>of the data in szCSTABuffer is.

I actually don't know; it comes from a different program running on a different computer. And yes, you're right, there's no guarantee that it would match my machine's locale settings.

I would have to create a UTF16 transcoder and use 'transcodeTo' to convert the buffer to UTF-16. But I'm a bit confused by the various options:

    fgUTF16BEncodingString    'UTF-16 (BE)'
    fgUTF16BEncodingString2   'UTF-16BE'
    fgUTF16EncodingString     'UTF-16'
    fgUTF16EncodingString2    'UCS2'
    fgUTF16EncodingString3    'IBM1200'
    fgUTF16EncodingString4    'IBM-1200'
    fgUTF16EncodingString5    'UTF16'
    fgUTF16EncodingString6    'UCS-2'
    fgUTF16EncodingString7    'ISO-10646-UCS-2'
    fgUTF16LEncodingString    'UTF-16 (LE)'
    fgUTF16LEncodingString2   'UTF-16LE'

My target system is BE. Should I use the ones for BE (fgUTF16BEncodingString / fgUTF16BEncodingString2)? Or would these be fine (fgUTF16EncodingString / fgUTF16EncodingString5)?

Addendum: Also, I would like to know how to do this cascading transcode:

    some-encoding ---> UTF-16BE ---> UTF-8

After I transcode the buffer to UTF-16, the output is of type XMLByte.
The Transcoder for UTF-8 expects XMLCh* and not XMLByte* as the input.

One last addition: transcodeTo for UTF-16 crashes sometimes, and I don't know why. The call stack shows a memcpy crashing somewhere inside xercesc_2_8::XMLUTF16Transcoder::transcodeTo(). This does not happen every time, though.

    /** Transcode to UTF-8 */
    uiInLength = strlen(szCSTABuffer);
    uiOutLength = uiInLength * UTF16_BYTES_PER_CHARACTER;

    /** Allocate memory for the output of the transcode operation*/
    xmlInput = new XMLByte[uiOutLength + 1];

    if(xmlInput)
    {
        /** Transcode */
        uiTotalChars = m_pUTF16Transcoder->transcodeTo((const XMLCh* const)szCSTABuffer,
                                                       uiInLength,
                                                       xmlInput,
                                                       uiOutLength,
                                                       uiCharsTranscoded,
                                                       XMLTranscoder::UnRep_RepChar);
        xmlInput[uiTotalChars] = '\0';
    }

What am I doing wrong? Is it the typecast from char* to XMLCh* when calling transcodeTo? Is there any other way to convert char* to XMLCh*? Please help!

>
>> /*******************************************************************/
>> if(szCSTABuffer)
>> {
>>
>>     /** Transcode the CSTA Buffer into XMLCh* */
>>     xmlInput = XMLString::transcode(szCSTABuffer);
>>
>>     uiInLength = XMLString::stringLen(xmlInput);
>>     uiOutLength = uiInLength * UTF8_BYTES_PER_CHARACTER;
>>     //UTF8_BYTES_PER_CHARACTER is set to 4
>>
>>     /** Allocate memory for the output of the transcode operation*/
>>     xmlTranscodedOutput = new XMLByte[uiOutLength + 1];
>>
>>
>>     if(xmlTranscodedOutput)
>>     {
>>         /** Transcode */
>>         // m_pUTF8Transcoder is of type XMLTranscoder*
>>         uiTotalChars = m_pUTF8Transcoder->transcodeTo(
>>             (const XMLCh* const)xmlInput,
>This cast is not necessary.
>

Thanks, I will remove it.

>>             uiInLength,
>>             xmlTranscodedOutput,
>>             uiOutLength,
>>             uiCharsTranscoded,
>>
>>             XMLTranscoder::UnRep_RepChar);
>>
>>         xmlTranscodedOutput[uiTotalChars] = '\0';
>>         XMLString::release(&xmlInput);
>>     }
>> }
>>
>> Variables are defined as follows:
>> char* szCSTABuffer = NULL;
>> XMLCh* xmlInput = NULL;
>> XMLByte* xmlTranscodedOutput = NULL;
>> unsigned int uiInLength = 0;
>> unsigned int uiOutLength = 0;
>> unsigned int uiCharsTranscoded = 0;
>> unsigned int uiTotalChars = 0;
>> /*******************************************************************/
>>
>> I was wondering, is there a way to optimise this?
>Without more information, it's hard to say what you should be doing.
>However, my guess is you're trying to transcode from a single byte, or
>variable byte encoding to UTF-8.
>
>The proper way to do this in Xerces-C is by pivoting through UTF-16.
>You're almost there, but you're transcoding to UTF-16 through
>XMLString::transcode(), which is only correct if your szCSTABuffer is
>encoded in the local code page. You may need to create an explicit
>transcoder for the encoding in szCSTABuffer, use that transcoder to get
>to UTF-16, then use a UTF8 transcoder to get from UTF-16 to UTF-8.
>
>Dave

I am not sure whether top-posting is OK or not. Apologies if it's unreadable.
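A rough illustration of the pivot Dave describes (source encoding -> UTF-16 -> UTF-8), written with the C++ standard library only. It is not the Xerces-C API. It assumes, purely for illustration, that szCSTABuffer holds ISO-8859-1 bytes, and the function names are hypothetical. In Xerces-C the first stage corresponds to `transcodeFrom()` on a transcoder created for the real source encoding (bytes in, `XMLCh` out), and the second to `transcodeTo()` on the UTF-8 transcoder (`XMLCh` in, bytes out). Since `transcodeTo()` always reads `XMLCh` input, casting a `char*` to `XMLCh*` and passing `strlen()` as the count makes it read two bytes per counted character, about twice the buffer's length; that over-read is a likely cause of the intermittent memcpy crash. The intermediate stage also never needs to be "UTF-16BE" bytes: an `XMLCh` array already holds UTF-16 code units in the machine's native order.

```cpp
#include <cstddef>
#include <string>

// Stage 1 (the job transcodeFrom() does in Xerces-C): decode source bytes to
// UTF-16 code units. ASSUMPTION: the source is ISO-8859-1, where each byte
// value is exactly its Unicode code point. Hypothetical helper name.
std::u16string latin1ToUtf16(const std::string& src) {
    std::u16string out;
    out.reserve(src.size());
    for (unsigned char byte : src) {
        // Widen each byte to one code unit; never reinterpret the char buffer.
        out.push_back(static_cast<char16_t>(byte));
    }
    return out;
}

// Stage 2 (the job transcodeTo() does on a UTF-8 transcoder): encode UTF-16
// code units as UTF-8. Unpaired surrogates become U+FFFD. Hypothetical name.
std::string utf16ToUtf8(const std::u16string& src) {
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        char32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src.size() &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            // High + low surrogate: combine into one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate: substitute the replacement char
        }
        if (cp < 0x80) {
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}

// The full pivot: source bytes -> UTF-16 -> UTF-8.
std::string latin1ToUtf8(const std::string& src) {
    return utf16ToUtf8(latin1ToUtf16(src));
}
```

For example, `latin1ToUtf8("caf\xE9")` yields the bytes `c a f 0xC3 0xA9`. With a different source encoding only stage 1 changes; stage 2 stays the same, which is the point of pivoting.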

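On the buffer sizing in the first code (`uiInLength * UTF8_BYTES_PER_CHARACTER` with a factor of 4): `uiInLength` counts UTF-16 code units, and a 4-byte UTF-8 sequence always consumes two of them (a surrogate pair), so 3 bytes per code unit is already an upper bound. A standard-library sketch of that bound and of an exact count follows; the helper names are hypothetical, not part of Xerces-C, and it assumes unpaired surrogates are replaced by U+FFFD (3 bytes).

```cpp
#include <cstddef>
#include <string>

// Exact number of UTF-8 bytes needed for a UTF-16 sequence, without encoding.
// ASSUMPTION: unpaired surrogates are emitted as U+FFFD. Hypothetical helper.
std::size_t utf8LengthOf(const std::u16string& src) {
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < src.size(); ++i) {
        char16_t u = src[i];
        if (u < 0x80) {
            bytes += 1;  // ASCII
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < src.size() &&
                   src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            bytes += 4;  // surrogate pair: two code units -> one 4-byte sequence
            ++i;
        } else {
            bytes += 3;  // rest of the BMP, or a lone surrogate as U+FFFD
        }
    }
    return bytes;
}

// Upper bound: no single UTF-16 code unit needs more than 3 UTF-8 bytes.
// Hypothetical helper; add 1 for a terminating '\0' when allocating.
std::size_t utf8WorstCase(std::size_t utf16Units) {
    return utf16Units * 3;
}
```

Either value can size the output buffer; the bound avoids a second pass, while the exact count avoids over-allocation for mostly-ASCII data.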