Have you tried using UCharsetDetector to determine the code page of the data? It sounds like you are transcoding but don't know what the source code page is.

Regarding the crash in your transcoder, can you create a small standalone sample with data that crashes consistently? Then the kind ICU people can see if there's a fix to be made along the lines of "Don't crash when given invalid input data".

john

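Before running a full detector, a cheap first check is whether the buffer begins with a byte-order mark. The sketch below is not UCharsetDetector and does no statistical detection; it is a much simpler stand-in, and the name `sniffBom` is hypothetical. A missing BOM proves nothing about the encoding, and UTF-32 is deliberately not handled.

```cpp
#include <cstddef>
#include <string>

// Returns an encoding name if the buffer starts with a known byte-order
// mark, or an empty string if no BOM is present.
// Assumption: only UTF-8 and UTF-16 BOMs are checked; a UTF-32LE BOM
// (FF FE 00 00) would be misreported as UTF-16LE by this sketch.
std::string sniffBom(const unsigned char* data, std::size_t size) {
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF) {
        return "UTF-8";
    }
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF) {
        return "UTF-16BE";
    }
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE) {
        return "UTF-16LE";
    }
    return "";  // no BOM: the encoding must come from elsewhere
}
```

If this returns an empty string, the encoding still has to come from the sending program's documentation or from a real detector.
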
-----Original Message-----
From: Swatilekha Doloi [mailto:[email protected]]
Sent: Monday, April 05, 2010 5:59 AM
To: [email protected]
Subject: Using cascading transcoders - Crashes in transcodeTo for UTF-16

Hi,

Apologies for reposting this. I'm hoping that this time it is readable.

I'm trying to transcode some data into UTF-8. The data I receive is not in UTF-16 format, so I can't use it as the input to the transcodeTo() function. Please help me!

>-----Original Message-----
>From: David Bertoni [mailto:[email protected]]
>
>On 3/25/2010 12:03 AM, Swatilekha Doloi wrote:
>> Hi,
>>
>> Sorry for the delay in responding. My usage of the word
>> 'non-printable' is probably incorrect. It displays something that
>> looks like this: æc¾Òw×s %# S1# ÔwÔi�...@õ OQ S # õ”3
>>
>> Using XMLString::transcode before giving the buffer to the UTF-8
>> Transcoder helped. This is my code now:
>OK, this is a very bad idea if your data is not in the machine's local
>code page. You need to provide more information about what the encoding
>of the data in szCSTABuffer is.

I actually don't know - it comes from a different program running on a different computer. And yes, you're right, there's no guarantee that it would match my machine's locale settings.

I would have to create a UTF16 transcoder and use 'transcodeTo' to convert the buffer to UTF-16. But I'm a bit confused with the various options:

fgUTF16BEncodingString     'UTF-16 (BE)'
fgUTF16BEncodingString2    'UTF-16BE'
fgUTF16EncodingString      'UTF-16'
fgUTF16EncodingString2     'UCS2'
fgUTF16EncodingString3     'IBM1200'
fgUTF16EncodingString4     'IBM-1200'
fgUTF16EncodingString5     'UTF16'
fgUTF16EncodingString6     'UCS-2'
fgUTF16EncodingString7     'ISO-10646-UCS-2'
fgUTF16LEncodingString     'UTF-16 (LE)'
fgUTF16LEncodingString2    'UTF-16LE'

My target system is BE. Should I use the ones for BE (fgUTF16BEncodingString/ fgUTF16BEncodingString2)? Or would these be fine (fgUTF16EncodingString/ fgUTF16EncodingString5)?

Addendum: Also, I would like to know how to do this cascading transcode?

some-encoding-->UTF-16BE--->UTF-8

After I transcode the buffer to UTF-16, the output is of type XMLByte. The Transcoder for UTF-8 expects XMLCh* and not XMLByte* as the input.

One last addition: transcodeTo for UTF-16 crashes sometimes. I don't know why this is happening. The call stack shows somewhere inside xercesc_2_8::XMLUTF16Transcoder::transcodeTo() a memcpy is crashing. This does not happen every time, though.

/** Transcode to UTF-8 */
uiInLength = strlen(szCSTABuffer);
uiOutLength = uiInLength * UTF16_BYTES_PER_CHARACTER;
//UTF16_BYTES_PER_CHARACTER is set to 4

/** Allocate memory for the output of the transcode operation*/
xmlInput = new XMLByte[uiOutLength + 1];

if(xmlInput)
{
    /** Transcode */
    uiTotalChars = m_pUTF16Transcoder->transcodeTo(
                       (const XMLCh* const)szCSTABuffer,
                       uiInLength,
                       xmlInput,
                       uiOutLength,
                       uiCharsTranscoded,
                       XMLTranscoder::UnRep_RepChar);
    xmlInput[uiTotalChars] = '\0';
}

What am I doing wrong? Is it the typecast to XMLCh* from char* when calling transcodeTo? Any other way to convert char* to XMLCh*? Please help!

>
>> /*******************************************************************/
>> if(szCSTABuffer)
>> {
>>
>>     /** Transcode the CSTA Buffer into XMLCh* */
>>     xmlInput = XMLString::transcode(szCSTABuffer);
>>
>>     uiInLength = XMLString::stringLen(xmlInput);
>>     uiOutLength = uiInLength * UTF8_BYTES_PER_CHARACTER;
>>     //UTF8_BYTES_PER_CHARACTER is set to 4
>>
>>     /** Allocate memory for the output of the transcode operation*/
>>     xmlTranscodedOutput = new XMLByte[uiOutLength + 1];
>>
>>
>>     if(xmlTranscodedOutput)
>>     {
>>         /** Transcode */
>>         // m_pUTF8Transcoder is of type XMLTranscoder*
>>         uiTotalChars = m_pUTF8Transcoder->transcodeTo(
>>                            (const XMLCh* const)xmlInput,
>This cast is not necessary.
>
Thanks I will remove it.

>>                            uiInLength,
>>                            xmlTranscodedOutput,
>>                            uiOutLength,
>>                            uiCharsTranscoded,
>>
>>                            XMLTranscoder::UnRep_RepChar);
>>
>>         xmlTranscodedOutput[uiTotalChars] = '\0';
>>         XMLString::release(&xmlInput);
>>     }
>> }
>>
>> Variables are defined as follows:
>> char* szCSTABuffer = NULL;
>> XMLCh* xmlInput = NULL;
>> XMLByte* xmlTranscodedOutput = NULL;
>> unsigned int uiInLength = 0;
>> unsigned int uiOutLength = 0;
>> unsigned int uiCharsTranscoded = 0;
>> unsigned int uiTotalChars = 0;
>> /*******************************************************************/
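
On the cast question: casting char* to XMLCh* does not convert anything. strlen() counts bytes, while the count passed to transcodeTo() is in XMLCh units (16 bits each), so the call can read about twice as many bytes as the buffer holds. That over-read is a likely cause of the intermittent memcpy crash. The cascade itself is two steps: decode the source bytes into UTF-16 code units, then encode those units as UTF-8. The sketch below shows the shape of that in plain C++ without Xerces. It assumes, purely for illustration, that the source is ISO-8859-1 (the real encoding is unknown in this thread), and both function names are hypothetical. In Xerces the first step would presumably be transcodeFrom() on a transcoder created for the actual source encoding; since XMLCh is already UTF-16, no separate UTF-16 transcoder should be needed in between.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Step 1: decode source bytes into UTF-16 code units.
// Assumption: ISO-8859-1 input, whose bytes map one-to-one onto
// U+0000..U+00FF. Any other encoding needs a real decoder here.
std::vector<char16_t> latin1ToUtf16(const char* bytes, std::size_t byteCount) {
    std::vector<char16_t> units;
    units.reserve(byteCount);
    for (std::size_t i = 0; i < byteCount; ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        units.push_back(static_cast<char16_t>(static_cast<unsigned char>(bytes[i])));
    }
    return units;
}

// Step 2: encode UTF-16 code units as UTF-8.
// Unpaired surrogates are replaced with U+FFFD.
std::string utf16ToUtf8(const std::vector<char16_t>& units) {
    std::string out;
    for (std::size_t i = 0; i < units.size(); ++i) {
        std::uint32_t cp = units[i];
        const bool isHigh = cp >= 0xD800 && cp <= 0xDBFF;
        const bool isLow = cp >= 0xDC00 && cp <= 0xDFFF;
        if (isHigh && i + 1 < units.size() &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            // Combine a valid surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00);
            ++i;
        } else if (isHigh || isLow) {
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Note that each step tracks its own length in its own unit: bytes going in, 16-bit units in the middle, bytes coming out. Mixing those counts is what the cast in the posted code does.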
