[ https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920219#action_12920219 ]
Ben Griffin edited comment on XERCESC-1947 at 10/12/10 10:33 AM:
-----------------------------------------------------------------

Hi Boris,

I'm pretty sure that any serializer that uses

    TranscodeToStr::transcode(const XMLCh *in, XMLSize_t len, XMLTranscoder* trans)

will have this problem when the transcoder's target encoding is variable-width, most especially when a character needs more bytes in the target encoding than it uses in the existing encoding.

The problem is most easily exposed by the patch. Essentially, the failure happens because the output buffer is too small to hold the bytes for even one character, so no input can be consumed even though there is input that needs consuming. So when using UCS2 --> UTF-8, there is no problem until you get to characters that need 3 or more UTF-8 bytes, i.e. U+0800 and above. When there is a single character to be transcoded, the initial allocSize is not large enough to hold that one character, so the transcoder returns 0 'charsRead'.

This error was exposed to me when querying attributes that were set with single-character Unicode values from around U+2500. My code was doing something like...

    DOMAttr* enoda = enod->getAttributeNode(a_name);
    const XMLCh* x_attrval = enoda->getNodeValue();
    if (x_attrval != NULL && x_attrval[0] != 0 ) {
        std::string attrval;
        char* value = (char*)TranscodeToStr(x_attrval,"UTF-8").adopt();
    }

I am not sure whether or not the supplied serializer uses TranscodeToStr in that sort of way - you are probably better informed than me about that. Maybe the component that I put the bug under shouldn't be 'Utilities'?

I'm not sure I understand why you are interested in whether it affects parsing/serializing. It certainly affects being able to use TranscodeToStr::transcode().

I don't believe that the error is in XMLUTF8Transcoder::transcodeTo(), because AFAIK it doesn't have storage for semi-consumed characters.
I believe that the error is with TranscodeToStr::transcode().
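
To make the failure mode described in the comment concrete, here is a minimal, self-contained sketch. It is not the Xerces-C++ source: transcodeChunk(), toUtf8() and the growOnStall flag are hypothetical names invented for illustration, and the initial capacity of 2 bytes is an assumed value chosen only to reproduce the stall, not the real allocSize. It shows a chunked UCS2 --> UTF-8 loop in which the converter only emits whole characters, so a buffer smaller than one 3-byte character yields 0 charsRead, and one possible remedy (enlarge the buffer and retry when nothing was consumed).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// UTF-8 bytes needed for one BMP code unit (surrogate pairs ignored for brevity).
std::size_t utf8Length(char16_t c) {
    if (c < 0x80)  return 1;
    if (c < 0x800) return 2;
    return 3;
}

// Hypothetical stand-in for a transcodeTo()-style call: converts as many WHOLE
// characters as fit into `out` and never stores a partial character.
// Returns the bytes written; `charsRead` reports the characters consumed.
std::size_t transcodeChunk(const char16_t* in, std::size_t len,
                           unsigned char* out, std::size_t cap,
                           std::size_t& charsRead) {
    std::size_t written = 0;
    charsRead = 0;
    while (charsRead < len) {
        const char16_t c = in[charsRead];
        const std::size_t need = utf8Length(c);
        if (written + need > cap) break;  // the next character does not fit
        if (need == 1) {
            out[written++] = static_cast<unsigned char>(c);
        } else if (need == 2) {
            out[written++] = static_cast<unsigned char>(0xC0 | (c >> 6));
            out[written++] = static_cast<unsigned char>(0x80 | (c & 0x3F));
        } else {
            out[written++] = static_cast<unsigned char>(0xE0 | (c >> 12));
            out[written++] = static_cast<unsigned char>(0x80 | ((c >> 6) & 0x3F));
            out[written++] = static_cast<unsigned char>(0x80 | (c & 0x3F));
        }
        ++charsRead;
    }
    return written;
}

// Hypothetical driver loop. With growOnStall == false, "nothing consumed" is
// treated as an error (the failure described in the comment). With
// growOnStall == true, the buffer is enlarged and the chunk is retried.
std::string toUtf8(const char16_t* in, std::size_t len,
                   std::size_t initialCap, bool growOnStall) {
    std::vector<unsigned char> buf(initialCap);
    std::string result;
    std::size_t pos = 0;
    while (pos < len) {
        std::size_t charsRead = 0;
        const std::size_t n =
            transcodeChunk(in + pos, len - pos, buf.data(), buf.size(), charsRead);
        if (charsRead == 0) {
            if (!growOnStall)
                throw std::runtime_error("no characters consumed");
            buf.resize(buf.size() * 2 + 1);  // make room and try again
            continue;
        }
        result.append(reinterpret_cast<const char*>(buf.data()), n);
        pos += charsRead;
    }
    return result;
}
```

Calling toUtf8() on the single code unit 0x254B with an initial capacity of 2 and growOnStall set to false throws, while the same call with growOnStall set to true returns the three bytes E2 95 8B. Sizing the buffer up front for the worst case per input character would be another way to avoid the stall; the retry is shown here only because it keeps the sketch short.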
> XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding single characters that require 3 or more bytes as UTF8.
> --------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: XERCESC-1947
>                 URL: https://issues.apache.org/jira/browse/XERCESC-1947
>             Project: Xerces-C++
>          Issue Type: Bug
>          Components: Utilities
>    Affects Versions: 3.1.0, 3.1.1
>         Environment: Tested on mac os and debian linux. The failure is only manifest on v3.1.x
>            Reporter: Ben Griffin
>            Priority: Critical
>         Attachments: TransService.patch, transtest.cpp
>
>
> This can be demonstrated with the following 2 lines of code.
>     const XMLCh uval [] = { 0x254B, 0x0000}; //BOX DRAWINGS HEAVY VERTICAL AND HORIZONTAL (needs 3 bytes for utf-8)
>     char* uc = (char*)TranscodeToStr(uval,"UTF-8").adopt(); cout << uc << endl << flush; XMLString::release(&uc); //faulty exception;
> The error is: "terminate called after throwing an instance of 'xercesc_3_1::TranscodingException'"