[
https://issues.apache.org/jira/browse/XERCESC-1947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920219#action_12920219
]
Ben Griffin edited comment on XERCESC-1947 at 10/12/10 10:33 AM:
-----------------------------------------------------------------
Hi Boris,
I'm pretty sure that any serializer that uses TranscodeToStr::transcode(const
XMLCh *in, XMLSize_t len, XMLTranscoder* trans) will have this problem when the
transcoder's target encoding is variable-width, and most especially when a
character needs more bytes in the target encoding than it occupies in the
source encoding.
The problem is most easily exposed by the patch. Essentially, the failure
happens because the output buffer is too small to hold the bytes for even one
input character, so nothing can be consumed even though there is still input
that needs consuming.
So when transcoding UCS-2 --> UTF-8, there is no problem until you get to
characters that need 3 or more bytes in UTF-8, i.e. characters at or above
U+0800. When there is a single character to be transcoded, the initial
allocSize is not going to be large enough to hold that one character, so the
transcoder will return 0 'charsRead'.
This error was exposed to me when querying attributes that were set to
single-character Unicode values from around U+2500.
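To make the byte counts concrete, here is a small standalone sketch of how many bytes UTF-8 needs per code point. The helper name utf8EncodedLength is hypothetical and is not part of Xerces-C++; it only restates the standard UTF-8 ranges.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes UTF-8 needs to encode one Unicode code point.
// Hypothetical helper for illustration only; not a Xerces-C++ function.
std::size_t utf8EncodedLength(std::uint32_t codePoint)
{
    assert(codePoint <= 0x10FFFF);      // outside the Unicode range otherwise
    if (codePoint < 0x80)    return 1;  // U+0000..U+007F
    if (codePoint < 0x800)   return 2;  // U+0080..U+07FF
    if (codePoint < 0x10000) return 3;  // U+0800..U+FFFF
    return 4;                           // U+10000..U+10FFFF
}
```

A UCS-2 code unit is 2 bytes, so anything from U+0800 upwards (for example U+254B) grows from 2 bytes to 3 when converted to UTF-8.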
My code was doing something like...
    DOMAttr* enoda = enod->getAttributeNode(a_name);
    const XMLCh* x_attrval = enoda->getNodeValue();
    if (x_attrval != NULL && x_attrval[0] != 0) {
        std::string attrval;
        char* value = (char*)TranscodeToStr(x_attrval, "UTF-8").adopt();
    }
I am not sure whether or not the supplied serializer uses TranscodeToStr in
that sort of way - you are probably better informed than me about that.
Maybe the component that I put the bug under shouldn't be 'Utilities'? I'm
not sure that I understand why you are interested in whether it affects
parsing/serializing. It certainly affects being able to use
TranscodeToStr::transcode(). I don't believe that the error is in
XMLUTF8Transcoder::transcodeTo(), because AFAIK it doesn't have storage for
partially consumed characters. I believe that the error is in
TranscodeToStr::transcode().
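As a rough illustration of the behaviour I would expect from the caller, here is a toy model of the loop. It is not the Xerces-C++ source and not the attached patch: encodeSome and toUtf8 are hypothetical names, the encoder handles BMP characters only, and the doubling growth policy is just an assumption. The point is that when the transcoder reports 0 characters read while input remains, the caller can enlarge the buffer and retry rather than treat it as an error.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a transcoder (hypothetical, NOT the Xerces-C++ API).
// Encodes BMP UTF-16 code units as UTF-8 and stops before any character
// that does not fit completely in the remaining output space.
std::size_t encodeSome(const char16_t* in, std::size_t inLen,
                       unsigned char* out, std::size_t outCap,
                       std::size_t& charsRead)
{
    std::size_t written = 0;
    charsRead = 0;
    while (charsRead < inLen) {
        const unsigned c = in[charsRead];
        assert(c < 0xD800 || c > 0xDFFF);  // surrogates are out of scope here
        const std::size_t need = c < 0x80 ? 1 : (c < 0x800 ? 2 : 3);
        if (outCap - written < need) break;  // a partial character is never emitted
        if (need == 1) {
            out[written++] = static_cast<unsigned char>(c);
        } else if (need == 2) {
            out[written++] = static_cast<unsigned char>(0xC0 | (c >> 6));
            out[written++] = static_cast<unsigned char>(0x80 | (c & 0x3F));
        } else {
            out[written++] = static_cast<unsigned char>(0xE0 | (c >> 12));
            out[written++] = static_cast<unsigned char>(0x80 | ((c >> 6) & 0x3F));
            out[written++] = static_cast<unsigned char>(0x80 | (c & 0x3F));
        }
        ++charsRead;
    }
    return written;
}

// Caller loop: zero progress means "buffer too small", so grow and retry.
// The small default capacity is chosen to force that case for U+254B.
std::string toUtf8(const char16_t* in, std::size_t inLen,
                   std::size_t initialCap = 2)
{
    std::vector<unsigned char> buf(initialCap > 0 ? initialCap : 1);
    std::size_t used = 0;      // bytes written so far
    std::size_t consumed = 0;  // input characters consumed so far
    while (consumed < inLen) {
        std::size_t charsRead = 0;
        used += encodeSome(in + consumed, inLen - consumed,
                           buf.data() + used, buf.size() - used, charsRead);
        consumed += charsRead;
        if (charsRead == 0) {
            buf.resize(buf.size() * 2);  // assumed policy: double the buffer
        }
    }
    return std::string(reinterpret_cast<const char*>(buf.data()), used);
}
```

With a 2-byte starting buffer, a lone U+254B makes the first pass consume nothing; the loop doubles the buffer and the second pass succeeds, which is the case that currently ends in a TranscodingException.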
> XMLUTF8Transcoder::transcodeTo fails with an exception when transcoding
> single characters that require 3 or more bytes as UTF8.
> --------------------------------------------------------------------------------------------------------------------------------
>
> Key: XERCESC-1947
> URL: https://issues.apache.org/jira/browse/XERCESC-1947
> Project: Xerces-C++
> Issue Type: Bug
> Components: Utilities
> Affects Versions: 3.1.0, 3.1.1
> Environment: Tested on Mac OS and Debian Linux. The failure manifests
> only on v3.1.x
> Reporter: Ben Griffin
> Priority: Critical
> Attachments: TransService.patch, transtest.cpp
>
>
> This can be demonstrated with the following code.
>     // U+254B BOX DRAWINGS HEAVY VERTICAL AND HORIZONTAL (needs 3 bytes in UTF-8)
>     const XMLCh uval[] = { 0x254B, 0x0000 };
>     char* uc = (char*)TranscodeToStr(uval, "UTF-8").adopt(); // faulty exception
>     cout << uc << endl << flush;
>     XMLString::release(&uc);
> The error is: "terminate called after throwing an instance of
> 'xercesc_3_1::TranscodingException'"