[ 
https://issues.apache.org/jira/browse/AXIS-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12490888
 ] 

Rodrigo Ruiz commented on AXIS-2342:
------------------------------------

I am a bit puzzled by this bug.

In principle, I agree with Thiago. If the output writer is created with the 
correct encoding (and it seems it is), there should be no need to "re-encode" 
characters above 0x7F in UTF-8, or above 0xFFFF in UTF-16.
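
To illustrate the point, here is a minimal sketch (not Axis code; the class 
and method names are made up for this example, and it assumes Java 7+ for 
StandardCharsets). It writes the same text through a UTF-8 writer twice, once 
as-is and once with the "above 127" escaping quoted in the issue below, so the 
byte counts can be compared:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Axis.
final class Utf8WriterDemo {

    // Writes the text through a Writer bound to UTF-8 and returns the raw bytes.
    static byte[] writeUtf8(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrown unchecked.
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Mimics the escaping quoted in the issue: every char above 127 becomes &#N;.
    static String escapeAbove127(String text) {
        StringBuilder buf = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 127) {
                buf.append("&#").append((int) c).append(';');
            } else {
                buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 \u65e5\u672c"; // "cafe" with e-acute, then two CJK chars
        byte[] raw = writeUtf8(text);
        byte[] escaped = writeUtf8(escapeAbove127(text));
        System.out.println("raw bytes:     " + raw.length);
        System.out.println("escaped bytes: " + escaped.length);
        // The unescaped bytes decode back to the original string.
        System.out.println("round trip ok: "
                + text.equals(new String(raw, StandardCharsets.UTF_8)));
    }
}
```

The unescaped output round-trips through UTF-8 unchanged, which is why the 
extra escaping looks redundant when the writer already uses that encoding.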

It seems the class org.apache.axis.components.encoding.AbstractXmlEncoder fixes 
this issue in its "encode" method. The problem is that none of its subclasses 
uses the same strategy in its writeEncoded() method. Why is that?

In fact, looking at the code, once the "entities replacement" code is removed 
from the subclasses, they are all the same! It seems we could live with a 
single XMLEncoder implementation for all encodings. Can anybody please confirm 
or correct this?
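
As a rough sketch of what such a single implementation might reduce to (a 
hypothetical class, not the Axis XMLEncoder interface, and an assumption on my 
part about what remains once the entity replacement is gone): escape only the 
markup-significant characters and leave everything else to the writer.

```java
// Hypothetical sketch, not the Axis XMLEncoder interface.
final class MinimalXmlEscaper {

    // Escapes only characters that are significant in XML markup;
    // all other characters are passed through for the writer to encode.
    static String escape(String text) {
        StringBuilder buf = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': buf.append("&amp;"); break;
                case '<': buf.append("&lt;"); break;
                case '>': buf.append("&gt;"); break;
                case '"': buf.append("&quot;"); break;
                default:  buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a<b & \"c\" \u00e9"));
    }
}
```

This deliberately ignores characters that are illegal in XML and characters 
the target encoding cannot represent (for a non-Unicode output encoding those 
would still need numeric character references), so it is only a starting 
point for the discussion.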

> Reopen issue: Character entities are escaped too aggressively
> -------------------------------------------------------------
>
>                 Key: AXIS-2342
>                 URL: https://issues.apache.org/jira/browse/AXIS-2342
>             Project: Axis
>          Issue Type: Bug
>          Components: Serialization/Deserialization
>    Affects Versions: 1.0
>         Environment: Operating System: All
> Platform: All
>            Reporter: Thiago Jung Bauermann
>         Assigned To: Axis Developers Mailing List
>         Attachments: PATCH_2342.txt, TESTCASE_2342.txt
>
>
> We are using SOAP to send XML documents from client to server and back. The 
> documents contain a lot of non-ASCII data. This is encoded as UTF-8 by us. 
> However, when retrieved from an Axis server, Axis will escape almost all of 
> our characters into character entities (so &#...;). This means messages 
> become about three times as big as they have to be for 'international' 
> documents, which for us is a large performance problem. I narrowed down the 
> problem to
>   XMLUtils::xmlEncodeString
> that has the code:
>                 if (((int)chars[i]) > 127) {
>                         strBuf.append("&#");
>                         strBuf.append((int)chars[i]);
>                         strBuf.append(";");
> This seems unnecessary to me, as Axis will send all messages in UTF-8 anyway, 
> for which no encoding is necessary (and should encoding be configurable, I 
> feel this should be escaped elsewhere).
> Is there any reason for this code? I commented it out and it seemed to have 
> no adverse effect on our application (apart from reduced network traffic).
> Tested with 1.0, also looked up in the sources of 1.1-rc2.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

