[ https://issues.apache.org/jira/browse/XERCESC-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15230354#comment-15230354 ]
Scott Cantor commented on XERCESC-2065:
---------------------------------------
A little light debugging suggests that the failure to encode CR is hardwired
behavior of the DOMLSSerializerImpl class when it handles text nodes.
XMLFormatter::formatBuf takes a parameter that controls which characters it
escapes, and the call made for text nodes passes an enum value
(XMLFormatter::CharEscapes) that does not escape CR. I think only calling it
with StdEscapes would cause CR to be escaped.
Even if there was a deliberate reason for not encoding CR in general, I don't
see any DOMLSSerializer options or features that would let a caller change
this behavior.
> Carriage return entities are not handled properly
> -------------------------------------------------
>
> Key: XERCESC-2065
> URL: https://issues.apache.org/jira/browse/XERCESC-2065
> Project: Xerces-C++
> Issue Type: Bug
> Components: DOM, Non-Validating Parser, SAX/SAX2
> Affects Versions: 3.1.3
> Reporter: Scott Cantor
> Priority: Critical
>
> Documents with CR entities don't seem to round-trip properly if you parse
> them and then serialize them. The bug may be in the serializer, since signed
> documents don't end up with corrupt signatures, but that may just reflect
> insufficient testing so far.
> A simple example:
> {code}
> <?xml version="1.0" encoding="UTF-8"?>
> <foo>
> text&#13;more&lt;&amp;
> </foo>
> {code}
> Running that through DOMPrint or SAX2Print:
> {code}
> <foo>
> more&lt;&amp;
> </foo>
> {code}
> Notice that the CR entity is removed, along with all of the characters
> immediately preceding it.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)