Hello OpenJDK team,

I would like to seek clarification on a behavior observed when performing an 
XSL transformation followed by XML parsing.

Problem Description:
A SAXParseException is thrown when parsing the result of a Java XSL 
transformation that uses HTML output and contains accented characters 
represented as named entities (for example `&eacute;`).

Scenario:
We perform an XSL transformation using `Transformer`, and then attempt to parse 
the resulting output using `DocumentBuilder`.

When the XSLT uses:
<xsl:output method="html" encoding="UTF-8" indent="yes"/>

the transformation succeeds, but parsing the result fails with the following 
error:

[Fatal Error] :4:98: The entity "eacute" was referenced, but not declared.
org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 98; The entity "eacute" was referenced, but not declared.
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:338)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
        at HTMLEntityParsingTest.main(HTMLEntityParsingTest.java:40)


However, when we change the XSLT output method to `xml`, as shown below, the 
issue does not occur:
<xsl:output method="xml" encoding="UTF-8" indent="yes"/>
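
For reference, here is a minimal sketch of the scenario, not the original 
HTMLEntityParsingTest. The class name `HtmlOutputRepro`, the inline stylesheet, 
and the input document are hypothetical and only meant to show the two output 
methods side by side. The result of the `html` run may vary between JDK 
builds, so no expected output is stated.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;

class HtmlOutputRepro {
    // Hypothetical input containing an accented character (e with acute accent).
    static final String INPUT = "<doc>caf\u00e9</doc>";

    // Builds a small stylesheet whose only variable part is the output method.
    static String stylesheet(String method) {
        return "<xsl:stylesheet version=\"1.0\""
             + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:output method=\"" + method + "\" encoding=\"UTF-8\" indent=\"yes\"/>"
             + "<xsl:template match=\"/\">"
             + "<html><body><p><xsl:value-of select=\"doc\"/></p></body></html>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Runs the transformation with the given output method and returns the result text.
    static String transform(String method) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheet(method))));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(INPUT)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Returns true if the text parses with a default DocumentBuilder, false on any failure.
    static boolean parsesAsXml(String text) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(text)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (String method : new String[] {"html", "xml"}) {
            String result = transform(method);
            System.out.println("method=" + method
                + " parses as XML: " + parsesAsXml(result));
            System.out.println(result);
        }
    }
}
```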

Observation:
It appears that the HTML output contains named entities such as `&eacute;`, 
which are not recognized by the XML parser.
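
To illustrate the observation in isolation: XML 1.0 predefines only five 
entities (`amp`, `lt`, `gt`, `apos`, `quot`), while `eacute` comes from the 
HTML entity sets. The sketch below feeds a few hand-written strings to a 
default `DocumentBuilder`; the class name `EntityCheck` and the sample strings 
are hypothetical and do not come from our test.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EntityCheck {
    // Returns true if the text is accepted by a default DocumentBuilder.
    static boolean parsesAsXml(String text) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]" to stderr.
            db.setErrorHandler(new DefaultHandler());
            db.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Named HTML entity with no declaration: rejected by the XML parser.
        System.out.println(parsesAsXml("<p>caf&eacute;</p>"));
        // Numeric character reference for the same character: accepted.
        System.out.println(parsesAsXml("<p>caf&#233;</p>"));
        // One of the five predefined XML entities: accepted.
        System.out.println(parsesAsXml("<p>a &amp; b</p>"));
        // Same named entity, but declared in an internal DTD subset: accepted.
        System.out.println(parsesAsXml(
            "<!DOCTYPE p [<!ENTITY eacute \"&#233;\">]><p>caf&eacute;</p>"));
    }
}
```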

Could you please confirm whether this behavior is expected, or if this could be 
considered a bug or limitation in the current implementation?

Releases:
The issue occurs consistently in all OpenJDK versions (JDK 8 and above).

Thanks and regards,
Shruthi
