Hi Mukul,

If I understand correctly, with your suggestion &eacute; will be replaced with "Hello" before the transformation, so the XSLT will not raise any error. But that doesn't seem to resolve our issue.
Please find the testcase below.

EmojiInput.xml

    <Order DocumentType="0001" DraftOrderFlag="N" EnterpriseCode="IBM" EntryType="Web"
           OrderNo="" OrderDate="" SellerOrganizationCode="IBM" BuyerOrganizationCode=""
           ReqDeliveryDate="" ReqShipDate="">
      <OrderLines>
        <OrderLine OrderedQty="1" PrimeLineNo="1" SubLineNo="1">
          <Item ItemID="IBMItem10" UnitOfMeasure="EACH" ProductClass="Good"
                ItemShortDesc="75 compléments premium et appétissants au poulet 300 gr"/>
        </OrderLine>
      </OrderLines>
      <PersonInfoShipTo Country="US"/>
      <PersonInfoBillTo Country="US"/>
    </Order>

Test_html.xsl

    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="html" encoding="UTF-8" indent="yes"/>

      <!-- Root template -->
      <xsl:template match="/">
        <xsl:apply-templates select="Order"/>
      </xsl:template>

      <!-- Match Order element -->
      <xsl:template match="Order">
        <Order>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="OrderLines"/>
          <xsl:apply-templates select="PersonInfoShipTo"/>
          <xsl:apply-templates select="PersonInfoBillTo"/>
        </Order>
      </xsl:template>

      <!-- Match OrderLines -->
      <xsl:template match="OrderLines">
        <OrderLines>
          <xsl:apply-templates select="OrderLine"/>
        </OrderLines>
      </xsl:template>

      <!-- Match each OrderLine and modify Item -->
      <xsl:template match="OrderLine">
        <OrderLine>
          <xsl:copy-of select="@*"/>
          <xsl:apply-templates select="Item"/>
        </OrderLine>
      </xsl:template>

      <!-- Match Item and copy ItemShortDesc into ManufacturerItemDesc -->
      <xsl:template match="Item">
        <Item>
          <xsl:copy-of select="@*"/>
          <xsl:attribute name="ManufacturerItemDesc">
            <xsl:value-of select="@ItemShortDesc"/>
          </xsl:attribute>
        </Item>
      </xsl:template>

      <!-- Copy PersonInfoShipTo and PersonInfoBillTo as-is -->
      <xsl:template match="PersonInfoShipTo | PersonInfoBillTo">
        <xsl:copy>
          <xsl:copy-of select="@*"/>
        </xsl:copy>
      </xsl:template>
    </xsl:stylesheet>

HTMLEntityParsingTest.java

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.FileInputStream;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Result;
    import javax.xml.transform.Source;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.transform.stream.StreamSource;
    import org.w3c.dom.Document;

    public class HTMLEntityParsingTest {
        public static void main(String[] args) {
            try {
                Source xmlSource = new StreamSource(new FileInputStream("EmojiInput.xml"));
                Source xslSource = new StreamSource(new FileInputStream("Test_html.xsl"));
                TransformerFactory transformerFactory = TransformerFactory.newInstance();
                Transformer transformer = transformerFactory.newTransformer(xslSource);
                ByteArrayOutputStream baos = new ByteArrayOutputStream();
                Result result = new StreamResult(baos);
                transformer.transform(xmlSource, result);
                byte[] transformedOutput = baos.toByteArray();
                DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
                factory.setNamespaceAware(true);
                DocumentBuilder builder = factory.newDocumentBuilder();
                Document doc = builder.parse(new ByteArrayInputStream(transformedOutput));
            } catch (org.xml.sax.SAXParseException e) {
                e.printStackTrace();
                System.exit(1);
            } catch (Exception e) {
                e.printStackTrace();
                System.exit(1);
            }
        }
    }

Failure:

    [Fatal Error] :4:98: The entity "eacute" was referenced, but not declared.
    org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 98; The entity "eacute" was referenced, but not declared.
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:338)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
        at HTMLEntityParsingTest.main(HTMLEntityParsingTest.java:40)

Thanks
Shruthi

________________________________
From: Mukul Gandhi <[email protected]>
Sent: Friday, April 10, 2026 1:20 PM
To: Shruthi .
<[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [External] : Clarification on XSL HTML output and XML parsing

Hi Shruthi,

The following XSLT 1.0 transformation, run with Xalan-J 2.7.3, resolves the issue you've described.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE xsl:stylesheet [
      <!ENTITY eacute "Hello">
    ]>
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
      <xsl:output method="html" encoding="UTF-8" indent="yes"/>
      <xsl:template match="/">
        <html>
          <head>
            <title>Openjdk bug report test</title>
          </head>
          <body>
            <h2>&eacute;</h2>
          </body>
        </html>
      </xsl:template>
    </xsl:stylesheet>

This resolution should work with Xalan-J bundled with OpenJDK as well.

HTH

On Thu, Apr 9, 2026 at 11:04 AM Shruthi . <[email protected]> wrote:
> I would like to seek clarification on a behavior observed when performing an
> XSL transformation followed by XML parsing.
>
> Problem Description:
> A SAXParseException is encountered when parsing the result of a Java XSL
> transformation that uses HTML output and contains accented characters
> represented.
>
> Scenario:
> We perform an XSL transformation using `Transformer`, and then attempt to
> parse the resulting output using `DocumentBuilder`.
>
> When the XSLT uses:
> <xsl:output method="html" encoding="UTF-8" indent="yes"/>
>
> the transformation succeeds, but parsing the result fails with the following
> error:
>
> [Fatal Error] :4:98: The entity "eacute" was referenced, but not declared.
> org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 98; The entity
> "eacute" was referenced, but not declared.
>     at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
>     at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:338)
>     at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
>     at HTMLEntityParsingTest.main(HTMLEntityParsingTest.java:40)
>
> However, when we change the XSLT output method to the below, the issue does
> not occur.
> <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
>
> Observation:
> It appears that the HTML output contains named entities such as `&eacute;`,
> which are not recognized by the XML parser.
>
> Could you please confirm whether this behavior is expected, or if this could
> be considered a bug or limitation in the current implementation?
>
> Releases:
> The issue is consistent in all OpenJDK versions (JDK 8 and above)

--
Regards,
Mukul Gandhi
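
Note on the method="xml" observation in the original report: if the stylesheet itself cannot be edited, the same effect can probably be had from the Java side, because the standard JAXP call Transformer.setOutputProperty(OutputKeys.METHOD, ...) takes precedence over xsl:output. The following is only a minimal sketch under stated assumptions: the class name OutputMethodOverrideSketch, the method roundTrip, and the cut-down inline XML and XSL strings are hypothetical and stand in for EmojiInput.xml and Test_html.xsl. It also assumes the consumer of the result is happy with XML serialization rather than HTML.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class OutputMethodOverrideSketch {

    // Hypothetical cut-down input: one attribute containing U+00E9 (e with acute accent).
    static final String XML = "<Item ItemShortDesc=\"compl\u00e9ments\"/>";

    // Hypothetical cut-down stylesheet; like Test_html.xsl it still requests HTML output.
    static final String XSL =
          "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"html\" encoding=\"UTF-8\"/>"
        + "<xsl:template match=\"Item\">"
        + "<Item ManufacturerItemDesc=\"{@ItemShortDesc}\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Transforms XML with XSL, forcing XML serialization, then re-parses the
    // result and returns the ManufacturerItemDesc attribute of the root element.
    static String roundTrip() {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSL)));

            // Override the stylesheet's method="html" so that no HTML named
            // character references are written to the output.
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new StreamSource(new StringReader(XML)),
                                  new StreamResult(out));

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(out.toByteArray()));

            return doc.getDocumentElement().getAttribute("ManufacturerItemDesc");
        } catch (Exception e) {
            throw new IllegalStateException("transform or parse failed", e);
        }
    }

    public static void main(String[] args) {
        // Reports whether the accented character survived the round trip.
        System.out.println("compl\u00e9ments".equals(roundTrip()));
    }
}
```

Removing the setOutputProperty line should bring back the behaviour from the report, since the serializer then follows the stylesheet's method="html" again.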
