Hi Mukul,

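If I understand correctly, with your suggestion the `&eacute;` reference in the
stylesheet is replaced with "Hello" before the transformation, so the XSLT does not
throw any error. But that doesn't seem to resolve our issue: in our case the é comes
from the input XML, not from the stylesheet.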
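Here is a minimal sketch of that distinction. It assumes the JDK's built-in XSLT processor, and the class name StylesheetEntitySketch and the trimmed-down stylesheet and input are hypothetical, not part of our testcase. The entity declared in the stylesheet's internal DTD is expanded to "Hello" while the stylesheet is parsed. The é that arrives in the input data, however, is still written by the HTML output method, which in our runs emits it as `&eacute;` inside attribute values.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical, trimmed-down reproduction; not part of the original testcase.
public class StylesheetEntitySketch {

    // Internal DTD subset as in the suggestion: &eacute; becomes "Hello"
    // while the stylesheet itself is being parsed.
    static final String XSL =
        "<!DOCTYPE xsl:stylesheet [ <!ENTITY eacute \"Hello\"> ]>"
        + "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"html\" encoding=\"UTF-8\"/>"
        + "<xsl:template match=\"/\">"
        // Entity reference from the stylesheet, plus an attribute value taken from the input.
        + "<p title=\"{Item/@ItemShortDesc}\">&eacute;</p>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Input data carrying a real U+00E9 (e-acute), like ItemShortDesc in EmojiInput.xml.
    static final String INPUT = "<Item ItemShortDesc=\"compl\u00e9ments\"/>";

    // Runs the transformation and returns the serialized result as a string.
    static String transform() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new StreamSource(new StringReader(INPUT)), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The stylesheet entity appears as "Hello"; the input's e-acute is
        // still serialized by the HTML output method as an entity reference.
        System.out.println(transform());
    }
}
```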

Please find the testcase below.

EmojiInput.xml
<Order DocumentType="0001" DraftOrderFlag="N" EnterpriseCode="IBM" 
EntryType="Web" OrderNo="" OrderDate="" SellerOrganizationCode="IBM" 
BuyerOrganizationCode="" ReqDeliveryDate="" ReqShipDate="">
<OrderLines>
<OrderLine OrderedQty="1" PrimeLineNo="1" SubLineNo="1">
<Item ItemID="IBMItem10" UnitOfMeasure="EACH" ProductClass="Good" 
ItemShortDesc="75 compléments premium et appétissants au poulet 300 gr"/>
</OrderLine>
</OrderLines>
<PersonInfoShipTo Country="US"/>
<PersonInfoBillTo Country="US"/>
</Order>


Test_html.xsl
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="html" encoding="UTF-8" indent="yes"/>
<!-- Root template -->
<xsl:template match="/">
<xsl:apply-templates select="Order"/>
</xsl:template>
<!-- Match Order element -->
<xsl:template match="Order">
<Order>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="OrderLines"/>
<xsl:apply-templates select="PersonInfoShipTo"/>
<xsl:apply-templates select="PersonInfoBillTo"/>
</Order>
</xsl:template>
<!-- Match OrderLines -->
<xsl:template match="OrderLines">
<OrderLines>
<xsl:apply-templates select="OrderLine"/>
</OrderLines>
</xsl:template>
<!-- Match each OrderLine and modify Item -->
<xsl:template match="OrderLine">
<OrderLine>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="Item"/>
</OrderLine>
</xsl:template>
<!-- Match Item and copy ItemShortDesc into ManufacturerItemDesc -->
<xsl:template match="Item">
<Item>
<xsl:copy-of select="@*"/>
<xsl:attribute name="ManufacturerItemDesc">
<xsl:value-of select="@ItemShortDesc"/>
</xsl:attribute>
</Item>
</xsl:template>
<!-- Copy PersonInfoShipTo and PersonInfoBillTo as-is -->
<xsl:template match="PersonInfoShipTo | PersonInfoBillTo">
<xsl:copy>
<xsl:copy-of select="@*"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>


HTMLEntityParsingTest.java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

public class HTMLEntityParsingTest
{
    public static void main(String[] args)
    {
        try
        {
            Source xmlSource = new StreamSource(new FileInputStream("EmojiInput.xml"));

            Source xslSource = new StreamSource(new FileInputStream("Test_html.xsl"));

            TransformerFactory transformerFactory = TransformerFactory.newInstance();
            Transformer transformer = transformerFactory.newTransformer(xslSource);

            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            Result result = new StreamResult(baos);

            transformer.transform(xmlSource, result);

            byte[] transformedOutput = baos.toByteArray();

            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            Document doc = builder.parse(new ByteArrayInputStream(transformedOutput));
        }
        catch (org.xml.sax.SAXParseException e)
        {
            e.printStackTrace();
            System.exit(1);
        }
        catch (Exception e)
        {
            e.printStackTrace();
            System.exit(1);
        }
    }
}


Failure:
[Fatal Error] :4:98: The entity "eacute" was referenced, but not declared.
org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 98; The entity "eacute" was referenced, but not declared.
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:338)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
        at HTMLEntityParsingTest.main(HTMLEntityParsingTest.java:40)
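One possible workaround applies if the consumer of the result needs well-formed XML rather than HTML. In that case the stylesheet's output method can be overridden from Java with Transformer.setOutputProperty(OutputKeys.METHOD, "xml"), because output properties set on the Transformer take precedence over xsl:output. The class and helper names below are hypothetical, and the input and stylesheet are trimmed-down versions of the testcase. The sketch runs the same transformation twice and checks whether each result parses with DocumentBuilder.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; trimmed-down input and stylesheet, not the full testcase.
public class OutputMethodOverrideSketch {

    static final String INPUT =
        "<Order><Item ItemShortDesc=\"75 compl\u00e9ments premium\"/></Order>";

    // Identity copy that still declares method="html", like Test_html.xsl.
    static final String XSL =
        "<xsl:stylesheet xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" version=\"1.0\">"
        + "<xsl:output method=\"html\" encoding=\"UTF-8\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the transformation; a non-null method overrides xsl:output/@method. */
    static byte[] transform(String methodOverride) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            if (methodOverride != null) {
                t.setOutputProperty(OutputKeys.METHOD, methodOverride);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new StreamSource(new StringReader(INPUT)), new StreamResult(out));
            return out.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** True if the bytes are well-formed XML according to DocumentBuilder. */
    static boolean parsesAsXml(byte[] bytes) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing "[Fatal Error]".
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(bytes));
            return true;
        } catch (SAXParseException e) {
            return false; // not well-formed, e.g. an undeclared entity reference
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("method=html parses: " + parsesAsXml(transform(null)));
        System.out.println("method=xml  parses: " + parsesAsXml(transform("xml")));
    }
}
```

The trade-off is that the result is then serialized by the XML rules (for example, with an XML declaration and `<Item .../>` empty-element tags). This matters only if something downstream expects HTML serialization.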


Thanks
Shruthi


________________________________
From: Mukul Gandhi <[email protected]>
Sent: Friday, April 10, 2026 1:20 PM
To: Shruthi . <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [External] : Clarification on XSL HTML output and XML parsing

Hi Shruthi,
     The following XSLT 1.0 transformation, run with Xalan-J 2.7.3,
resolves the issue you've described.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
   <!ENTITY eacute "Hello">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                         version="1.0">

       <xsl:output method="html" encoding="UTF-8" indent="yes"/>

       <xsl:template match="/">
            <html>
              <head>
                  <title>Openjdk bug report test</title>
              </head>
              <body>
                  <h2>&eacute;</h2>
               </body>
            </html>
        </xsl:template>

</xsl:stylesheet>

This resolution should work with Xalan-J bundled with OpenJDK as well.

HTH

On Thu, Apr 9, 2026 at 11:04 AM Shruthi . <[email protected]> wrote:

> I would like to seek clarification on a behavior observed when performing an 
> XSL transformation followed by XML parsing.
>
> Problem Description :
> A SAXParseException is encountered when parsing the result of a Java XSL 
> transformation that uses HTML output and contains accented characters.
>
> Scenario:
> We perform an XSL transformation using `Transformer`, and then attempt to 
> parse the resulting output using `DocumentBuilder`.
>
> When the XSLT uses:
> <xsl:output method="html" encoding="UTF-8" indent="yes"/>
>
> the transformation succeeds, but parsing the result fails with the following 
> error:
>
> [Fatal Error] :4:98: The entity "eacute" was referenced, but not declared.
> org.xml.sax.SAXParseException; lineNumber: 4; columnNumber: 98; The entity 
> "eacute" was referenced, but not declared.
>         at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:257)
>         at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:338)
>         at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
>         at HTMLEntityParsingTest.main(HTMLEntityParsingTest.java:40)
>
>
> However, when we change the XSLT output method to the below, the issue does 
> not occur.
> <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
>
> Observation:
> It appears that the HTML output contains named entities such as `&eacute;`, 
> which are not recognized by the XML parser.
>
> Could you please confirm whether this behavior is expected, or if this could 
> be considered a bug or limitation in the current implementation?
>
> Releases:
> The issue occurs consistently in all OpenJDK versions (JDK 8 and above).



--
Regards,
Mukul Gandhi
