[lxml] Re: Elements without textual content overlap

Stefan Behnel via lxml - The Python XML Toolkit Mon, 08 Sep 2025 23:57:40 -0700

Hi,

Schimon Jehudah wrote on 27.08.25 at 09:19:

> Function is at.
>
> https://git.xmpp-it.net/sch/Rivista/src/branch/main/rivista/parser/xslt.py
>
>> Is this parsed as HTML? With which options?
>
> Yes. I suppose so.
>
> <xsl:output
>    encoding = 'UTF-8'
>    indent = 'yes'
>    media-type = 'text/xml'
>    method = 'html'
>    omit-xml-decleration='no'
>    version = '4.01' />


So, this is your Python code running the transformation:

    def transform(filepath_xml, filepath_xslt):
        tree = ET.parse(filepath_xml)
        xslt_stylesheet = ET.parse(filepath_xslt)
        xslt_transform = ET.XSLT(xslt_stylesheet)
        newdom = xslt_transform(tree)
        xml_data_bytes = ET.tostring(newdom, pretty_print=True)
        xml_data_str = xml_data_bytes.decode("utf-8")
        return xml_data_str

Since you're apparently using "<xsl:output>" to configure the output, "tostring()" is the wrong way of serialising the result, because it does not know about your XSLT output configuration. Instead, use e.g.


    xml_data_bytes = memoryview(newdom)
    xml_data_str = str(xml_data_bytes, 'UTF-8')

or, if you intend to write to a file:

    newdom.write_output("somefile.xml")

You were using XML serialisation instead of HTML serialisation. That certainly makes a difference.
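
To illustrate the point, here is a minimal sketch of a transform function that lets the XSLT result serialise itself. The stylesheet, the input document and the function signature are made up for this example and are not taken from Rivista. It assumes lxml is installed and relies on "str()" of the result tree honouring "<xsl:output>".

```python
from lxml import etree as ET

# Hypothetical stylesheet, for illustration only: it emits an empty <div>
# and asks for HTML output via <xsl:output method="html">.
XSLT_SOURCE = b"""\
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>
  <xsl:template match="/">
    <html><body><div id="empty"></div><p>text</p></body></html>
  </xsl:template>
</xsl:stylesheet>
"""


def transform(xml_root, xslt_root):
    """Run the stylesheet and serialise the result the way <xsl:output> asks."""
    xslt_transform = ET.XSLT(xslt_root)
    result = xslt_transform(xml_root)
    # str() on the XSLT result applies the stylesheet's output settings,
    # unlike ET.tostring(), which defaults to XML serialisation.
    return str(result)


html_text = transform(ET.fromstring(b"<doc/>"), ET.fromstring(XSLT_SOURCE))
print(html_text)
```

With HTML serialisation, the empty "div" keeps its explicit closing tag instead of being collapsed into a self-closing element, which is what matters when a browser parses the output.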

If this doesn't solve your issue, I'd suggest trying to reproduce the misbehaviour with the "xsltproc" program that comes with libxslt and if you can make that show the same behaviour, report it to the libxslt project. It's probably not lxml that's responsible here.


Stefan

