Adrian: Thanks for the code. The output is now correct. Am I using lxml incorrectly, or is it an issue with its HTML parser? Can I do this without using an extra package (pathlib.Path)?
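
On the pathlib question: pathlib ships with Python (it has been in the standard library since 3.4), so nothing extra has to be installed, and Path.open() returns the same kind of file object as the built-in open(). A minimal sketch, assuming the default text mode is what is wanted; the temporary file is a made-up stand-in for 'myxml.xml', and lxml is left out so the snippet runs on its own:

```python
import io
import os
import tempfile
from pathlib import Path

# Hypothetical sample file standing in for 'myxml.xml', written with CRLF endings.
fd, name = tempfile.mkstemp(suffix=".xml")
with os.fdopen(fd, "wb") as f:
    f.write(b"<p>one</p>\r\n<p>two</p>\r\n")

# Variant from the suggested code: open the file through pathlib.
with Path(name).open() as f:
    kind_pathlib = type(f)
    text_pathlib = f.read()

# Variant without pathlib: the built-in open(), same default text mode.
with open(name) as f:
    kind_builtin = type(f)
    text_builtin = f.read()
    # With lxml installed, et.parse(f, parser=parser) would take the place of f.read().

os.remove(name)

print(kind_pathlib is kind_builtin)   # both are io.TextIOWrapper
print(text_pathlib == text_builtin)   # identical content either way
```

So swapping `Path('myxml.xml').open()` for `open('myxml.xml')` should make no difference to what the parser receives.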

Charlie Clark : The output from "et.tostring()" has "&#13;" added before each carriage return (which is 0D0A since I'm working on Windows). I don't know if it's the parser or tostring() that's causing that problem.

    parser = et.HTMLParser(remove_blank_text=True, strip_cdata=False)
    tree = et.parse(f, parser)
    root = tree.getroot()
    et.dump(root)

→ https://postimg.cc/1n2LWtpz

Strictly speaking, it doesn't matter, since it won't affect how the HTML is displayed in the browser, but I'd rather not have those extra strings added by lxml.
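
A note on where the extra string probably comes from: `&#13;` is the numeric character reference for the carriage return (0x0D, decimal 13). If the file reaches the HTML parser as raw bytes, the CR of each CRLF pair presumably stays in the text nodes and is then escaped on output, whereas a file opened in text mode has CRLF translated to LF before lxml ever sees it. The sketch below uses only the standard library and a temporary file (file name and content are made up for illustration) to show the difference between the two read modes; it does not call lxml itself.

```python
import os
import tempfile

# Write a tiny HTML file with Windows (CRLF) line endings, byte for byte.
raw = b"<html><body>\r\n<p>Hello</p>\r\n</body></html>\r\n"
fd, path = tempfile.mkstemp(suffix=".html")
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Binary mode returns the bytes untouched, so every 0D 0A pair survives.
with open(path, "rb") as f:
    as_bytes = f.read()

# Text mode with the default newline=None translates CRLF to a bare LF.
with open(path, encoding="utf-8") as f:
    as_text = f.read()

os.remove(path)

print(as_bytes.count(b"\r"))  # 3 carriage returns in the raw bytes
print(as_text.count("\r"))    # 0 carriage returns after newline translation
```

Under that assumption, opening the file in text mode, or replacing b"\r\n" with b"\n" before handing the data to the parser, should keep the references out of the output.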

On 11/05/2022 07:37, Adrian Bool wrote:
Hi,

Your issue seems to be about the loading of your XML data and not the outputting of it.

Try:

    from lxml import etree as et

    # Each assignment overwrites the previous one, so only the plain HTMLParser() is used.
    parser = et.HTMLParser(remove_blank_text=True,strip_cdata=False)
    parser = et.HTMLParser(remove_blank_text=True)
    parser = et.HTMLParser()

    from pathlib import Path

    with Path('myxml.xml').open() as f:
        tree = et.parse(f, parser=parser)

    root = tree.getroot()
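    # encoding='unicode' makes tostring() return str, so print() shows real line breaks
    print(et.tostring(root, pretty_print=True, encoding='unicode'))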
_______________________________________________
lxml - The Python XML Toolkit mailing list
https://mail.python.org/mailman3/lists/lxml.python.org/