Adrian: Thanks for the code. The output is now correct. Am I using lxml
incorrectly, or is this an issue with its HTML parser? Can I do without
importing an extra module (pathlib.Path)?
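On the pathlib question: pathlib has been part of the standard library since Python 3.4, so it is not a third-party dependency. The built-in open() also defaults to text mode, so it should behave the same way here. The following is a minimal sketch, not code from the thread. It assumes UTF-8 input and writes a hypothetical temporary file, sample.html, with CRLF line endings in place of the real myxml.xml:

```python
import os
import tempfile

from lxml import etree as et

# Hypothetical sample document with Windows (CRLF) line endings.
SAMPLE = b"<html>\r\n<body>\r\n<p>Hello</p>\r\n</body>\r\n</html>\r\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.html")  # hypothetical file name
    with open(path, "wb") as out:
        out.write(SAMPLE)

    parser = et.HTMLParser()
    # Built-in open() in text mode (default newline=None) translates
    # "\r\n" to "\n" while reading, just like Path.open().
    with open(path, encoding="utf-8") as f:
        tree = et.parse(f, parser)

# The tree is already in memory, so it outlives the temporary directory.
html = et.tostring(tree.getroot(), pretty_print=True, encoding="unicode")
print(html)
```

Passing encoding="unicode" to tostring() returns a str rather than bytes, which prints more readably.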
Charlie Clark: The output from et.tostring() has "&#13;" at the end of
every line (the line endings are 0D 0A, i.e. CR LF, since I'm working on
Windows). I don't know whether the parser or tostring() is causing the problem.
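from lxml import etree as et

parser = et.HTMLParser(remove_blank_text=True, strip_cdata=False)
tree = et.parse(f, parser)  # f: the input file (a path or an open file object)
root = tree.getroot()
et.dump(root)  # serialises the element tree to stdout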
→ https://postimg.cc/1n2LWtpz
Strictly speaking, it doesn't matter, since it won't affect how the HTML
is displayed in the browser, but I'd rather lxml didn't add those extra
strings.
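If the HTML arrives as bytes, for example from a file opened in binary mode ("rb"), a workaround is to normalise the line endings yourself before parsing. This is a minimal sketch under that assumption. The sample bytes are hypothetical, and the sketch avoids the carriage returns rather than changing how lxml serialises them:

```python
from lxml import etree as et

# Hypothetical input with Windows (CRLF) line endings, e.g. the result of
# reading a file in binary mode.
raw = b"<html>\r\n<body>\r\n<p>Hello</p>\r\n</body>\r\n</html>\r\n"

# Convert CRLF, then any stray CR, to LF so that no carriage return
# reaches the parsed tree.
cleaned = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

parser = et.HTMLParser(remove_blank_text=True)
root = et.fromstring(cleaned, parser)

html = et.tostring(root, pretty_print=True, encoding="unicode")
print(html)
```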
On 11/05/2022 07:37, Adrian Bool wrote:
Hi,
Your issue seems to be with loading your XML data rather than with
outputting it.
Try:
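from lxml import etree as et
from pathlib import Path

# Alternative parser settings; only the last assignment takes effect.
# parser = et.HTMLParser(remove_blank_text=True, strip_cdata=False)
# parser = et.HTMLParser(remove_blank_text=True)
parser = et.HTMLParser()

# Path.open() opens the file in text mode, where "\r\n" is read as "\n".
with Path('myxml.xml').open() as f:
    tree = et.parse(f, parser=parser)
root = tree.getroot()
# tostring() returns bytes; encoding="unicode" would return a str instead.
print(et.tostring(root, pretty_print=True))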
_______________________________________________
lxml - The Python XML Toolkit mailing list -- [email protected]
https://mail.python.org/mailman3/lists/lxml.python.org/