On 11 May 2022, at 11:53, Gilles wrote:
Adrian: Thanks for the code. The output is now correct. Am I using
lxml incorrectly, or is it some issue with its HTML parser? Can I do
without using an extra package (pathlib.Path)?
Charlie Clark: The output from "et.tostring()" has "" added before
each carriage return (which is 0D0A since I'm working on Windows). I
don't know if it's the parser or tostring() that's causing that
problem.
from lxml import etree as et

# f is the path (or open file object) of the HTML file to parse
parser = et.HTMLParser(remove_blank_text=True, strip_cdata=False)
tree = et.parse(f, parser)
root = tree.getroot()
et.dump(root)
It could always be a bug, but really we need a sample file to test.
Which version of lxml and Python are you using?
But if all you want is pretty printing, then I recommend simply using
the command-line tool `tidy`.
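
If the stray characters do come from the Windows line endings surviving
the parse, one thing to try is normalising them before the parser sees
the data. This is only a rough sketch and untested against your file:
the helper names and "input.html" are made up for illustration, and it
assumes the carriage returns really are the trigger. It uses plain
open(), so pathlib is not needed.

```python
def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def parse_html_normalized(path):
    """Parse an HTML file after normalising its line endings (hypothetical helper)."""
    # Imported here so normalize_newlines() can be used without lxml installed.
    from lxml import etree as et

    with open(path, "rb") as fh:
        data = normalize_newlines(fh.read())
    # Same parser options as in the snippet above.
    parser = et.HTMLParser(remove_blank_text=True, strip_cdata=False)
    # et.fromstring() returns the root element directly.
    return et.fromstring(data, parser)
```

Something like root = parse_html_normalized("input.html") followed by
et.tostring(root, pretty_print=True, encoding="unicode") would then show
whether the extra characters are gone. If they are still there, the
line endings are probably not the cause.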
Charlie
--
Charlie Clark
Managing Director
Clark Consulting & Research
German Office
Sengelsweg 34
Düsseldorf
D- 40489
Tel: +49-203-3925-0390
Mobile: +49-178-782-6226