On Mon, 22 Aug 2022 at 10:04, Buck Evan <buck.2...@gmail.com> wrote:
>
> I've had much success doing round trips through the lxml.html parser.
>
> https://lxml.de/lxmlhtml.html
>
> I ditched bs for lxml long ago and never regretted it.
>
> If you find that you have a bunch of invalid html that lxml inadvertently
> "fixes", I would recommend adding a stutter-step to your project: perform a
> noop roundtrip thru lxml on all files. I'd then analyze any diff by
> progressively excluding changes via `grep -vP`.
> Unless I'm mistaken, all such changes should fall into no more than a dozen
> groups.
>

Will this round-trip mutate every single file and reorder the tag
attributes? Because I really don't want to manually eyeball all those
changes.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
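
One way to find out without eyeballing anything is to run the noop roundtrip in memory and list only the files whose text actually changes. The following is a minimal sketch, not anyone's actual tooling: the helper names (lxml_roundtrip, roundtrip_diff, changed_files) are made up for illustration, and it assumes each file can be read as a str and passed to lxml.html.fromstring / lxml.html.tostring. It makes no claim about what lxml does to attribute order; it only reports what differs.

```python
import difflib
from typing import Callable, Dict, List


def lxml_roundtrip(html_text: str) -> str:
    """Parse and re-serialize HTML with lxml.html, making no intentional edits."""
    # Third-party dependency; imported lazily so the diff helpers below
    # can be exercised with any other roundtrip callable.
    import lxml.html

    tree = lxml.html.fromstring(html_text)
    return lxml.html.tostring(tree, encoding="unicode")


def roundtrip_diff(name: str, original: str,
                   roundtrip: Callable[[str], str]) -> List[str]:
    """Return unified-diff lines for one file; an empty list means unchanged."""
    rewritten = roundtrip(original)
    return list(difflib.unified_diff(
        original.splitlines(),
        rewritten.splitlines(),
        fromfile=name,
        tofile=name + " (roundtrip)",
        lineterm="",
    ))


def changed_files(files: Dict[str, str],
                  roundtrip: Callable[[str], str]) -> Dict[str, List[str]]:
    """Map each file name whose roundtrip output differs to its diff lines."""
    diffs = {}
    for name, text in files.items():
        diff = roundtrip_diff(name, text, roundtrip)
        if diff:
            diffs[name] = diff
    return diffs
```

Calling changed_files(files, lxml_roundtrip) on a dict of {filename: contents} gives the set of files the roundtrip would touch, and the collected diff lines can then be filtered group by group in the same spirit as the `grep -vP` suggestion quoted above.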