On Mon, 22 Aug 2022 at 10:04, Buck Evan <buck.2...@gmail.com> wrote:
>
> I've had much success doing round trips through the lxml.html parser.
>
> https://lxml.de/lxmlhtml.html
>
> I ditched bs for lxml long ago and never regretted it.
>
> If you find that you have a bunch of invalid HTML that lxml inadvertently
> "fixes", I would recommend adding a stutter-step to your project: perform a
> no-op round trip through lxml on all files. I'd then analyze any diff by
> progressively excluding changes via `grep -vP`.
> Unless I'm mistaken, all such changes should fall into no more than a dozen
> groups.
>

Will this round-trip mutate every single file and reorder the tag
attributes? Because I really don't want to manually eyeball all those
changes.
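
One way to find out without eyeballing anything would be to measure it on a
copy of the files. Below is a minimal sketch, not a tested recipe: it assumes
lxml is installed and that `lxml.html.document_fromstring` plus
`lxml.html.tostring(..., encoding="unicode")` is an acceptable no-op round
trip for these files. The helper names (`lxml_roundtrip`, `roundtrip_diff`,
`changed_files`) are made up for illustration.

```python
import difflib


def lxml_roundtrip(text):
    """Parse and re-serialize HTML with lxml.html (assumes lxml is installed)."""
    import lxml.html  # imported lazily so the diff helpers work without lxml

    doc = lxml.html.document_fromstring(text)
    # Assumption: serializing the root element is good enough here; things
    # like the doctype may need separate handling.
    return lxml.html.tostring(doc, encoding="unicode")


def roundtrip_diff(text, name, roundtrip=lxml_roundtrip):
    """Return unified-diff lines between text and its round-tripped form.

    An empty list means the round trip left the text unchanged.
    """
    after = roundtrip(text)
    return list(
        difflib.unified_diff(
            text.splitlines(),
            after.splitlines(),
            fromfile=name,
            tofile=name + " (round-tripped)",
            lineterm="",
        )
    )


def changed_files(paths, roundtrip=lxml_roundtrip):
    """Map each path whose content changes to its diff lines."""
    result = {}
    for path in paths:
        with open(path, encoding="utf-8") as f:
            text = f.read()
        diff = roundtrip_diff(text, str(path), roundtrip)
        if diff:
            result[str(path)] = diff
    return result
```

Running `changed_files` over the tree shows how many files the round trip
actually touches, and the collected diff lines can then be filtered with
`grep -vP` as suggested above.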

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
