Re: Mutating an HTML file with BeautifulSoup

2QdxY4RzWzUUiLuE Fri, 19 Aug 2022 13:25:04 -0700

On 2022-08-19 at 20:12:35 +0100,
Barry <[email protected]> wrote:

> > On 19 Aug 2022, at 19:33, Chris Angelico <[email protected]> wrote:
> > 
> > What's the best way to precisely reconstruct an HTML file after
> > parsing it with BeautifulSoup?
> 
> I recall that bs4 parses the input into an object tree and loses the
> detail of the input.  I recently ported from very old bs to bs4 and
> hit the same issue.  So no, it will not output the same as went in.
> 
> If you can trust the input to parse as XML, meaning all the rules for
> closing tags have been followed, then I think you can parse and
> unparse through XML to do what you want.

XML is in the same boat.  Except for Canonical XML (the "canonical form"
that underlies cryptographically signed XML documents), the standards
explicitly don't require tools to round-trip the "source code."  The
preferred way to compare XML documents is at the structural level
rather than by their textual representations.  That way, the following
two elements are the same, because attribute order carries no meaning
(whether sub-elements in a different order also count as the same
depends on the application):

    <e a="b" c="d"/>

and

    <e c="d" a="b"/>
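
For what it's worth, here is a minimal sketch of such a structural
comparison using only the standard library's xml.etree.ElementTree.
The helper name same_structure is my own invention, not part of any
library, and it assumes that child order matters and that surrounding
whitespace in text does not; adjust both to suit your format.
ET.canonicalize needs Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def same_structure(a, b):
    """Compare two elements by tag, attributes, text, and children.

    Hypothetical helper for illustration only. Attribute order is
    ignored because Element.attrib is a dict; child order is treated
    as significant, and tail text is not compared.
    """
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(same_structure(x, y) for x, y in zip(a, b))


first_src = '<e a="b" c="d"/>'
second_src = '<e c="d" a="b"/>'

first = ET.fromstring(first_src)
second = ET.fromstring(second_src)

# The source strings differ, but the parsed elements compare equal.
print(first_src == second_src)
print(same_structure(first, second))

# Canonical XML (C14N) serializes attributes in a fixed order, so the
# canonical forms of the two strings can be compared as plain text.
print(ET.canonicalize(first_src) == ET.canonicalize(second_src))
```

None of this helps with reproducing the original bytes, of course; it
only shows why tools feel free not to preserve them.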

Dan
-- 
https://mail.python.org/mailman/listinfo/python-list
