On 2022-08-20, Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> Jon Ribbens <jon+use...@unequivocal.eu> writes:
>>... or you could avoid all that faff and just do re.sub()?
>
> import bs4
> import re
>
> source = '<a name="b" href="http" accesskey="c"></a>'
>
> # Use Python to change the source, keeping the order of attributes.
>
> result = re.sub( r'href\s*=\s*"http"', r'href="https"', source )
> result = re.sub( r"href\s*=\s*'http'", r"href='https'", result )

You could go a bit harder with the regexp of course, e.g.:

    result = re.sub(
        r"""(<\s*a\s+[^>]*href\s*=\s*)(['"])\s*OLD\s*\2""",
        r"\1\2NEW\2",
        source,
        flags=re.IGNORECASE
    )

> # Now use BeautifulSoup only for the verification of the result.
>
> reference = bs4.BeautifulSoup( source, features="html.parser" )
> for a in reference.find_all( "a" ):
>     if a[ 'href' ]== 'http': a[ 'href' ]='https'
>
> print( bs4.BeautifulSoup( result, features="html.parser" )== reference )

Hmm, yes that seems like a pretty good idea.
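
For what it's worth, here is a minimal sketch of how the stricter regexp could be wrapped up for reuse. The helper name replace_href is hypothetical, and OLD and NEW become its old and new parameters. Two things here are assumptions added on top of the regexp shown in the thread: re.escape() on the old value, and a function as the replacement so that backslashes in the new value are not read as group references.

```python
import re

def replace_href(source, old, new):
    """Replace href values equal to `old` with `new` inside <a ...> tags.

    Hypothetical helper: attribute order and the original quote
    character are preserved, because only the matched value changes.
    """
    pattern = re.compile(
        r"""(<\s*a\s+[^>]*href\s*=\s*)(['"])\s*"""
        + re.escape(old)      # assumption: treat `old` as literal text
        + r"""\s*\2""",       # closing quote must match the opening one
        flags=re.IGNORECASE,  # also makes the match on `old` case-insensitive
    )
    # Rebuild the match from its groups rather than using a template string.
    return pattern.sub(
        lambda m: m.group(1) + m.group(2) + new + m.group(2),
        source,
    )

source = '<a name="b" href="http" accesskey="c"></a>'
result = replace_href(source, "http", "https")
print(result)
```

Like any regexp approach this is not a real HTML parser, so the BeautifulSoup comparison from the quoted code is still a sensible check on the result.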