On 2022-08-20, Stefan Ram <r...@zedat.fu-berlin.de> wrote:
> Jon Ribbens <jon+use...@unequivocal.eu> writes:
>>... or you could avoid all that faff and just do re.sub()?
> import bs4
> import re
> source = '<a name="b" href="http" accesskey="c"></a>'
> # Use Python to change the source, keeping the order of attributes.
> result = re.sub( r'href\s*=\s*"http"', r'href="https"', source )
> result = re.sub( r"href\s*=\s*'http'", r"href='https'", result )

You could go a bit harder with the regexp of course, e.g.:

  result = re.sub(

> # Now use BeautifulSoup only for the verification of the result.
> reference = bs4.BeautifulSoup( source, features="html.parser" )
> for a in reference.find_all( "a" ):
>     if a[ 'href' ]== 'http': a[ 'href' ]='https'
> print( bs4.BeautifulSoup( result, features="html.parser" )== reference )

Hmm, yes that seems like a pretty good idea.
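
For what it's worth, here is a rough sketch of how the two ideas might fit together: a single, slightly more tolerant substitution, followed by a structural check of the result. It is not the expression that got cut off above, just one possible variant. The names HREF_HTTP, upgrade_href, TagCollector and parse_tags are made up for illustration, and the check uses the standard library's html.parser as a stand-in for BeautifulSoup so that the snippet has no third-party dependency.

```python
import re
from html.parser import HTMLParser

# Hypothetical pattern: one pass that accepts either quote style and reuses
# whichever quote character was matched (group 2) to close the value.
HREF_HTTP = re.compile(r"""(\bhref\s*=\s*)(["'])http\2""", re.IGNORECASE)


def upgrade_href(source):
    """Rewrite href="http" / href='http' to https, leaving all other text as-is."""
    return HREF_HTTP.sub(r"\1\2https\2", source)


class TagCollector(HTMLParser):
    """Collect (tag, attrs) pairs so two documents can be compared structurally."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, attrs))


def parse_tags(html):
    """Return the list of start tags, each with its attributes in source order."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


source = '<a name="b" href="http" accesskey="c"></a>'
result = upgrade_href(source)

# Reference: make the same change on the parsed attributes rather than the text.
reference = [
    (tag, [(k, "https" if tag == "a" and k == "href" and v == "http" else v)
           for k, v in attrs])
    for tag, attrs in parse_tags(source)
]

print(result)
print(parse_tags(result) == reference)
```

Unlike the two substitutions quoted above, this keeps any whitespace around the "=" rather than normalising it, and it only matches when the whole attribute value is exactly "http". It would still be fooled by the same text appearing outside a tag, which is why the parse-and-compare step is worth keeping.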
