Re: How to extract a part of html file

Mike Meyer Thu, 20 Oct 2005 06:57:18 -0700

Ben Finney <[EMAIL PROTECTED]> writes:

> Joe <[EMAIL PROTECTED]> wrote:
>> I'm trying to extract part of html code from a tag to a tag
> For tag soup, use BeautifulSoup:
>     <URL:http://www.crummy.com/software/BeautifulSoup/>


Except he's trying to extract an apparently random part of the
file. BeautifulSoup is a wonderful thing for dealing with X/HTML
documents as structured documents, which is how you want to deal with
them most of the time.
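For comparison, here is roughly what the structured approach looks like. This is a sketch that uses the standard library's html.parser instead of BeautifulSoup, so it runs without extra installs. The class name SpanTextCollector and the sample markup are hypothetical. It collects text element by element, which is fine when the part you want is a whole element. It can't hand back an arbitrary run of characters that starts inside one element and stops partway through another, which is what the original question asks for.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text inside <span class="..."> elements.

    Hypothetical helper, not part of any library.
    """

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.depth = 0  # > 0 while inside a matching span (counts nested spans)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # a nested span inside the matching one
        elif dict(attrs).get("class") == self.css_class:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


parser = SpanTextCollector("boldyellow")
parser.feed('<p>skip</p><span class="boldyellow"><B><U>Sale!</U></B></span><p>skip</p>')
parser.close()
print("".join(parser.chunks))
```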

In this case, an re works nicely:

>>> import re
>>> s = '<span class="boldyellow"><B><U>  and ends with TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
>>> r = re.match('<span class="boldyellow"><B><U>(.*)TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', s)
>>> r.group(1)
'  and ends with '
>>> 
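In a real file the start marker usually isn't at the very beginning, and the text often spans several lines. A slightly more defensive version of the same regex idea might look like the sketch below. The helper name extract_between is hypothetical. re.search scans the whole string, whereas re.match only tries position 0, which worked above only because s starts with the marker. re.escape makes the markers match literally, so the "." in "some.gif" isn't a wildcard. The non-greedy (.*?) stops at the first end marker, and re.DOTALL lets "." match newlines.

```python
import re


def extract_between(text, start_marker, end_marker):
    """Return the text between two literal markers, or None if absent.

    Hypothetical helper: the markers are matched literally (re.escape),
    the capture stops at the first end marker (non-greedy .*?), and
    re.DOTALL lets the capture run across newlines.
    """
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


html = ('<html><body>\n<span class="boldyellow"><B><U>  and ends with TD><TD> '
        '<img src="http://whatever/some.gif"> </TD></TR></TABLE>\n</body></html>')
print(repr(extract_between(html, '<span class="boldyellow"><B><U>', 'TD><TD> <img')))
```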

The str.find method also works really well:

>>> start = s.find('<span class="boldyellow"><B><U>') + len('<span class="boldyellow"><B><U>')
>>> stop = s.find('TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', start)
>>> s[start:stop]
'  and ends with '
>>> 
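One thing to watch with str.find: it returns -1 when a marker isn't there. The slice above would then quietly return the wrong text instead of failing. A sketch that checks for that, wrapped in a hypothetical helper named slice_between, might look like this:

```python
def slice_between(text, start_marker, end_marker):
    """Return the text between start_marker and the next end_marker.

    Hypothetical helper. Returns None if either marker is missing,
    because str.find signals "not found" with -1, which would otherwise
    be used as a (wrong) slice index.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)  # skip past the start marker itself
    stop = text.find(end_marker, start)  # only look after the start marker
    if stop == -1:
        return None
    return text[start:stop]


s = ('<span class="boldyellow"><B><U>  and ends with TD><TD> '
     '<img src="http://whatever/some.gif"> </TD></TR></TABLE>')
print(repr(slice_between(s, '<span class="boldyellow"><B><U>', 'TD><TD>')))
```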

Not a lot to choose between them.

    <mike
-- 
Mike Meyer <[EMAIL PROTECTED]>                  http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
-- 
http://mail.python.org/mailman/listinfo/python-list
