Re: How to extract a part of html file

2005-10-20 Thread Mike Meyer
Ben Finney [EMAIL PROTECTED] writes:

> Joe [EMAIL PROTECTED] wrote:
>> I'm trying to extract part of html code from a tag to a tag
> For tag soup, use BeautifulSoup:
> <URL:http://www.crummy.com/software/BeautifulSoup/>

Except he's trying to extract an apparently random part of the
file. BeautifulSoup is a wonderful thing for dealing with X/HTML
documents as structured documents, which is how you want to deal with
them most of the time.

In this case, an re works nicely:

>>> import re
>>> s = '<span class=boldyellow><B><U>  and ends with <TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
>>> r = re.match('<span class=boldyellow><B><U>(.*)<TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', s)
>>> r.group(1)
'  and ends with '
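
A note on applying this to a full page: re.match only matches at the start of the string, and '.' does not match newlines by default, so a pattern like the one above fits the one-line sample but probably not the output of read(). A minimal sketch of a more forgiving version, assuming Python 3 and assuming the markers are the literal tag strings (START, END and extract_with_re are names made up for illustration):

```python
import re

# Hypothetical markers; substitute the literal text found in the real page.
START = '<span class=boldyellow><B><U>'
END = '<TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'

def extract_with_re(text, start=START, end=END):
    """Return the text between the first `start` and the next `end`, or None."""
    # re.escape keeps '.', '?' and friends in the markers from acting as regex syntax.
    # (.*?) is non-greedy, so the match stops at the first end marker after start.
    # re.DOTALL lets '.' cross newlines, which a real page will contain.
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    m = re.search(pattern, text, re.DOTALL)
    return m.group(1) if m else None

page = 'header\n' + START + '  wanted\ntext ' + END + '\nfooter'
print(repr(extract_with_re(page)))
```

re.search scans the whole string, so the start marker no longer has to sit at position 0, and a missing marker gives None instead of an AttributeError on r.group(1).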
 

String.find also works really well:

>>> start = s.find('<span class=boldyellow><B><U>') + len('<span class=boldyellow><B><U>')
>>> stop = s.find('<TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', start)
>>> s[start:stop]
'  and ends with '
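
One caveat with find: it returns -1 when the marker is missing, and -1 + len(marker) is still a valid index, so the slice quietly returns the wrong text instead of failing. A small sketch that checks for that, assuming Python 3 (extract_between is a hypothetical helper name, not something from the standard library):

```python
def extract_between(text, start_marker, end_marker):
    """Return the text between the two markers, or None if either is missing."""
    start = text.find(start_marker)
    if start == -1:
        return None  # start marker not present
    start += len(start_marker)  # skip past the marker itself
    stop = text.find(end_marker, start)  # only look after the start marker
    if stop == -1:
        return None  # end marker not present after the start marker
    return text[start:stop]

print(repr(extract_between('x<b>bold</b>y', '<b>', '</b>')))
```

Passing start as the second argument to find is what keeps an earlier copy of the end marker from being picked up by mistake.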
 

Not a lot to choose between them.

mike
-- 
Mike Meyer [EMAIL PROTECTED]  http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to extract a part of html file

2005-10-20 Thread Joe
Thanks Mike, that is just what I was looking for. I have looked at
BeautifulSoup but it doesn't really do what I want it to do; maybe I'm
just new to Python and don't exactly know what it is doing just yet.
However, string find works. Thanks

On Thu, 20 Oct 2005 09:47:37 -0400, Mike Meyer wrote:

> Ben Finney [EMAIL PROTECTED] writes:
>
>> Joe [EMAIL PROTECTED] wrote:
>>> I'm trying to extract part of html code from a tag to a tag
>> For tag soup, use BeautifulSoup:
>> <URL:http://www.crummy.com/software/BeautifulSoup/>
>
> Except he's trying to extract an apparently random part of the file.
> BeautifulSoup is a wonderful thing for dealing with X/HTML documents as
> structured documents, which is how you want to deal with them most of
> the time.
>
> In this case, an re works nicely:
>
> >>> import re
> >>> s = '<span class=boldyellow><B><U>  and ends with <TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
> >>> r = re.match('<span class=boldyellow><B><U>(.*)<TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', s)
> >>> r.group(1)
> '  and ends with '
>
> String.find also works really well:
>
> >>> start = s.find('<span class=boldyellow><B><U>') + len('<span class=boldyellow><B><U>')
> >>> stop = s.find('<TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', start)
> >>> s[start:stop]
> '  and ends with '
>
> Not a lot to choose between them.
>
> mike


Re: How to extract a part of html file

2005-10-20 Thread Ben Finney
Joe [EMAIL PROTECTED] wrote:
> I'm trying to extract part of html code from a tag to a tag

For tag soup, use BeautifulSoup:

<URL:http://www.crummy.com/software/BeautifulSoup/>

Available as a package in Debian, probably other decent OSen also.

-- 
 \ I think it would be a good idea.  -- Mahatma Gandhi (when |
  `\asked what he thought of Western civilization) |
_o__)  |
Ben Finney


How to extract a part of html file

2005-10-19 Thread Joe
I'm trying to extract part of the html code, from one tag to another. The
code begins with <span class=boldyellow><B><U> and ends with
<TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>

I was thinking of using a regular expression, however I am having a hard
time getting the desired string. I use

import urllib

htmlSource = urllib.urlopen("http://address/")
s = htmlSource.read()
htmlSource.close()

to get the html into a string. Now I want to match string s from a
<span class> tag to <img src="http://whatever/some.gif"> </TD></TR></TABLE>
and store that into a new string.

Thanks 