Re: How to extract a part of html file
Ben Finney [EMAIL PROTECTED] writes:
> Joe [EMAIL PROTECTED] wrote:
>> I'm trying to extract part of html code from a tag to a tag
> For tag soup, use BeautifulSoup:
> <URL:http://www.crummy.com/software/BeautifulSoup/>

Except he's trying to extract an apparently random part of the file. BeautifulSoup is a wonderful thing for dealing with X/HTML documents as structured documents, which is how you want to deal with them most of the time.

In this case, an re works nicely:

>>> import re
>>> s = '<span class=boldyellow><B><U> and ends with <TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'
>>> r = re.match('<span class=boldyellow><B><U>(.*)<TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', s)
>>> r.group(1)
' and ends with '

String.find also works really well:

>>> start = s.find('<span class=boldyellow><B><U>') + len('<span class=boldyellow><B><U>')
>>> stop = s.find('<TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>', start)
>>> s[start:stop]
' and ends with '

Not a lot to choose between them.

    mike
-- 
Mike Meyer [EMAIL PROTECTED] http://www.mired.org/home/mwm/
Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.
-- 
http://mail.python.org/mailman/listinfo/python-list
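A minimal Python 3 sketch of the regex approach, adjusted for a whole fetched page rather than a one-line test string. Assumptions: the markers are the ones from the original question with their angle brackets restored (a reconstruction), and `extract_between` is a hypothetical helper name. It differs from the interactive example in three ways: `re.search` instead of `re.match` (which only matches at position 0), `re.escape` so the `.` in `some.gif` is taken literally, and `re.DOTALL` with a non-greedy `(.*?)` so the match can cross line breaks but stops at the first end marker.

```python
import re


def extract_between(html, start_marker, end_marker):
    """Return the text between two literal markers, or None if either is missing.

    re.escape() makes the markers literal, so '.' in 'some.gif' is no longer
    a wildcard; re.DOTALL lets '.*?' cross newlines; '.*?' is non-greedy, so
    it stops at the first end_marker after start_marker, not the last one.
    """
    pattern = re.escape(start_marker) + r"(.*?)" + re.escape(end_marker)
    # search, not match: the start marker need not be at position 0
    match = re.search(pattern, html, re.DOTALL)
    return match.group(1) if match else None


# Markers reconstructed from the question (assumed exact spelling).
START = "<span class=boldyellow><B><U>"
END = '<TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'

page = "<html><body>\n" + START + " first line\nsecond line " + END + "\n</body></html>"
print(repr(extract_between(page, START, END)))
```

Passing the markers through `re.escape` rather than hand-writing the pattern avoids having to think about which HTML characters are regex metacharacters.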
Re: How to extract a part of html file
Thanks Mike, that is just what I was looking for. I have looked at BeautifulSoup, but it doesn't really do what I want it to do; maybe I'm just new to Python and don't know exactly what it is doing yet. However, string find works.

Thanks

On Thu, 20 Oct 2005 09:47:37 -0400, Mike Meyer wrote:
> String.find also works really well:
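Since string find is the approach that worked here, one caveat: `str.find` returns -1 when the text is not found. With the one-liner `s.find(...) + len(...)`, a missing start marker silently yields the offset `len(marker) - 1`, and a missing end marker turns `s[start:stop]` into `s[start:-1]`, which returns nearly the whole page. A hedged sketch with explicit checks; `find_between` is a hypothetical name and the markers are the reconstructed ones from the question.

```python
def find_between(s, start_marker, end_marker):
    """Slice s between two literal markers using str.find; None if either is absent."""
    begin = s.find(start_marker)
    if begin == -1:                      # start marker not present
        return None
    begin += len(start_marker)           # skip past the start marker itself
    stop = s.find(end_marker, begin)     # only look for the end marker after it
    if stop == -1:                       # end marker not present
        return None
    return s[begin:stop]


# Markers reconstructed from the question (assumed exact spelling).
START = "<span class=boldyellow><B><U>"
END = '<TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'

s = START + " and ends with " + END
print(repr(find_between(s, START, END)))
```

Returning `None` instead of a possibly wrong slice makes a missing marker show up as an obvious failure rather than as bad data.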
Re: How to extract a part of html file
Joe [EMAIL PROTECTED] wrote:
> I'm trying to extract part of html code from a tag to a tag

For tag soup, use BeautifulSoup:

  <URL:http://www.crummy.com/software/BeautifulSoup/>

Available as a package in Debian, probably other decent OSen also.

-- 
 "I think it would be a good idea."
   -- Mahatma Gandhi (when asked what he thought of Western civilization)
Ben Finney
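For the structured case Ben describes, a parser walks the tags instead of scanning for strings. BeautifulSoup is third-party, so to keep this sketch dependency-free it swaps in the standard library's lower-level `html.parser.HTMLParser`; this is not BeautifulSoup's API. It collects the text inside `<span class=boldyellow>`, matching the class attribute exactly (an assumption), and `SpanTextCollector` is a hypothetical name. As Mike notes, Joe's end marker lies outside that span, which is why a tag-based parse does not map directly onto his "from this string to that string" slice.

```python
from html.parser import HTMLParser


class SpanTextCollector(HTMLParser):
    """Collect the text inside every <span> whose class attribute equals target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # > 0 while inside a matching span (counts nested spans)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":     # HTMLParser reports tag names in lower case
            return
        if self.depth:
            self.depth += 1   # a nested span inside the target span
        elif dict(attrs).get("class") == self.target_class:
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


html = "<p>skip</p><span class=boldyellow><B><U>Headline</U></B></span><p>skip</p>"
collector = SpanTextCollector("boldyellow")
collector.feed(html)
collector.close()
print("".join(collector.chunks))
```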
How to extract a part of html file
I'm trying to extract part of the HTML code from one tag to another. The code begins with

    <span class=boldyellow><B><U>

and ends with

    <TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>

I was thinking of using a regular expression, but I'm having a hard time getting the desired string. I use

    import urllib
    htmlSource = urllib.urlopen("http://address/")
    s = htmlSource.read()
    htmlSource.close()

to get the HTML into a string. Now I want to match string s from the <span class=boldyellow> tag to <img src="http://whatever/some.gif"> </TD></TR></TABLE> and store that in a new string.

Thanks
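The fetch-and-extract steps in the question, sketched for Python 3, where `urllib.urlopen` became `urllib.request.urlopen` and `read()` returns bytes. The UTF-8 default encoding and the helper names `read_page` and `slice_inclusive` are assumptions, and the markers are reconstructed from the question. Because the question asks for the text from the span tag through the closing `</TABLE>`, this version keeps both markers in the result, unlike the replies, which return only what lies between them.

```python
import urllib.request


def read_page(url, encoding="utf-8"):
    """Fetch a page as text; the with-block closes the connection even on error."""
    with urllib.request.urlopen(url) as response:
        # read() returns bytes in Python 3; the encoding is an assumption
        return response.read().decode(encoding, errors="replace")


def slice_inclusive(s, start_marker, end_marker):
    """Return s from start_marker through end_marker, both included; None if absent."""
    begin = s.find(start_marker)
    if begin == -1:
        return None
    stop = s.find(end_marker, begin + len(start_marker))
    if stop == -1:
        return None
    return s[begin:stop + len(end_marker)]


# Markers reconstructed from the question (assumed exact spelling).
START = "<span class=boldyellow><B><U>"
END = '<TD><TD> <img src="http://whatever/some.gif"> </TD></TR></TABLE>'

sample = "<html>junk" + START + " body " + END + "tail</html>"
print(slice_inclusive(sample, START, END))
```

Against a live page this would be `slice_inclusive(read_page("http://address/"), START, END)`, with `http://address/` standing in for the real URL as in the question.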