Re: HTML scraping in python

Shawn O'Shea Thu, 11 Jun 2009 08:02:51 -0700

On Thu, Jun 11, 2009 at 10:20 AM, Paul Lussier <[email protected]> wrote:


> Paul Lussier <[email protected]> writes:
>
> > I stumbled upon BeautifulSoup and am now trying to get that and the
> > mechanize module installed.
>
> Okay, I've got that installed.  I've figured out enough BS to get me a
> single row of the table into a list comprised of elements like:
> '<td>data</td>'
>
> Now I just need to figure out how to strip the html off of the data.
> I could do it by writing a regexp, I suppose, but I'm hoping there's a
> method which already does this.
>

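There is. The BeautifulSoup docs/examples page has been invaluable to me in
the past for learning BS. Anyway, here's an example that should help: a
tag's .contents attribute is a list of its children, so for a cell like
yours the text is the only item in that list.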

$ python
Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from BeautifulSoup import BeautifulSoup as BS
>>> html = "<td>data</td>"
>>> soup = BS(html)
>>> soup
<td>data</td>
>>> soup.td
<td>data</td>
>>> soup.td.contents
[u'data']
>>>
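
For comparison, here is a minimal sketch of the same tag stripping applied to
a whole row, assuming a current Python 3 and only the standard library
(html.parser) rather than BeautifulSoup. The names TextExtractor, strip_tags
and row are made up for illustration, and the row contents are invented to
match the shape described in the question.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.chunks.append(data)


def strip_tags(fragment):
    """Return the text of an HTML fragment with all tags removed."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


# Hypothetical row, shaped like the list described in the question.
row = ["<td>data</td>", "<td>more <b>data</b></td>"]
cells = [strip_tags(cell) for cell in row]
print(cells)
```

Unlike .contents, which returns the child nodes as a list, this joins every
text node under the cell into one string, so nested markup such as <b> is
flattened as well.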

-Shawn

_______________________________________________
gnhlug-discuss mailing list
[email protected]
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
