On Thu, Jun 11, 2009 at 10:20 AM, Paul Lussier <[email protected]> wrote:
> Paul Lussier <[email protected]> writes:
>
> > I stumbled upon BeautifulSoup and am now trying to get that and the
> > mechanize module installed.
>
> Okay, I've got that installed. I've figured out enough BS to get me a
> single row of the table into a list comprised of elements like:
> '<td>data</td>'
>
> Now I just need to figure out how to strip the html off of the data.
> I could do it by writing a regexp, I suppose, but I'm hoping there's a
> method which already does this.
>

There is. The BeautifulSoup docs/examples page has been invaluable to me
in the past for learning BS. Anyway, here's an example that should help:

$ python
Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13)
[GCC 4.0.1 (Apple Inc. build 5465)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from BeautifulSoup import BeautifulSoup as BS
>>> html = "<td>data</td>"
>>> soup = BS(html)
>>> soup
<td>data</td>
>>> soup.td
<td>data</td>
>>> soup.td.contents
[u'data']
>>>

-Shawn
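For anyone on Python 3 without BeautifulSoup installed, here is a minimal sketch of the same idea using only the standard library's html.parser module. This swaps in a different technique from the BeautifulSoup session in the reply: a parser subclass that keeps the text between tags and drops the markup. The TextExtractor class, the strip_tags helper, and the sample row are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect only character data, discarding tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; -> &
        self.chunks = []

    def handle_data(self, data):
        # Called with each run of text found outside of tags.
        self.chunks.append(data)


def strip_tags(fragment):
    """Return the text content of a fragment such as '<td>data</td>'."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()  # flush any buffered text
    return "".join(parser.chunks)


# Hypothetical table row, shaped like the list described in the question.
row = ['<td>data</td>', '<td>more &amp; data</td>']
cells = [strip_tags(cell) for cell in row]
print(cells)  # ['data', 'more & data']
```

Unlike a hand-written regexp, the parser also decodes character references such as &amp; and copes with nested tags like '<td><b>x</b>y</td>'.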
_______________________________________________
gnhlug-discuss mailing list
[email protected]
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
