David Bear wrote:

> I'm trying to understand how to use the HTMLParser in htmllib but I'm not
> seeing enough examples.
>
> I just want to grab the contents of everything enclosed in a '<body>' tag,
> i.e. items from where <body> begins to where </body> ends. I start by doing
>
> class HTMLBody(HTMLParser):
>     def __init__(self):
>         self.contents = []
>
>     def handle_starttag()..
>
> Now I'm stuck. I can't see that there is a method on handle_starttag that
> would return everything to the end tag. And I haven't seen anything on how
> to define my own handle_unknowntag..
htmllib is designed to be used together with a formatting object. if you
just want to work with tags, use sgmllib instead. some variation of the
SGMLFilter example on this page might be what you need:

    http://effbot.org/librarybook/sgmllib.htm

if you want a DOM-like structure instead of an event stream, use

    http://www.crummy.com/software/BeautifulSoup/

usage:

    >>> import BeautifulSoup as BS
    >>> soup = BS.BeautifulSoup(open("page.html"))
    >>> str(soup.body)
    '<body>\n<h1>Body Title</h1>\n<p>Paragraph</p>\n</body>'
    >>> soup.body.renderContents()
    '\n<h1>Body Title</h1>\n<p>Paragraph</p>\n'

</F>
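
For the event-stream approach the question was reaching for, here is a minimal sketch. It is not the SGMLFilter example from the linked page, and it does not use htmllib or sgmllib: it assumes Python 3, where the standard library offers html.parser.HTMLParser instead. The names BodyExtractor and body_contents are made up for illustration. The idea is that no single handler returns everything up to the end tag, so the parser keeps an "inside <body>" flag and rebuilds the markup from each event it sees.

```python
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Hypothetical helper: collect the markup between <body> and </body>."""

    def __init__(self):
        # convert_charrefs=False keeps entities such as &amp; as separate
        # events, so they can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.contents = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            # get_starttag_text() returns the start tag as written in the source.
            self.contents.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> arrive here.
        if self.in_body:
            self.contents.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.contents.append("</%s>" % tag)

    def handle_data(self, data):
        if self.in_body:
            self.contents.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.contents.append("&%s;" % name)

    def handle_charref(self, name):
        if self.in_body:
            self.contents.append("&#%s;" % name)


def body_contents(html):
    """Return everything between <body> and </body> as one string."""
    parser = BodyExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.contents)


page = ("<html><head><title>t</title></head><body>\n"
        "<h1>Body Title</h1>\n<p>Paragraph</p>\n</body></html>")
print(repr(body_contents(page)))
```

This sketch drops comments and processing instructions, and it writes end tags in lowercase because the parser reports tag names that way. For messy real-world pages the BeautifulSoup route is likely the more forgiving one.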