Re: Good HTML Parser

Stefan Behnel Thu, 17 Jul 2008 08:08:04 -0700

Chris wrote:
> Can anyone recommend a good HTML/XHTML parser, similar to
> HTMLParser.HTMLParser or htmllib.HTMLParser, but able to intelligently
> know that certain tags, like <br>, are implicitly closed? I need to
> iterate through the entire DOM, building up a DOM path, but the stdlib
> parsers aren't calling handle_endtag() for any implicitly closed tags.
> I looked at BeautifulSoup, but it only seems to work by first parsing
> the entire document, then allowing you to query the document
> afterwards. I need something like a SAX parser.


Try lxml.html. It's very memory-friendly and extremely fast, so you may find
that you no longer have any reason to use a SAX-style parser.

http://codespeak.net/lxml/
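
As a rough illustration of the suggestion above (not code from the original post), here is a minimal sketch of walking a parsed document with lxml.html and collecting a path for every element. The sample markup and the helper name element_paths are made up for this example. It assumes lxml is installed and that a tree-based walk is acceptable in place of SAX callbacks.

```python
import lxml.html

# Hypothetical sample markup; <br> and <img> are never explicitly closed.
SAMPLE = (
    "<html><body>"
    "<p>one<br>two</p>"
    "<div><img src='a.png'><p>three</p></div>"
    "</body></html>"
)


def element_paths(markup):
    """Return an XPath-style path for every element, in document order."""
    root = lxml.html.document_fromstring(markup)
    tree = root.getroottree()
    # iter() walks the tree in document order. The isinstance check skips
    # comments and processing instructions, whose .tag is not a string.
    return [tree.getpath(el) for el in root.iter() if isinstance(el.tag, str)]


paths = element_paths(SAMPLE)
for path in paths:
    print(path)
```

Because the parser treats <br> and <img> as empty elements, they show up as ordinary leaf nodes with their own paths, so there is no need to track missing end tags by hand.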

Stefan