I'm trying to parse HTML documents from the web using the HTMLParser class of the HTMLParser module (Python 2.3), but some web documents are not fully valid. When the parser finds an invalid tag, it raises an exception, and it then seems impossible to resume parsing just after the point where the exception was raised. I'd like to continue parsing an HTML document even if an invalid tag is found. Is it possible to do this?
Here is a small invalid HTML document:
----------
<html>
<body>
<a href="""">bogus link</a>
</body>
</html>
----------
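For illustration, here is a rough sketch of one way the parsing might be kept going: feed the document one line at a time, and when a line raises, note it, reset the parser, and move on to the next line. This is only a sketch under assumptions, not a tested recipe for Python 2.3. The names TagCollector, collect_tags and SAMPLE are made up for this example, the `except ... as` syntax needs Python 2.6 or later, and a tag that spans several lines would not survive the reset. Note also that current Python 3 versions of the parser are lenient and no longer have HTMLParseError, so there the except branch may never run.

```python
try:
    from HTMLParser import HTMLParser  # Python 2 module name, as in the post
except ImportError:
    from html.parser import HTMLParser  # Python 3 location of the same class

# The sample document from the post, one tag per line.
SAMPLE = '<html>\n<body>\n<a href="""">bogus link</a>\n</body>\n</html>\n'


class TagCollector(HTMLParser):
    """Hypothetical helper: records the name of each start tag reported."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def collect_tags(document):
    """Feed the document line by line, skipping lines that fail to parse."""
    collector = TagCollector()
    skipped = []
    for line in document.splitlines(True):  # True keeps the line endings
        try:
            collector.feed(line)
        except Exception as error:  # HTMLParseError on old Python versions
            skipped.append((line, error))
            collector.reset()  # drop the unprocessed buffer before continuing
    collector.close()
    return collector.tags, skipped


if __name__ == "__main__":
    tags, skipped = collect_tags(SAMPLE)
    print("start tags seen: %r" % (tags,))
    print("lines skipped: %d" % len(skipped))
```

Working line by line is a crude choice: it loses the whole offending line rather than just the bad tag, but it keeps the sketch short and avoids relying on the parser's internal position after an error.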