Trying to parse invalid HTML documents with HTMLParser

florent Tue, 02 Aug 2005 12:35:35 -0700

I'm trying to parse HTML documents from the web using the HTMLParser 
class of the HTMLParser module (Python 2.3), but some web documents are 
not fully valid. When the parser finds an invalid tag, it raises an 
exception, and it then seems impossible to resume parsing just after 
the point where the exception was raised. I'd like to continue parsing 
an HTML document even if an invalid tag was found. Is this possible?


Here is a small invalid HTML document:
----------
<html>
<body>
<a href="""">bogus link</a>
</body>
</html>
----------
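
For illustration, here is a minimal sketch of the behaviour I am after: the
parser keeps going past the bad tag and still reports the surrounding tags and
text. It assumes a current Python 3 interpreter, where html.parser.HTMLParser
is tolerant of malformed markup; this is an assumption that does not hold for
the Python 2.3 HTMLParser module described above. The TagCollector class and
the parse() helper are hypothetical names made up for this example.

```python
from html.parser import HTMLParser

# The invalid document from above: the href value has doubled quotes.
DOCUMENT = '''<html>
<body>
<a href="""">bogus link</a>
</body>
</html>
'''


class TagCollector(HTMLParser):
    """Hypothetical helper: records start tags and text as they are seen."""

    def __init__(self):
        super().__init__()
        self.tags = []   # names of start tags, in document order
        self.text = []   # non-blank text chunks, stripped of whitespace

    def handle_starttag(self, tag, attrs):
        # attrs is ignored here; how the malformed href is reported
        # may differ between Python versions.
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def parse(document):
    """Feed the whole document to a fresh parser and return what it saw."""
    parser = TagCollector()
    parser.feed(document)
    parser.close()  # flush any text still buffered by the parser
    return parser.tags, parser.text


if __name__ == "__main__":
    tags, text = parse(DOCUMENT)
    print(tags)
    print(text)
```

The sketch deliberately ignores attribute values, since the malformed href is
exactly the part a tolerant parser has to guess at.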
-- 
http://mail.python.org/mailman/listinfo/python-list
