Jim, 06.02.2010 20:09:
> I generate some HTML and I want to include in my unit tests a check
> for syntax. So I am looking for a program that will complain at any
> syntax irregularities.

First thing to note here is that you should consider switching to an HTML
generation tool that does this automatically. Generating markup manually is
usually not a good idea.

> I am familiar with Beautiful Soup (use it all the time) but it is
> intended to cope with bad syntax. I just tried feeding
> HTMLParser.HTMLParser some HTML containing '<p>a<b>b</p></b>' and it
> didn't complain.
>
> That is, this:
> h=HTMLParser.HTMLParser()
> try:
>     h.feed('<p>a<b>b</p></b>')
>     h.close()
>     print "I expect not to see this line"
> except Exception, err:
>     print "exception:",str(err)
> gives me "I expect not to see this line".
>
> Am I using that routine incorrectly? Is there a natural Python choice
> for this job?

You can use lxml and let it validate the HTML output against the HTML DTD.
Just load the DTD from a catalog using the DOCTYPE in the document (see the
'docinfo' property on the parse tree).

http://codespeak.net/lxml/validation.html#id1

Note that when parsing the HTML file, you should disable the parser failure
recovery to make sure it barks on syntax errors instead of fixing them up.

http://codespeak.net/lxml/parsing.html#parser-options
http://codespeak.net/lxml/parsing.html#parsing-html

Stefan
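
As a rough illustration of the "bark on syntax errors instead of fixing them
up" idea, here is a minimal sketch of such a check. Note the assumptions: it
swaps in the standard library's strict XML parser (xml.etree.ElementTree)
rather than lxml, it only works if the generated markup is XHTML-style
(well-formed XML, no HTML-only named entities), and it checks well-formedness
only, not validity against the HTML DTD. The helper name syntax_error is
made up for this example.

```python
import xml.etree.ElementTree as ET


def syntax_error(markup):
    """Return the parser's error message if markup is not well-formed,
    or None if it parses cleanly.

    Assumption: markup is XHTML-style, so a strict XML parser applies.
    This is a well-formedness check, not DTD validation.
    """
    try:
        # A strict XML parser does not recover from errors; it raises
        # on the first problem it finds, e.g. mis-nested tags.
        ET.fromstring(markup)
    except ET.ParseError as err:
        return str(err)
    return None


# The mis-nested sample from the question is rejected...
print(syntax_error("<p>a<b>b</p></b>"))
# ...while the correctly nested variant yields None.
print(syntax_error("<p>a<b>b</b></p>"))
```

In a unit test this could be used as
assertIsNone(syntax_error(generated_markup)). For real HTML (as opposed to
XHTML), the lxml route described above, with failure recovery disabled and
validation against the DTD, is the one to look at.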