On Sat, 06 Feb 2010 11:09:31 -0800, Jim wrote:

> I generate some HTML and I want to include in my unit tests a check
> for syntax. So I am looking for a program that will complain at any
> syntax irregularities.
>
> I am familiar with Beautiful Soup (use it all the time) but it is
> intended to cope with bad syntax. I just tried feeding
> HTMLParser.HTMLParser some HTML containing '<p>a<b>b</p></b>' and it
> didn't complain.

HTMLParser is a tokeniser, not a parser. It treats the data as a stream of
tokens (tags, entities, PCDATA, etc); it doesn't know anything about the
HTML DTD. For all it knows, the above example could be perfectly valid (the
"b" element might allow both its start and end tags to be omitted).

Does the validation need to be done in Python? If not, you can use "nsgmls"
to validate any SGML document for which you have a DTD. OpenSP includes
nsgmls along with the various HTML DTDs.
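
To make the tokeniser point concrete, here is a minimal sketch. It assumes Python 3, where the module is html.parser (in Python 2 it was HTMLParser.HTMLParser, as in the quoted message). The names TokenLogger, tokenize and nesting_errors are hypothetical and made up for this illustration. The first part only records the token stream, which is all HTMLParser does. The second part is a naive stack check that does flag the example, but it knows nothing about the DTD (optional end tags, empty elements such as "br"), so it is not a substitute for a real validator.

```python
from html.parser import HTMLParser  # Python 3 module name (assumption)


class TokenLogger(HTMLParser):
    """Record the tokens HTMLParser reports, in order (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        self.tokens.append(("data", data))


def tokenize(html):
    """Feed the markup to the tokeniser and return the recorded tokens."""
    parser = TokenLogger()
    parser.feed(html)
    parser.close()
    return parser.tokens


def nesting_errors(tokens):
    """Naive check: every end tag must match the most recent open start tag.

    This ignores the DTD entirely, so it would wrongly reject valid HTML
    that relies on optional end tags or empty elements.
    """
    stack = []
    errors = []
    for kind, value in tokens:
        if kind == "start":
            stack.append(value)
        elif kind == "end":
            if stack and stack[-1] == value:
                stack.pop()
            else:
                errors.append("unexpected </%s>" % value)
    # Anything still on the stack was never closed properly.
    errors.extend("unclosed <%s>" % tag for tag in stack)
    return errors


if __name__ == "__main__":
    tokens = tokenize("<p>a<b>b</p></b>")
    print(tokens)                  # the tokeniser reports tokens, no complaint
    print(nesting_errors(tokens))  # the stack check does complain
```

The tokeniser hands back start tags, end tags and data in document order without objecting to the overlap; any notion of "valid" has to come from a layer on top, and only a DTD-aware tool such as nsgmls can decide that properly.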