bruce wrote:
> that's exactly what i'm trying to accomplish... i've used tidy, but it seems
> to still generate warnings...
>
> initFile -> tidy -> cleanFile -> perl app (using xpath/libxml)
>
> the xpath/libxml functions in the perl app complain regarding the file. my
> thought is that tidy isn't cleaning enough, or that the perl xpath/libxml
> functions are too strict!

Clean HTML is not necessarily well-formed XML: HTML allows unclosed elements such as <br> and optional end tags, which an XML parser rejects. If you want to process the output with an XML library, you'll need to tell Tidy to output XHTML (the -asxhtml option). The result should then be valid input for XML processing.

Of course, BeautifulSoup is also a very nice library if you need to extract some information but don't necessarily require XML processing to do it.

-- Matt Good
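
To illustrate why an XML library complains about tidied HTML, here is a minimal sketch using only Python's standard library. The two snippets and the helper name is_well_formed are made up for illustration. They are not taken from the original poster's files, and a strict XML parser such as libxml would be expected to behave similarly on the same input.

```python
import xml.etree.ElementTree as ET

# Hypothetical example markup: acceptable HTML 4, where <br> has no
# closing tag and the </p> end tag is omitted.
html_snippet = "<html><body><p>first<br>second</body></html>"

# The same content written as XHTML: every element is explicitly closed.
xhtml_snippet = "<html><body><p>first<br />second</p></body></html>"


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(html_snippet))   # False: <br> and <p> are never closed
print(is_well_formed(xhtml_snippet))  # True
```

Running the same kind of check on the file Tidy produces is a quick way to tell whether the problem is in the cleaned file or in the XPath code that consumes it.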