Michel Bouwmans wrote:
> I don't think HTMLParser was doing anything wrong here. I needed to parse a
> HTML document, but it contained script-blocks with document.write's in
> them. I only care for the content outside these blocks but HTMLParser will
> choke on such a block when it isn't encapsulated with HTML-comment markers
> and it tries to parse the contents of the document.write's. ;)

At the risk of repeating myself: using the right tool for the job is generally
a good idea.

http://codespeak.net/lxml/lxmlhtml.html#cleaning-up-html

Stefan

-- 
http://mail.python.org/mailman/listinfo/python-list
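
For illustration, here is a minimal sketch of dropping script blocks with lxml before extracting the remaining content. It assumes lxml is installed and uses the generic `etree.strip_elements` call rather than the Cleaner class described at the link above. The function name `text_outside_scripts` and the sample markup are made up for this example and do not come from the thread.

```python
import lxml.html
from lxml import etree


def text_outside_scripts(markup):
    """Return the visible text of an HTML document, ignoring <script> blocks.

    Hypothetical helper for illustration only.
    """
    doc = lxml.html.fromstring(markup)
    # Remove every <script> element together with its contents.
    # with_tail=False keeps any text that follows the closing tag.
    etree.strip_elements(doc, "script", with_tail=False)
    # Collapse runs of whitespace so the result is easy to compare.
    return " ".join(doc.text_content().split())


# Made-up sample: a script block that is not wrapped in HTML comment markers.
sample = """<html><body>
<p>Before</p>
<script type="text/javascript">
document.write("<p>injected</p>");
</script>
<p>After</p>
</body></html>"""

print(text_outside_scripts(sample))
```

The script body is removed from the tree as a whole, so the markup inside the document.write call never shows up in the extracted text.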