The HTMLParser module in Python 2.3.5 seems to choke when parsing HTML files, because it doesn't realize that whatever sits inside <!-- --> is an HTML comment, even when that comment is inside <script> </script>, and even when the offending text is also commented out in the script code itself.
The html file:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>Choke on this</title>
<script language="JavaScript">
<!--
// </ht ml> - this is a comment in JavaScript, which is itself inside an HTML comment
-->
</script>
</head>
<body>
Hey there
</body>
</html>

The Python program:

from urllib2 import urlopen
from HTMLParser import HTMLParser
f = urlopen("file:///PATH_TO_THE_ABOVE/index.html")
p = HTMLParser()
p.feed(f.read())
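
For comparison, here is a minimal sketch of the same test against a current Python 3, where the module is named html.parser. It assumes the page is held in a string rather than fetched with urlopen. The Recorder class, the PAGE constant and the parse_page function are hypothetical names made up for this illustration and are not part of the original program. The sketch only records which end tags the parser reports and what text it hands over as script content, so one can check whether the stray tag inside the script is treated as data.

```python
from html.parser import HTMLParser

# The same page as in the report, kept in a string (an assumption of this
# sketch; the original program read it from a file URL).
PAGE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head><title>Choke on this</title>
<script language="JavaScript">
<!--
// </ht ml> - this is a comment in JavaScript, which is itself inside an HTML comment
-->
</script>
</head>
<body>
Hey there
</body>
</html>
"""


class Recorder(HTMLParser):
    """Hypothetical helper: records end tags and the text seen inside <script>."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_text = []
        self.end_tags = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
        self.end_tags.append(tag)

    def handle_data(self, data):
        # Script content may arrive in several chunks, so collect them all.
        if self.in_script:
            self.script_text.append(data)


def parse_page(text):
    """Feed the whole page to the parser and return (end_tags, script_text)."""
    parser = Recorder()
    parser.feed(text)
    parser.close()
    return parser.end_tags, "".join(parser.script_text)


end_tags, script_text = parse_page(PAGE)
```

If the parser treats the script body as plain data, script_text should contain the "</ht ml>" text and end_tags should list the html end tag only once, for the real closing tag at the bottom of the page.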