Jackie wrote:
> On 6 15 , 2 01 , Stefan Behnel <[EMAIL PROTECTED]> wrote:
>> Jackie wrote:
>
>> import lxml.etree as et
>> url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/"
>> tree = et.parse(url)
>>
>> Stefan
>
> Thank you. But when I tried to run the above part, the following
> message showed up:
>
> Traceback (most recent call last):
>   File "D:\TS\Python\workspace\eco_department\lxml_ver.py", line 3, in <module>
>     tree = et.parse(url)
>   File "etree.pyx", line 1845, in etree.parse
>   File "parser.pxi", line 928, in etree._parseDocument
>   File "parser.pxi", line 932, in etree._parseDocumentFromURL
>   File "parser.pxi", line 849, in etree._parseDocFromFile
>   File "parser.pxi", line 557, in etree._BaseParser._parseDocFromFile
>   File "parser.pxi", line 631, in etree._handleParseResult
>   File "parser.pxi", line 602, in etree._raiseParseError
> etree.XMLSyntaxError: line 2845: Premature end of data in tag html line 8
>
> Could you please tell me where went wrong?

Ah, ok, then the page is not actually XHTML, but broken HTML. Use this
idiom instead:

    parser = et.HTMLParser()
    tree = et.parse(url, parser)

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list
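
For anyone who wants to see the difference without fetching the page, here is a minimal offline sketch. The `broken_html` bytes are a made-up stand-in for the real document (an assumption, not the actual page content): they leave `<body>` and `<html>` unclosed, which should trigger the same kind of "Premature end of data" error from the default XML parser, while `et.HTMLParser()` recovers.

```python
import io
import lxml.etree as et

# Hypothetical stand-in for the downloaded page: <body> and <html> are
# never closed, similar to the breakage reported in the traceback.
broken_html = b"<html><body><ul><li>First</li><li>Second</li></ul>"

# The default parser expects well-formed XML and rejects the input.
try:
    et.parse(io.BytesIO(broken_html))
    xml_ok = True
except et.XMLSyntaxError as exc:
    xml_ok = False
    print("XML parser failed:", exc)

# The HTML parser tolerates the missing end tags and still builds a tree.
parser = et.HTMLParser()
tree = et.parse(io.BytesIO(broken_html), parser)
items = [li.text for li in tree.findall(".//li")]
print(items)
```

With a real URL, only the first argument to `et.parse` changes; the parser argument stays the same as in the idiom above.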