ming wrote:

> Hi,
> i've a Python script which stopped working about a month ago. But until
> then, it worked flawlessly for months (if not years). A tiny
> self-contained 7-line script is provided below.
>
> i ran into an XML parsing problem with xml.dom.minidom and the error
> message is included below. The weird thing is if i used an XML validator
> on the web to validate against this particular URL directly, it is all
> good. Moreover, i saved the page source in Firefox or Chrome then
> validated against the saved XML file, it's also all good.
>
> Since the error happened at the very beginning of the input (line 1,
> column 0) as indicated below, i was wondering if this is an encoding
> mismatch. However, according to the saved page source in FireFox or
> Chrome, there is the following at the beginning:
> <?xml version="1.0" encoding="UTF-8"?>
>
>
> <program>
> =================================================
> #!/usr/bin/env python
>
> import urllib2
> from xml.dom.minidom import parseString
>
> fd = urllib2.urlopen('http://api.worldbank.org/countries')
> data = fd.read()
> fd.close()
> dom = parseString(data)
> =================================================
>
>
> <error msg>
> =================================================
> Traceback (most recent call last):
>   File "./bugReport.py", line 9, in <module>
>     dom = parseString(data)
>   File "/usr/lib/python2.7/xml/dom/minidom.py", line 1931, in parseString
>     return expatbuilder.parseString(string)
>   File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 940, in
> parseString
>     return builder.parseString(string)
>   File "/usr/lib/python2.7/xml/dom/expatbuilder.py", line 223, in
> parseString
>     parser.Parse(string, True)
> xml.parsers.expat.ExpatError: not well-formed (invalid token): line 1,
> column 0
> =================================================
>
>
> i'm running Python 2.7.5+ on Ubuntu 13.10.
>
> Thanks.
Looking into the data returned from the server:

>>> data = urllib2.urlopen("http://api.worldbank.org/countries").read()
>>> with open("tmp.dat", "w") as f: f.write(data)
...
>>>
[1]+  Stopped                 python
$ file tmp.dat
tmp.dat: gzip compressed data, from FAT filesystem (MS-DOS, OS/2, NT)

So the server is sending gzip-compressed data rather than plain XML, which
is why the parser gives up at line 1, column 0.

OK, let's expand:

$ fg
python
>>> import gzip, StringIO
>>> expanded_data = gzip.GzipFile(fileobj=StringIO.StringIO(data)).read()
>>> import xml.dom.minidom
>>> xml.dom.minidom.parseString(expanded_data)
<xml.dom.minidom.Document instance at 0x19a1320>

There may be a way to uncompress the gzipped data transparently, but I'm too
lazy to look it up...
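
The manual steps above could be folded into a small helper that only
decompresses when the payload actually starts with the gzip magic bytes, so
the script keeps working whether or not the server compresses its response.
This is just a sketch, not the one true fix: the names GZIP_MAGIC,
maybe_gunzip and parse_xml_bytes are made up for illustration, and it uses
io.BytesIO instead of StringIO so that it should also run on Python 3.

```python
import gzip
import io
from xml.dom.minidom import parseString

# Every gzip stream begins with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(data):
    """Return data decompressed if it looks like gzip, else unchanged.

    Hypothetical helper: it sniffs the payload instead of trusting
    any HTTP header.
    """
    if data[:2] == GZIP_MAGIC:
        return gzip.GzipFile(fileobj=io.BytesIO(data)).read()
    return data


def parse_xml_bytes(data):
    """Parse XML bytes that may or may not be gzip-compressed."""
    return parseString(maybe_gunzip(data))
```

In the original script that would mean replacing `dom = parseString(data)`
with `dom = parse_xml_bytes(data)`. Checking the response's Content-Encoding
header would be another option, but sniffing the first two bytes also covers
the case where the header is missing or wrong.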