Wells Oliver wrote:
I'm writing a class which essentially spiders a site and saves the files
locally. On a URLError exception, it sleeps for a second and tries again
(on 404 it just moves on). The relevant bit of code, including the
offending method:
import os
import threading
import time
import urllib2

class Handler(threading.Thread):
    def __init__(self, url):
        threading.Thread.__init__(self)
        self.url = url

    def save(self, uri, location):
        try:
            handler = urllib2.urlopen(uri)
        except urllib2.HTTPError, e:
            if e.code == 404:
                return
            else:
                print "retrying %s (HTTPError)" % uri
                time.sleep(1)
                self.save(uri, location)
        except urllib2.URLError, e:
            print "retrying %s" % uri
            time.sleep(1)
            self.save(uri, location)
        if not os.path.exists(os.path.dirname(location)):
            os.makedirs(os.path.dirname(location))
        file = open(location, "w")
        file.write(handler.read())
        file.close()
...
But what I am seeing is that after a retry (on catching a URLError
exception), I see bunches of "UnboundLocalError: local variable
'handler' referenced before assignment" errors on line 38, which is the
"file.write(handler.read())" line..
Your code binds the name handler only if the urllib2.urlopen call
succeeds. But you later access it unconditionally, and of course
that fails.
You need to put the file-writing code after the urlopen, inside the
try/except, so that it only runs when the fetch actually succeeded.
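For example, a minimal sketch of that restructuring (shown as a plain
function rather than your method, with the retry logic otherwise kept
as-is) could use the try/except's else clause, which only runs when no
exception was raised:

    import os
    import time
    import urllib2

    def save(uri, location):
        try:
            handler = urllib2.urlopen(uri)
        except urllib2.HTTPError, e:
            if e.code == 404:
                return
            print "retrying %s (HTTPError)" % uri
            time.sleep(1)
            save(uri, location)
        except urllib2.URLError, e:
            print "retrying %s" % uri
            time.sleep(1)
            save(uri, location)
        else:
            # handler is guaranteed to be bound here, because the else
            # block only runs when urlopen raised no exception
            if not os.path.exists(os.path.dirname(location)):
                os.makedirs(os.path.dirname(location))
            f = open(location, "w")
            f.write(handler.read())
            f.close()

Note that this still recurses on every retry, which brings us to the
next point.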
Also note that Python has no tail-recursion optimization, so your method
will keep recursing and will eventually exhaust the stack if there are
many errors. You should consider writing it as a while loop instead,
breaking out of the loop once the page could be fetched.
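A rough sketch of that loop-based version (again as a plain function;
like your original it retries indefinitely on non-404 errors, so you may
want to cap the number of attempts):

    import os
    import time
    import urllib2

    def save(uri, location):
        while True:
            try:
                handler = urllib2.urlopen(uri)
                break                   # fetched successfully, leave the loop
            except urllib2.HTTPError, e:
                if e.code == 404:
                    return              # give up on 404, as before
                print "retrying %s (HTTPError)" % uri
            except urllib2.URLError, e:
                print "retrying %s" % uri
            time.sleep(1)               # wait a second before the next attempt
        if not os.path.exists(os.path.dirname(location)):
            os.makedirs(os.path.dirname(location))
        f = open(location, "w")
        f.write(handler.read())
        f.close()

Because the loop only ends via break after a successful urlopen (or via
return on a 404), handler is always bound when the file is written.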
Diez