On Fri, Nov 7, 2008 at 2:28 AM, Chris Rebert <[EMAIL PROTECTED]> wrote:
>
> On Fri, Nov 7, 2008 at 12:05 AM, Gilles Ganault <[EMAIL PROTECTED]> wrote:
> > Hello
> >
> > I'm using the urllib2 module and Tor as a proxy to download data
> > from the web.
> >
> > Occasionally, urllib2 returns 404, probably because of some issue
> > with the Tor network. This code doesn't solve the issue, as it just
> > loops through the same error indefinitely:
> >
> > =====
> *snip*
>
> Cheers,
> Chris
> --
> Follow the path of the Iguana...
> http://rebertia.com
>
> > =====
> >
> > Any idea of what I should do to handle this error properly?
> >
> > Thank you.
> > --
> > http://mail.python.org/mailman/listinfo/python-list
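As an aside on the Tor side of the question: urllib2 has no native SOCKS support, so Tor is typically reached through a local HTTP-to-SOCKS bridge such as Privoxy. A sketch of that setup, assuming Privoxy is listening on its default 127.0.0.1:8118 (the address is an assumption, not something stated in the thread):

```python
# Hypothetical proxy setup: route urllib2 traffic through a local
# Privoxy instance, which in turn forwards requests to Tor's SOCKS port.
try:
    import urllib2                       # Python 2, as used in this thread
except ImportError:
    import urllib.request as urllib2     # the same names live here in Python 3

PRIVOXY = 'http://127.0.0.1:8118/'       # Privoxy's default listen address (assumed)

proxy_handler = urllib2.ProxyHandler({'http': PRIVOXY})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)           # later urlopen() calls now use the proxy
```

After install_opener(), the retry loop below needs no changes; every urlopen() call is transparently routed through the proxy.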
It sounds like Gilles may be having an issue with persistent 404s, in which
case something like this could be more appropriate:

import time
import urllib2
from urllib2 import HTTPError

for id in rows:
    url = 'http://www.acme.com/?code=' + id[0]
    retries = 0
    while retries < 10:
        try:
            req = urllib2.Request(url, None, headers)
            response = urllib2.urlopen(req).read()
        except HTTPError, e:
            print 'Error code: ', e.code
            retries += 1
            time.sleep(2)
            continue
        else:
            break      # success; skips the while loop's else clause
    else:
        # Only reached when the while loop exhausts all 10 retries
        print 'Fetch of ' + url + ' failed after ' + str(retries) + ' tries.'
        continue       # don't call handle_success with no response
    handle_success(response)
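The retry bookkeeping above can also be factored into a small helper, which makes the while/else subtlety go away; this is just a sketch, and fetch_with_retries, max_retries, and delay are illustrative names rather than anything in urllib2 (written so it runs under both Python 2 and 3):

```python
import time

def fetch_with_retries(fetch, max_retries=10, delay=2):
    """Call fetch() until it succeeds, retrying on IOError.

    HTTPError subclasses IOError (via URLError), so urllib2 failures
    are caught here.  Returns fetch()'s result, or None if every
    attempt failed.
    """
    for attempt in range(max_retries):
        try:
            return fetch()
        except IOError:
            time.sleep(delay)   # back off before the next attempt
    return None
```

The per-URL logic then collapses to a None check, e.g.
response = fetch_with_retries(lambda: urllib2.urlopen(req).read())
followed by "if response is None: continue".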