On Mar 27, 4:41 pm, "supercooper" <[EMAIL PROTECTED]> wrote:
> On Mar 27, 3:13 pm, "Gabriel Genellina" <[EMAIL PROTECTED]> wrote:
> > En Tue, 27 Mar 2007 16:21:55 -0300, supercooper <[EMAIL PROTECTED]> wrote:
> >
> > > I am downloading images using the script below. Sometimes it will go
> > > for 10 mins, sometimes 2 hours, before timing out with the following
> > > error:
> > >
> > >     urllib.urlretrieve(fullurl, localfile)
> > >     IOError: [Errno socket error] (10060, 'Operation timed out')
> > >
> > > I have searched this forum extensively and tried to avoid timing out,
> > > but to no avail. Anyone have any ideas as to why I keep getting a
> > > timeout? I thought setting the socket timeout did it, but it didn't.
> >
> > You should do the opposite: time out *early* (not waiting 2 hours) and
> > handle the error (maybe using a queue to hold pending requests).
> >
> > --
> > Gabriel Genellina
>
> Gabriel, thanks for the input. So are you saying there is no way to
> realistically *prevent* the timeout from occurring in the first place?
> And by timing out early, do you mean to set the timeout for x seconds
> and, if and when the timeout occurs, handle the error and somehow restart
> the process on the pending requests? Thanks.
>
> chad
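[A minimal sketch of Gabriel's suggestion: set a short socket timeout so failures surface in seconds rather than hours, catch the error, and requeue the request instead of aborting. The function and parameter names (`download_all`, `max_retries`) are mine, not from the thread; `fetch` is injectable so the retry logic can be exercised without a network. The original used Python 2's `urllib.urlretrieve`; on Python 3 the same call lives at `urllib.request.urlretrieve`.]

```python
import socket
import urllib.request
from collections import deque

def download_all(url_pairs, timeout=10, max_retries=3,
                 fetch=urllib.request.urlretrieve):
    """Download (url, localfile) pairs, timing out early and
    requeueing failures instead of blocking for hours.

    Returns the list of URLs that still failed after max_retries.
    """
    socket.setdefaulttimeout(timeout)  # fail fast instead of hanging
    pending = deque((url, local, 0) for url, local in url_pairs)
    failed = []
    while pending:
        url, local, tries = pending.popleft()
        try:
            fetch(url, local)
        except IOError:                # includes the socket timeout
            if tries + 1 < max_retries:
                pending.append((url, local, tries + 1))  # requeue
            else:
                failed.append(url)
    return failed
```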
Chad,

Just run the retrieval in a thread. If the thread is not done after x
seconds, handle it as a timeout and then retry, ignore, quit, or anything
else you want.

Even better, here is what I did for my program: first gather all the URLs
(I assume you can do that), then group them by server, i.e. n images from
foo.com, m images from bar.org, and so on. Then start a thread for each
server (with some possible maximum number of threads), each one responsible
for retrieving images from only one server (this is to avoid a DoS
pattern). Let each of the server threads start a 'small' retriever thread
for each image (this is to handle the timeout you mention).

So you have two different kinds of threads: one per server to parallelize
downloading, each of which in turn spawns one thread per download to handle
the timeout. This way you will (ideally) saturate your bandwidth, but you
only fetch one image per server at a time, so you still 'play nice' with
each of the servers. If you want a maximum number of server threads running
(in case you have way too many servers to deal with), run the server
threads in batches.

Hope this helps,
Nick Vatamaniuc
--
http://mail.python.org/mailman/listinfo/python-list
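[A sketch of the two-level design Nick describes, under my own names (`download_by_server`, `fetch_with_timeout` are not from the thread): URLs are grouped by host, one thread per server walks its queue serially, and each individual download runs in a small daemon thread that is `join`ed with a timeout so a hung socket counts as a failure after `timeout` seconds. The actual `fetch` callable (e.g. a wrapper around `urllib.urlretrieve`) is passed in.]

```python
import threading
from collections import defaultdict
from urllib.parse import urlparse

def fetch_with_timeout(fetch, url, timeout):
    """Run one download in its own small thread; if the thread is
    still running after `timeout` seconds, treat it as timed out."""
    result = {}
    def worker():
        try:
            fetch(url)
            result["ok"] = True
        except Exception:
            result["ok"] = False
    t = threading.Thread(target=worker)
    t.daemon = True       # don't keep the process alive for a hung socket
    t.start()
    t.join(timeout)
    return result.get("ok", False)   # False: errored or timed out

def download_by_server(urls, fetch, timeout=10):
    """One thread per server, each downloading its URLs one at a time
    (so we 'play nice' with each server). Returns {url: succeeded}."""
    by_server = defaultdict(list)
    for url in urls:
        by_server[urlparse(url).netloc].append(url)
    results, lock = {}, threading.Lock()
    def server_worker(queue):
        for url in queue:            # serial per server
            ok = fetch_with_timeout(fetch, url, timeout)
            with lock:
                results[url] = ok
    threads = [threading.Thread(target=server_worker, args=(q,))
               for q in by_server.values()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Batching the server threads (to cap their number) would just mean slicing `by_server.values()` into chunks and running the loop above once per chunk.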