rzimerman wrote:
> I'm hoping to write a program that will read any number of urls from
> stdin (1 per line), download them, and process them. So far my script
> (below) works well for small numbers of urls. However, it does not
> scale to more than 200 urls or so, because it issues HTTP requests for
> all of the urls simultaneously, and terminates after 25 seconds.
> Ideally, I'd like this script to download at most 50 pages in parallel,
> and to time out if and only if any HTTP request is not answered in 3
> seconds. What changes do I need to make?
>
> Is Twisted the best library for me to be using? I do like Twisted, but
> it seems more suited to batch mode operations. Is there some way that I
> could continue registering url requests while the reactor is running?
> Is there a way to specify a timeout per page request, rather than for a
> batch of page requests?
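To answer the Twisted questions first: Twisted can do both things you
ask for. defer.DeferredSemaphore caps how many fetches run at once, and
getPage() forwards extra keyword arguments to HTTPClientFactory, whose
timeout applies to a single request rather than to the whole batch.
A rough, untested sketch (process() is a stand-in for whatever
per-page work you actually do):

import sys

from twisted.internet import defer, reactor
from twisted.web.client import getPage

MAX_PARALLEL = 50   # at most 50 pages in flight at a time
TIMEOUT_SECS = 3    # deadline for each request, not for the batch

def process(url, body):
    # Stand-in for your real processing step.
    print "%s: %d bytes" % (url, len(body))

def report_error(failure, url):
    sys.stderr.write("failed %s: %s\n" % (url, failure.getErrorMessage()))

def fetch(url):
    # getPage() passes unknown keyword arguments on to HTTPClientFactory,
    # so this timeout aborts only this request after 3 seconds.
    d = getPage(url, timeout=TIMEOUT_SECS)
    d.addCallback(lambda body: process(url, body))
    d.addErrback(report_error, url)
    return d

urls = [line.strip() for line in sys.stdin if line.strip()]

# The semaphore holds the 51st fetch in a queue until one of the first
# 50 finishes, so no more than MAX_PARALLEL requests are ever active.
sem = defer.DeferredSemaphore(MAX_PARALLEL)
done = defer.DeferredList([sem.run(fetch, url) for url in urls])
done.addBoth(lambda _: reactor.stop())
reactor.run()

Note also that sem.run() can be called at any point while the reactor
is running, so you can keep registering new urls as they arrive; the
semaphore simply queues them. It isn't batch-only.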
Have a look at pyCurl as well (http://pycurl.sourceforge.net).
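Its CurlMulti interface maps directly onto your two requirements: keep
a pool of 50 easy handles, set TIMEOUT on each one so the 3-second
limit is per transfer, and hand urls to idle handles as transfers
complete. A rough sketch along the lines of the retriever-multi.py
example that ships with pycurl (again untested, with process() as a
placeholder for your own code):

import sys
import pycurl

MAX_PARALLEL = 50
TIMEOUT_SECS = 3

def process(url, body):
    # Stand-in for your real processing step.
    sys.stdout.write("%s: %d bytes\n" % (url, len(body)))

queue = [line.strip() for line in sys.stdin if line.strip()]

multi = pycurl.CurlMulti()
free = []
for _ in range(MAX_PARALLEL):
    c = pycurl.Curl()
    c.setopt(pycurl.FOLLOWLOCATION, 1)
    c.setopt(pycurl.CONNECTTIMEOUT, TIMEOUT_SECS)
    c.setopt(pycurl.TIMEOUT, TIMEOUT_SECS)  # per-transfer limit
    free.append(c)

active = 0
while queue or active:
    # Hand pending urls to idle handles; the pool size caps parallelism.
    while queue and free:
        c = free.pop()
        c.url = queue.pop(0)
        c.chunks = []
        c.setopt(pycurl.URL, c.url)
        c.setopt(pycurl.WRITEFUNCTION, c.chunks.append)
        multi.add_handle(c)
        active += 1
    # Drive all transfers until libcurl has no immediate work left.
    while multi.perform()[0] == pycurl.E_CALL_MULTI_PERFORM:
        pass
    # Collect finished or failed transfers and recycle their handles.
    while 1:
        nqueued, ok, failed = multi.info_read()
        for c in ok:
            multi.remove_handle(c)
            process(c.url, "".join(c.chunks))
            free.append(c)
            active -= 1
        for c, errno, errmsg in failed:
            multi.remove_handle(c)
            sys.stderr.write("failed %s: %s\n" % (c.url, errmsg))
            free.append(c)
            active -= 1
        if nqueued == 0:
            break
    if active:
        multi.select(1.0)  # sleep until a socket is ready, don't spin

Because each handle carries its own TIMEOUT, a stalled server kills
only that one transfer; the other 49 keep going.

Regards,
Sreeram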