rzimerman wrote:
> I'm hoping to write a program that will read any number of URLs from
> stdin (one per line), download them, and process them. So far my script
> (below) works well for small numbers of URLs. However, it does not
> scale to more than 200 URLs or so, because it issues HTTP requests for
> all of the URLs simultaneously, and terminates after 25 seconds.
> Ideally, I'd like this script to download at most 50 pages in parallel,
> and to time out if and only if any individual HTTP request is not
> answered within 3 seconds. What changes do I need to make?
> 
> Is Twisted the best library for me to be using? I do like Twisted, but
> it seems more suited to batch-mode operations. Is there some way that I
> could continue registering URL requests while the reactor is running?
> Is there a way to specify a timeout per page request, rather than for
> a batch of page requests?

Have a look at pyCurl. (http://pycurl.sourceforge.net)
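
Its CurlMulti interface maps well onto the same requirements: keep at most
50 easy handles registered at a time, set CONNECTTIMEOUT and TIMEOUT to
3 seconds on each handle, and add new URLs from the stdin list as transfers
complete. A rough, untested sketch along the lines of pycurl's stock
multi-interface example (again, process() stands in for your own handling):

import sys
import pycurl

MAX_CONCURRENT = 50   # at most 50 transfers in flight
TIMEOUT = 3           # seconds, applied to each request individually

def process(url, body):
    # stand-in for your real processing
    print "%s: %d bytes" % (url, len(body))

urls = [line.strip() for line in sys.stdin if line.strip()]
queue = urls[:]

multi = pycurl.CurlMulti()
num_active = 0
num_done = 0

while num_done < len(urls):
    # Top up the multi handle until we hit the concurrency limit.
    while queue and num_active < MAX_CONCURRENT:
        url = queue.pop(0)
        c = pycurl.Curl()
        c.url = url               # remember which handle fetched what
        c.chunks = []
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.FOLLOWLOCATION, 1)
        c.setopt(pycurl.CONNECTTIMEOUT, TIMEOUT)
        c.setopt(pycurl.TIMEOUT, TIMEOUT)
        c.setopt(pycurl.WRITEFUNCTION, c.chunks.append)
        multi.add_handle(c)
        num_active += 1

    # Drive all active transfers as far as they can go right now.
    while True:
        ret, num_handles = multi.perform()
        if ret != pycurl.E_CALL_MULTI_PERFORM:
            break

    # Harvest whatever finished or failed (timeouts show up here too).
    while True:
        num_q, ok_list, err_list = multi.info_read()
        for c in ok_list:
            process(c.url, "".join(c.chunks))
            multi.remove_handle(c)
            c.close()
        for c, errno, errmsg in err_list:
            print >> sys.stderr, "failed: %s (%s)" % (c.url, errmsg)
            multi.remove_handle(c)
            c.close()
        num_active -= len(ok_list) + len(err_list)
        num_done += len(ok_list) + len(err_list)
        if num_q == 0:
            break

    # Wait for activity on the underlying sockets before looping again.
    multi.select(1.0)

The select() call just blocks until a socket is ready, so the loop doesn't
spin; requests that exceed the 3-second limit come back through info_read()
as errors.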

Regards
Sreeram

