On 04:22 pm, m...@privacy.net wrote:
Hello,

What would be best practice for speeding up a large number of HTTP GET requests made via urllib? So far they are made in sequence, each request taking up to one second. The results must be merged into a list; the original order need not be kept.

I think speed could be improved by parallelizing. One could use multiple threads. Are there any Python best practices, or even existing modules, for creating and handling a task queue with a fixed number of concurrent threads?

Using multiple threads is one approach. There are a few thread pool implementations lying about; one is part of Twisted, <http://twistedmatrix.com/documents/current/api/twisted.python.threadpool.ThreadPool.html>.
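
For illustration, here is a minimal thread-pool sketch. It uses the standard library's concurrent.futures (Python 3.2+) and urllib.request rather than Twisted's ThreadPool, and the URL list is hypothetical:

    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    # Hypothetical list of URLs to fetch.
    urls = ["http://example.com/a", "http://example.com/b"]

    def fetch(url):
        # Each worker thread blocks on its own request.
        with urlopen(url) as response:
            return response.read()

    # A fixed number of worker threads; map() yields the results
    # in input order, which is harmless here since order doesn't
    # matter.
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(fetch, urls))

max_workers caps the number of requests in flight at once, which answers the "fixed number of concurrent threads" part of the question directly.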

Another approach is to use non-blocking or asynchronous I/O to make multiple requests without using multiple threads. Twisted can help you out with this, too. There are two async HTTP client APIs available. The older one:

http://twistedmatrix.com/documents/current/api/twisted.web.client.getPage.html
http://twistedmatrix.com/documents/current/api/twisted.web.client.HTTPClientFactory.html
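
A sketch with the older API might look like this (untested; getPage() returns a Deferred that fires with the response body, and DeferredList collects the lot -- the URLs are placeholders):

    from twisted.internet import defer, reactor
    from twisted.web.client import getPage

    # Hypothetical list of URLs to fetch.
    urls = ["http://example.com/a", "http://example.com/b"]

    def done(results):
        # DeferredList fires with (success, value) pairs; keep the
        # bodies of the requests that succeeded.
        bodies = [value for success, value in results if success]
        print(len(bodies), "responses fetched")
        reactor.stop()

    # Issue all requests at once; each getPage() returns a
    # Deferred that fires with the response body.
    dl = defer.DeferredList([getPage(url) for url in urls])
    dl.addCallback(done)
    reactor.run()

This issues every request immediately; to keep a fixed bound on the number of outstanding requests, twisted.internet.defer.DeferredSemaphore could be used to gate the calls.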

And the newer one, introduced in Twisted 9.0:

http://twistedmatrix.com/documents/current/api/twisted.web.client.Agent.html
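
With the newer API, something along these lines should work. Again a sketch: Agent.request() fires with a response object, and collecting the body takes an extra step. The readBody helper used here was added in a Twisted release later than 9.0; on 9.0 itself you would deliver the body to a Protocol by hand.

    from twisted.internet import defer, reactor
    from twisted.web.client import Agent, readBody

    agent = Agent(reactor)

    def fetch(url):
        # request() fires with a response object; readBody then
        # collects the full response body.
        d = agent.request(b"GET", url)
        d.addCallback(readBody)
        return d

    # Hypothetical list of URLs to fetch (bytes, as modern
    # Twisted expects).
    urls = [b"http://example.com/a", b"http://example.com/b"]

    # gatherResults fires with a list of all the bodies.
    d = defer.gatherResults([fetch(url) for url in urls])

    def done(bodies):
        print(len(bodies), "responses fetched")
        reactor.stop()

    def failed(failure):
        failure.printTraceback()
        reactor.stop()

    d.addCallbacks(done, failed)
    reactor.run()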

Jean-Paul