On Thu, May 10, 2012 at 8:14 AM, Jabba Laci <jabba.l...@gmail.com> wrote:
> What's the best way?

From what I've heard, http://scrapy.org/ . It is a single-threaded,
single-process web crawler that can nonetheless download pages
concurrently.
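
Just to give a flavour, a minimal spider looks roughly like the sketch
below (the spider name, start URL, and selector are placeholders, and
the API shown is from newer Scrapy releases, so treat it as a sketch
rather than gospel):

    import scrapy

    class ExampleSpider(scrapy.Spider):
        name = "example"                      # placeholder spider name
        start_urls = ["http://example.com/"]  # placeholder start URL

        def parse(self, response):
            # Everything runs in one thread and one process, but
            # Scrapy/Twisted keeps several requests in flight at once.
            for href in response.css("a::attr(href)").getall():
                yield response.follow(href, callback=self.parse)

You can run a file like that with "scrapy runspider example.py" without
setting up a whole project.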

Doing what you want in Scrapy would probably involve learning about
Twisted, the library Scrapy is built on top of. That is somewhat more
involved than just throwing threads, urllib, and lxml.html together
(a rough sketch of that simpler route is below), although most of the
Twisted developers are really helpful. It might not be worth it to you,
depending on the size of the task.
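
For comparison, here is roughly what the threads + urllib + lxml.html
approach looks like (Python 3 names; the URLs and worker count here are
made up, so adapt to taste):

    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    import lxml.html

    URLS = ["http://example.com/", "http://example.org/"]  # placeholders

    def fetch_links(url):
        # Download the page and pull every href out of it with lxml.html.
        html = urlopen(url).read()
        doc = lxml.html.fromstring(html)
        doc.make_links_absolute(url)
        return [link for _, _, link, _ in doc.iterlinks()]

    # A small thread pool gives you concurrent downloads without Twisted.
    with ThreadPoolExecutor(max_workers=4) as pool:
        for url, links in zip(URLS, pool.map(fetch_links, URLS)):
            print(url, len(links), "links")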



Dave's answer is pretty general and good though.

-- Devin