I'm planning to have a daemon CrawlWorker (subclassing multiprocessing.Process) that monitors a queue for scrape requests.
The worker's responsibility is to take scrape requests from the queue and feed them to spiders. To avoid implementing batching logic (e.g. waiting for N requests before creating a new spider), would it make sense to keep all my spiders alive, *add* more scrape requests to each spider whenever it goes idle, and keep them open when no more requests are waiting? What would be the best, simplest, and most elegant way to implement this? Given attributes like `start_urls`, it seems a spider is meant to be instantiated with an initial work list, do its work, and then die.
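
Roughly what I have in mind is sketched below (untested; `QueueSpider` and `scrape_queue` are just placeholder names for my own classes, and the `scrape_queue` would be the multiprocessing.Queue the CrawlWorker fills). I understand the `spider_idle` signal and `DontCloseSpider` exist for this kind of thing, though the `engine.crawl()` signature seems to differ between Scrapy versions (older ones also take the spider as a second argument), so please correct me if this is the wrong approach:

```python
import queue

import scrapy
from scrapy import signals
from scrapy.exceptions import DontCloseSpider


class QueueSpider(scrapy.Spider):
    """Spider that stays alive and pulls new URLs from a shared queue."""

    name = "queue_spider"

    def __init__(self, scrape_queue, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.scrape_queue = scrape_queue  # e.g. a multiprocessing.Queue

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Fire our handler whenever the spider runs out of requests.
        crawler.signals.connect(spider.on_idle, signal=signals.spider_idle)
        return spider

    def on_idle(self, spider):
        # Drain whatever is currently waiting in the queue.
        while True:
            try:
                url = self.scrape_queue.get_nowait()
            except queue.Empty:
                break
            # Newer Scrapy versions take only the request here; older ones
            # also want the spider as a second argument.
            self.crawler.engine.crawl(scrapy.Request(url, callback=self.parse))
        # Keep the spider open even when the queue is empty, so it can
        # pick up new requests on the next idle signal.
        raise DontCloseSpider

    def parse(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}
```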
