I'm planning to have a daemon CrawlWorker (subclassing multiprocessing.Process) that monitors a queue for scrape requests.
The worker's responsibility is to take scrape requests from the queue and feed them to spiders. To avoid implementing batching logic (e.g. waiting for N requests before creating a new spider), would it make sense to keep all my spiders alive, *add* more scrape requests to each spider whenever it goes idle, and keep them open when no more requests are waiting? What would be the best, simplest, and most elegant way to implement this? Given attributes like `start_urls`, it seems a spider is meant to be instantiated with an initial work list, do its work, and then die.
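
Roughly what I have in mind is sketched below (untested; `QueueSpider` and `scrape_queue` are just placeholder names for my own classes, and the `scrape_queue` would be the multiprocessing.Queue the CrawlWorker fills). I understand the `spider_idle` signal and `DontCloseSpider` exist for this kind of thing, though the `engine.crawl()` signature seems to differ between Scrapy versions (older ones also take the spider as a second argument), so please correct me if this is the wrong approach:

```python
import queue

import scrapy
from scrapy import signals
from scrapy.exceptions import DontCloseSpider


class QueueSpider(scrapy.Spider):
    """Spider that stays alive and pulls new URLs from a shared queue."""

    name = "queue_spider"

    def __init__(self, scrape_queue, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.scrape_queue = scrape_queue  # e.g. a multiprocessing.Queue

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super().from_crawler(crawler, *args, **kwargs)
        # Fire our handler whenever the spider runs out of requests.
        crawler.signals.connect(spider.on_idle, signal=signals.spider_idle)
        return spider

    def on_idle(self, spider):
        # Drain whatever is currently waiting in the queue.
        while True:
            try:
                url = self.scrape_queue.get_nowait()
            except queue.Empty:
                break
            # Newer Scrapy versions take only the request here; older ones
            # also want the spider as a second argument.
            self.crawler.engine.crawl(scrapy.Request(url, callback=self.parse))
        # Keep the spider open even when the queue is empty, so it can
        # pick up new requests on the next idle signal.
        raise DontCloseSpider

    def parse(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}
```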
