If I were tasked with writing spiders that scrape based on another spider's activity, I would let one spider run to completion, persist its data, then read that data into the next spider.
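For the persisted-data approach, a minimal sketch of the second spider could look like the following. It assumes the first spider wrote a JSON Lines feed export to first_spider_output.jl and that its items carry a detail_url field; both names are made up here.

import json

import scrapy


class SecondSpider(scrapy.Spider):
    # Reads the items the first spider persisted and builds its own
    # requests from them.
    name = "second_spider"

    def start_requests(self):
        with open("first_spider_output.jl") as f:
            for line in f:
                item = json.loads(line)
                # detail_url is a hypothetical field from the first spider's output
                yield scrapy.Request(item["detail_url"], callback=self.parse)

    def parse(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}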
If it were for some reason critical that each item be processed immediately, then I would write one spider, allow all the relevant domains, and use logic to route the requests: maybe have a bunch of methods that scrape the sites and call a router method when they finish. The router inspects the item and calls the next required scraping method; when the item isn't routed anywhere else, it finally gets sent to the pipeline. Nice and simple.
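Here is a rough sketch of that router pattern, with route() written as a plain helper the scraping callbacks delegate to. All domains, field names and selectors below are placeholders, not anything Scrapy-specific.

import scrapy


class RoutingSpider(scrapy.Spider):
    # One spider allowed on every relevant domain; each scraping method
    # hands its partially built item to the router, which decides whether
    # another site still needs to be scraped or the item is finished.
    name = "routing_spider"
    allowed_domains = ["site-a.example", "site-b.example"]
    start_urls = ["https://site-a.example/products"]

    def parse(self, response):
        # First site: start the item, then let the router decide the next hop.
        item = {"name": response.css("h1::text").get(), "price": None}
        yield from self.route(item)

    def parse_price(self, response, item):
        # Second site: enrich the item passed along via cb_kwargs, then route again.
        item["price"] = response.css(".price::text").get()
        yield from self.route(item)

    def route(self, item):
        # Inspect the item and pick the next scraping method, if any.
        if item.get("price") is None:
            yield scrapy.Request(
                f"https://site-b.example/prices?q={item['name']}",
                callback=self.parse_price,
                cb_kwargs={"item": item},
            )
        else:
            # Nothing left to fetch: the finished item goes to the item pipelines.
            yield item

cb_kwargs keeps the partially built item attached to each request, so every callback can keep filling it in until the router has nothing left to fetch and yields the item to the pipeline.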
