OK, I see now that if I don't use Twisted's adbapi and the pipeline blocks (e.g. if I artificially add a `time.sleep(100)`), the *whole* of Scrapy stops until the 100 s is over. Scrapy is, after all, single-threaded and merely asynchronous, so if the pipeline blocks like this, everything blocks. Whereas if I use Twisted's adbapi and add the same artificial block, Twisted just moves on to a non-blocking task (like a Scrapy Request or something) and the spider can march onward.
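To make the difference concrete, here is a minimal sketch of the same effect outside Scrapy. It uses `asyncio` from the standard library as a stand-in for Twisted's reactor, and a thread-pool call as a stand-in for what adbapi does with a blocking DB call. The names `slow_write`, `crawl`, `blocking_pipeline` and `threaded_pipeline` are made up for illustration, and the 0.2 s sleep plays the role of the `time.sleep(100)`.

```python
import asyncio
import time


def slow_write(item):
    # Hypothetical stand-in for a blocking DB insert.
    time.sleep(0.2)
    return item


async def crawl(ticks, n=5, step=0.02):
    # Stand-in for the rest of the crawl: records a timestamp on each step.
    for _ in range(n):
        ticks.append(time.monotonic())
        await asyncio.sleep(step)


async def blocking_pipeline(item):
    # Runs on the event-loop thread, so nothing else can run meanwhile.
    slow_write(item)


async def threaded_pipeline(item):
    # Hands the blocking call to a worker thread; the loop stays free.
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, slow_write, item)


async def longest_stall(pipeline):
    ticks = []
    # crawl() is listed first so it records a tick before the pipeline starts.
    await asyncio.gather(crawl(ticks), pipeline("item"))
    return max(b - a for a, b in zip(ticks, ticks[1:]))


blocking_gap = asyncio.run(longest_stall(blocking_pipeline))
threaded_gap = asyncio.run(longest_stall(threaded_pipeline))
print(f"longest crawl stall, blocking pipeline: {blocking_gap:.3f}s")
print(f"longest crawl stall, threaded pipeline: {threaded_gap:.3f}s")
```

With the blocking version the crawl stalls for about as long as the write takes; with the threaded version the stall should stay close to the crawl's own step.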
I'm still curious, though. If I had a really slow db and used adbapi, what would happen? In my experiments it seems that items simply pile up at the end and carry on getting written to the db, with the total extra writing time (perhaps from my artificial delays) getting added on to what the scrape takes without the pipeline. Are there any other concerns? In particular, I'm worried about: 1) Does Scrapy automatically throttle the crawler if too many items pile up, and if so, is that a concern anyway? 2) Could this lead to memory issues? (Is it just items that would pile up, or would Requests/Responses end up hanging around too?)
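The pile-up I'm describing can be roughly reproduced without Scrapy at all. The sketch below is only an approximation: a small `ThreadPoolExecutor` is assumed as a stand-in for adbapi's connection pool, `slow_write` and `simulate` are hypothetical names, and nothing here models whatever limits Scrapy itself applies. It just shows items being handed over much faster than a slow db can take them, and the writes finishing after the "scrape" is done.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_write(item):
    # Hypothetical slow DB insert.
    time.sleep(0.05)
    return item


def simulate(n_items=10, workers=2):
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The "spider" hands over every item almost instantly.
        futures = [pool.submit(slow_write, i) for i in range(n_items)]
        scrape_done = time.monotonic() - start
        # Writes still queued or in flight when the scrape itself has finished.
        backlog = sum(1 for f in futures if not f.done())
    # Leaving the with-block waits for all remaining writes.
    total = time.monotonic() - start
    return scrape_done, backlog, total


scrape_done, backlog, total = simulate()
print(f"scrape finished after {scrape_done:.3f}s with {backlog} writes pending")
print(f"all writes finished after {total:.3f}s")
```

In this toy version every pending item stays in memory until its write completes, which is the kind of growth question 2 is asking about.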
