Yes, Scrapy does pause the crawler if too many requests are still being processed, and that processing includes the pipelines. I have had it happen once: a service that a pipeline depended on stopped, and so did the entire crawler. As soon as the service was up again, crawling resumed.
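
To make the pause-and-resume behaviour concrete, here is a minimal sketch of that kind of back-pressure. It is not Scrapy's actual code: the class name SlotSketch is made up for illustration, and it assumes the slot accounts for pending work in bytes of response body, with a default limit of 5,000,000 bytes and a per-response minimum of 1024 bytes.

```python
class SlotSketch:
    """Simplified stand-in for the scraper slot; NOT Scrapy's real class."""

    # Assumption: every pending response counts for at least this many bytes.
    MIN_RESPONSE_SIZE = 1024

    def __init__(self, max_active_size=5_000_000):  # assumed default limit
        self.max_active_size = max_active_size
        self.active_size = 0  # bytes of responses not yet fully processed

    def _cost(self, body):
        return max(len(body), self.MIN_RESPONSE_SIZE)

    def add_response(self, body):
        # A downloaded response enters the slot and waits for its callback.
        self.active_size += self._cost(body)

    def finish_response(self, body):
        # Called only once the callback output has gone through the pipelines.
        self.active_size -= self._cost(body)

    def needs_backout(self):
        # While this is True, no new requests are pulled for download.
        return self.active_size > self.max_active_size


slot = SlotSketch(max_active_size=3000)
bodies = [b"x" * 2000, b"y" * 2000]
for body in bodies:
    slot.add_response(body)

paused = slot.needs_backout()       # 4000 > 3000, so the crawl is held back
slot.finish_response(bodies[0])     # a stalled pipeline finally returns
resumed = not slot.needs_backout()  # 2000 <= 3000, so crawling can continue
```

With a stalled pipeline, finish_response is never reached, so the size stays above the limit and nothing new is downloaded until the pipeline recovers.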
The code responsible for it is in scraper.py <https://github.com/scrapy/scrapy/blob/master/scrapy/core/scraper.py>. Requests to be processed are queued until the slot reaches max_active_size. A request is only dequeued after its callback has been called and the callback's output has been processed, including through the pipelines. Because of max_active_size, memory usage will not grow forever.

On Sunday, 30 August 2015 01:00:17 UTC-3, Lee H. wrote:
>
> OK, I see now that if I didn't use Twisted adbapi and blocking occurred in
> the pipeline (e.g. if I artificially just added a `time.sleep(100)`), the
> *whole* of Scrapy stops until the 100s is over. After all, Scrapy is
> single-threaded, just asynchronous, so if the pipeline blocks like this
> everything blocks. Whereas if I use Twisted's adbapi and add an artificial
> block like this, then Twisted just moves on to a non-blocking task (like
> a Scrapy Request or something) and the spider can march onward.
>
> I'm still curious though. If I had a really slow db and used adbapi, what
> would happen? In my experiments it seems simply that all items just pile up
> at the end, and carry on getting written to db (with the total extra
> writing time, perhaps from my artificial delays, getting added on to
> scrape time without the pipeline). Are there any other concerns?
>
> Particularly I'm worried about:
>
> 1) Does Scrapy autothrottle the crawler if too many items pile up, and if
> so is that a concern anyway?
> 2) Could this lead to memory issues? (Is it just items that would pile up
> or would Requests/Responses end up hanging around too?)
