Yes, Scrapy does pause the crawl when too many responses are still being 
processed, and that processing includes the item pipelines. I have seen it 
happen once: a service that a pipeline depended on went down, and the entire 
crawler stalled with it. As soon as the service was back up, crawling resumed.

The code responsible for this is in scraper.py 
<https://github.com/scrapy/scrapy/blob/master/scrapy/core/scraper.py>. 
Responses waiting to be processed are queued in the scraper slot, and once 
the slot's active size exceeds max_active_size the engine stops taking on 
new requests until the slot drains. A response is only released from the 
slot after its callback has been called and the callback's output has been 
processed, including through the pipelines. That cap is why memory usage 
will not grow without bound.
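
As a rough illustration of that accounting, here is a minimal sketch in 
Python. It is not Scrapy's actual code: SlotSketch, its method names and the 
1024-byte floor are simplified stand-ins I am assuming for the example, and 
the sizes are deliberately tiny.

```python
# Simplified model of a size-capped processing slot.
# SlotSketch is a hypothetical name, not Scrapy's real class.

MIN_RESPONSE_SIZE = 1024  # assumed floor so tiny responses still count


class SlotSketch:
    def __init__(self, max_active_size):
        self.max_active_size = max_active_size
        self.active_size = 0  # bytes held by responses still in processing
        self.active = {}      # key -> size of each in-flight response

    def add_response(self, key, body):
        # A response starts counting as soon as it is queued.
        size = max(len(body), MIN_RESPONSE_SIZE)
        self.active[key] = size
        self.active_size += size

    def finish_response(self, key):
        # Called only after the callback output has gone through the pipelines.
        self.active_size -= self.active.pop(key)

    def needs_backout(self):
        # While this is true, the crawler should not take on more work.
        return self.active_size > self.max_active_size


slot = SlotSketch(max_active_size=4096)
for i in range(5):
    slot.add_response(i, b"x" * 1024)

stalled = slot.needs_backout()      # five responses held, cap exceeded
slot.finish_response(0)             # a pipeline finally finishes one item
resumed = not slot.needs_backout()  # back at the cap, crawling can go on
```

A slow or stopped pipeline simply delays finish_response, so the slot stays 
over its cap and the crawl waits instead of piling up more responses.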


On Sunday, August 30, 2015 at 01:00:17 UTC-3, Lee H. wrote:
>
> OK, I see now that if I don't use Twisted's adbapi and the pipeline blocks 
> (e.g. if I artificially add a `time.sleep(100)`), the *whole* of Scrapy 
> stops until the 100s is over. After all, Scrapy is single-threaded, just 
> asynchronous, so if the pipeline blocks like this, everything blocks. 
> Whereas if I use Twisted's adbapi and add an artificial block like this, 
> Twisted just moves on to a non-blocking task (like a Scrapy Request or 
> something) and the spider can march onward.
>
> I'm still curious though. If I had a really slow db and used adbapi, what 
> would happen? In my experiments it seems that all items simply pile up 
> at the end and carry on getting written to the db (with the total extra 
> writing time, perhaps from my artificial delays, getting added on to 
> the scrape time without the pipeline). Are there any other concerns?
>
> Particularly I'm worried about:
>
> 1) Does Scrapy autothrottle the crawler if too many items pile up, and if 
> so is that a concern anyway?
> 2) Could this lead to memory issues? (Is it just items that would pile up, 
> or would Requests/Responses end up hanging around too?)
>
>
