Re: Do pipelines block Scrapy from crawling?

Lee H. Sat, 29 Aug 2015 21:00:46 -0700

OK, I see now that if I don't use Twisted's adbapi and the pipeline blocks 
(e.g. if I artificially add a `time.sleep(100)`), the *whole* of Scrapy 
stops until the 100s is over. After all, Scrapy is single-threaded and 
asynchronous, so if the pipeline blocks like this, everything blocks. 
Whereas if I use Twisted's adbapi and add an artificial block like this, 
Twisted just moves on to a non-blocking task (like a Scrapy Request or 
something) and the spider can march onward.
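
The difference can be reproduced without Scrapy. The following is only a 
rough sketch: it uses the standard library's asyncio as a stand-in for 
Twisted's reactor, and a thread-pool executor as a stand-in for the threads 
adbapi runs queries in. The names `slow_write`, `crawl` and `run` are made 
up for illustration. It counts how many turns a "crawler" task gets while a 
slow write is in progress:

```python
import asyncio
import time


def slow_write(delay):
    """Hypothetical stand-in for a blocking database write."""
    time.sleep(delay)


async def crawl(ticks, stop):
    """Stand-in for the crawler: records a tick each time the loop gives it a turn."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(0.01)


async def run(offload, delay=0.3):
    """Run the crawler next to one slow write and return the number of ticks."""
    ticks = []
    stop = asyncio.Event()
    crawler = asyncio.ensure_future(crawl(ticks, stop))
    await asyncio.sleep(0)  # give the crawler its first turn
    if offload:
        # The write runs in a worker thread, so the event loop stays free.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, slow_write, delay)
    else:
        # The write runs in the loop's own thread, so nothing else can run.
        slow_write(delay)
    stop.set()
    await crawler
    return len(ticks)
```

Calling `asyncio.run(run(offload=False))` should give a count of about 1, 
since the crawler gets no turns during the sleep, while 
`asyncio.run(run(offload=True))` should give a much larger count.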


I'm still curious, though. If I had a really slow db and used adbapi, what 
would happen? In my experiments it seems that items simply pile up at the 
end and carry on getting written to the db (with the total extra writing 
time, perhaps from my artificial delays, getting added on to the scrape 
time without the pipeline). Are there any other concerns?
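
To put rough numbers on that pile-up, here is a toy model, not a 
description of how Scrapy or adbapi actually schedule writes. It assumes a 
single writer handling items in arrival order, items arriving at a fixed 
interval, and a fixed write time; `simulate_backlog` and its parameters are 
hypothetical.

```python
def simulate_backlog(n_items, arrival_interval, write_time):
    """Return (peak_backlog, finish_time) for one FIFO writer (toy model)."""
    finish_times = []
    writer_free_at = 0.0
    peak = 0
    for i in range(n_items):
        arrived = i * arrival_interval
        # A write starts when the item has arrived and the writer is idle.
        start = max(arrived, writer_free_at)
        writer_free_at = start + write_time
        finish_times.append(writer_free_at)
        # Items that have arrived but are not yet fully written right now.
        pending = sum(1 for t in finish_times if t > arrived)
        peak = max(peak, pending)
    return peak, writer_free_at
```

With 10 items arriving one per second, `simulate_backlog(10, 1.0, 0.5)` 
keeps the backlog at 1 and finishes at 9.5s, whereas 
`simulate_backlog(10, 1.0, 3.0)` lets the backlog reach 7 and finishes at 
30s, well after the last item arrived at 9s.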

Particularly I'm worried about:

1) Does Scrapy autothrottle the crawler if too many items pile up, and if so 
is that a concern anyway?
2) Could this lead to memory issues? (Is it just items that would pile up, or 
would Requests/Responses end up hanging around too?)
