I think the code from snippets.scrapy.org causes log messages from the deferred to appear after the spider closes and the stats are printed. I can't devote time to writing reproduction steps right now, but adding a delay to its deferred so that it finishes after the engine stops should do it.

Anyway, back to the topic: since all the middlewares (pipelines among them) run asynchronously, I thought process_item() would not need to create a Deferred because it is already called asynchronously. However, when I tried time.sleep() in process_item(), further yields from spider.parse() did block. Maybe it's a bug, or maybe CONCURRENT_ITEMS refers to something else.
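To make concrete what I'd expect instead, here is a rough, untested sketch of a pipeline whose process_item() returns a Deferred so the blocking work runs in the reactor's thread pool; the names SlowApiPipeline and lookup_api are made up for illustration:

    # Rough sketch (not from the thread): process_item() returns a Deferred
    # so the blocking call runs in a thread instead of stalling the reactor.
    # SlowApiPipeline and lookup_api are placeholder names.
    import time

    from twisted.internet import threads


    def lookup_api(item):
        # Stand-in for a blocking operation (file access, socket, API lookup).
        time.sleep(1)
        return item


    class SlowApiPipeline(object):

        def process_item(self, item, spider):
            # Returning a Deferred lets the engine keep crawling while the
            # blocking call runs in a worker thread. Calling time.sleep()
            # directly here blocks the whole reactor, which matches the
            # behaviour I described above.
            return threads.deferToThread(lookup_api, item)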
On Wednesday, 28 May 2014 13:34:16 UTC+3, Dimitris Kouzis - Loukas wrote:
>
> Basically I assume that since the whole architecture is async, you
> wouldn't want to block, e.g. to access a file or do any socket operation.
> So if someone wants to do e.g. an API lookup, I guess it's better to do it
> asynchronously. For example, this is the typical MySQL async example:
> http://snipplr.com/view/66989/async-twisted-db-pipeline/ ... it doesn't
> use a Deferred but it is async. An example of a pipeline that does use a
> Deferred itself is the ImagesPipeline:
> https://github.com/scrapy/scrapy/blob/master/scrapy/contrib/pipeline/media.py#L38
>
> On Wednesday, May 28, 2014 4:25:50 AM UTC-4, Nikolaos-Digenis Karagiannis wrote:
>>
>> Why deferred? Do you want to overcome the
>> http://doc.scrapy.org/en/latest/topics/settings.html#concurrent-items
>> restriction in a specific pipeline or while processing a specific item?
>> I am asking because I inherited such a pipeline and I am still searching
>> for a justification for deferring the item processing a second time.
>>
>> On Wednesday, 28 May 2014 08:44:27 UTC+3, Dimitris Kouzis - Loukas wrote:
>>>
>>> Hello,
>>>
>>> Let's assume I have a middleware, e.g. a pipeline, and it is async
>>> (uses a Deferred) and I would like to write some unit tests for it.
>>> What would you suggest as a good way to organise the test code and
>>> reuse as much of the scrapy infrastructure as possible? Scrapy uses
>>> trial and I guess it's a good idea to inherit from SiteTest, e.g. as
>>> in scrapy/tests/test_command_fetch.py. Is this right?
>>>
>>> Thanks
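Regarding the unit testing question further up the thread: a minimal trial-based sketch could look like the following. This assumes the SlowApiPipeline sketch from my reply above; trial waits for the Deferred that the test method returns, so the async process_item() can simply be yielded.

    # Rough sketch: trial runs the test and waits on the returned Deferred.
    # SlowApiPipeline is the made-up pipeline from the sketch above, e.g.:
    # from myproject.pipelines import SlowApiPipeline  # adjust to your project
    from twisted.internet import defer
    from twisted.trial import unittest


    class SlowApiPipelineTest(unittest.TestCase):

        @defer.inlineCallbacks
        def test_process_item_passes_item_through(self):
            pipeline = SlowApiPipeline()
            item = {'name': 'example'}
            # yield waits for the pipeline's Deferred without blocking trial.
            result = yield pipeline.process_item(item, spider=None)
            self.assertEqual(result, item)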
