Hi there,

I'm developing a distributed crawler using Scrapy and Twisted. A server 
assigns crawling jobs to clients (*so the clients create Scrapy spiders*). 
The clients are Twisted *LineReceivers*. I'm using Scrapy 0.24.4 and 
Twisted 14.0.2.
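
To make the client side concrete, here is a stdlib-only model (not Twisted code) of how a *LineReceiver*-style client buffers incoming bytes and dispatches one complete line at a time. `\r\n` is LineReceiver's default delimiter. The `CRAWL <url>` wire format, `JobClient` and `start_crawl` are hypothetical names for illustration only.

```python
class JobClient:
    """Stdlib-only model of a LineReceiver-style client (not Twisted itself)."""

    delimiter = b"\r\n"  # same default delimiter as Twisted's LineReceiver

    def __init__(self):
        self._buffer = b""
        self.started = []  # URLs for which a crawl was (hypothetically) started

    def dataReceived(self, data):
        # TCP may deliver partial or multiple lines per chunk, so buffer
        # the bytes and only dispatch complete, delimiter-terminated lines.
        self._buffer += data
        while self.delimiter in self._buffer:
            line, self._buffer = self._buffer.split(self.delimiter, 1)
            self.lineReceived(line)

    def lineReceived(self, line):
        # Hypothetical protocol: "CRAWL <url>" asks the client to start a spider.
        command, _, arg = line.decode("utf-8").partition(" ")
        if command == "CRAWL" and arg:
            self.start_crawl(arg)

    def start_crawl(self, url):
        # Placeholder: the real client would create and start a Scrapy crawler here.
        self.started.append(url)


client = JobClient()
client.dataReceived(b"CRAWL http://example.com/a\r\nCRA")
client.dataReceived(b"WL http://example.com/b\r\n")
print(client.started)  # ['http://example.com/a', 'http://example.com/b']
```

Buffering matters because one `dataReceived()` call can carry half a line or several lines; only complete lines turn into jobs.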

I've been stuck on this for a couple of days now. The spider works fine when 
run on its own, outside the Twisted client. When it's run from the client, 
though, something strange happens: it never closes and stays idle forever. 
Yet judging by the logs, the spider was closed:

2014-11-27 14:55:15+0200 [WLWClientProtocol,client] WebService starting on 6080
2014-11-27 14:55:15+0200 [scrapy] Web service listening on 127.0.0.1:6080
2014-11-27 14:55:15+0200 [scrapy] Closing spider (finished)
2014-11-27 14:55:15+0200 [scrapy] Dumping Scrapy stats:
 {'finish_reason': 'finished',
 'finish_time': datetime.datetime(2014, 11, 27, 12, 55, 15, 240062),
 'start_time': datetime.datetime(2014, 11, 27, 12, 55, 15, 238374)}
2014-11-27 14:55:15+0200 [scrapy] Spider closed (finished)
2014-11-27 14:55:15+0200 [-] (TCP Port 6023 Closed)
2014-11-27 14:55:15+0200 [-] (TCP Port 6080 Closed)

But the *spider_closed* signal is never emitted (again, this spider works 
fine outside the client, so the signal is properly connected). I depend on 
this signal to send results back to the server, not to mention that a 
spider that stays open counts as a leak.
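
The dependency can be modelled without Scrapy: results leave the client only when a *spider_closed* handler actually runs. In this sketch `Signals` is a toy registry (not Scrapy's signal manager or its API), and `ResultReporter` plus the `RESULTS` line format are hypothetical.

```python
from collections import defaultdict


class Signals:
    """Toy signal registry; not Scrapy's SignalManager."""

    def __init__(self):
        self._receivers = defaultdict(list)

    def connect(self, receiver, signal):
        self._receivers[signal].append(receiver)

    def send(self, signal, **kwargs):
        # Call every receiver registered for this signal, in connection order.
        for receiver in self._receivers[signal]:
            receiver(**kwargs)


class ResultReporter:
    """Hypothetical: collects items and reports them when the spider closes."""

    def __init__(self, signals, sent_lines):
        self.items = []
        self.sent_lines = sent_lines  # stands in for the connection to the server
        signals.connect(self.spider_closed, "spider_closed")

    def spider_closed(self, spider, reason):
        # The only place results are sent; if the signal never fires,
        # nothing is ever reported.
        self.sent_lines.append(f"RESULTS {spider} {reason} {len(self.items)}")


signals = Signals()
sent = []
reporter = ResultReporter(signals, sent)
reporter.items.extend(["item1", "item2"])
print(sent)  # []
signals.send("spider_closed", spider="myspider", reason="finished")
print(sent)  # ['RESULTS myspider finished 2']
```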

Using the debugger reveals a few facts. In the 
*ExecutionEngine.spider_is_idle()* method:
   a- The scraper is always idle (*scraper_idle* is always True), and the 
spider's *parse()* method is never called.
   b- *downloading* is always True, and 
*Downloader.fetch()._deactivate()* is never called.
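
If I read the 0.24 source correctly, `Downloader.fetch()` adds the request to `self.active` and attaches `_deactivate` with `addBoth()`, so it runs on success *and* on failure, and the engine only considers the spider idle once `active` is empty. A `_deactivate()` that never runs would then mean the download Deferred never fired at all, rather than failing. The stdlib-only model below illustrates that reasoning; `MiniDeferred`, `Downloader` and `spider_is_idle` are simplified stand-ins, not the real Twisted/Scrapy classes.

```python
class MiniDeferred:
    """Tiny stand-in for a Twisted Deferred: callbacks run only once fired."""

    def __init__(self):
        self._callbacks = []
        self._fired = False
        self._result = None

    def addBoth(self, fn):
        # Like Deferred.addBoth: the same function handles success and failure.
        if self._fired:
            self._result = fn(self._result)
        else:
            self._callbacks.append(fn)
        return self

    def fire(self, result):
        self._fired = True
        self._result = result
        for fn in self._callbacks:
            self._result = fn(self._result)
        self._callbacks = []


class Downloader:
    def __init__(self):
        self.active = set()

    def fetch(self, request, dfd):
        def _deactivate(response):
            # Runs whether the download succeeded or failed.
            self.active.remove(request)
            return response

        self.active.add(request)
        return dfd.addBoth(_deactivate)


def spider_is_idle(scraper_idle, downloader):
    # Simplified: the real check also looks at the scheduler and start requests.
    downloading = bool(downloader.active)
    return scraper_idle and not downloading


downloader = Downloader()
dfd = MiniDeferred()
downloader.fetch("http://example.com/", dfd)
print(spider_is_idle(True, downloader))  # False: the Deferred has not fired yet
dfd.fire("response")
print(spider_is_idle(True, downloader))  # True: _deactivate ran and cleared active
```

Until something fires the Deferred, `downloading` stays True no matter how long the reactor runs, which matches observation (b).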

Are there any hints as to what I should be doing? Debugging Deferred-based 
code is not that easy, and stack traces seem to come out of nowhere.

Thanks

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.