Can you share the code?

On Friday, November 28, 2014 20:11:00 UTC-2, Mohammed Hamdy wrote:
>
> I tried launching the spider in another process. It's now worse and 
> doesn't even log that it's finished.
>
> On Friday, November 28, 2014 12:06:08 PM UTC+2, Mohammed Hamdy wrote:
>>
>> Hi there,
>>
>> I'm developing a distributed crawler using Scrapy and Twisted. There's a 
>> server that assigns crawling jobs to clients (*so clients create scrapy 
>> spiders*) and so on. The clients are twisted *LineReceivers*. I have 
>> scrapy 0.24.4 and twisted 14.0.2.
>>
>> I've been stuck on this for a couple of days now. The spider works fine 
>> when run alone, outside of the twisted client. When it's run from the 
>> client, something strange happens: it's never closed and stays idle 
>> forever. Yet, going by the logs, I would say that the spider was closed:
>>
>> 2014-11-27 14:55:15+0200 [WLWClientProtocol,client] WebService starting on 6080
>> 2014-11-27 14:55:15+0200 [scrapy] Web service listening on 127.0.0.1:6080
>> 2014-11-27 14:55:15+0200 [scrapy] Closing spider (finished)
>> 2014-11-27 14:55:15+0200 [scrapy] Dumping Scrapy stats:
>>  {'finish_reason': 'finished',
>>  'finish_time': datetime.datetime(2014, 11, 27, 12, 55, 15, 240062),
>>  'start_time': datetime.datetime(2014, 11, 27, 12, 55, 15, 238374)}
>> 2014-11-27 14:55:15+0200 [scrapy] Spider closed (finished)
>> 2014-11-27 14:55:15+0200 [-] (TCP Port 6023 Closed)
>> 2014-11-27 14:55:15+0200 [-] (TCP Port 6080 Closed)
>>
>> But the *spider_closed* signal is never emitted (again, this spider 
>> works fine outside the client, so the signal is properly connected). I 
>> depend on this signal for sending results back to the server, not to 
>> mention that the spider stays open, which counts as a leak.
>>
>> Using the debugger reveals some facts:
>> From the *ExecutionEngine.spider_is_idle()* method:
>>    a- The scraper is always idle (*scraper_idle* is always True) and the 
>> spider's *parse()* method is never called.
>>    b- *downloading* is always True, and 
>> *Downloader.fetch()._deactivate()* is never called.
>>
>> Are there any hints about what I should be doing? Debugging deferred code 
>> is not that easy, and stacks come out of nowhere.
>>
>> Thanks
>>
>
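
For readers following the debugger notes above, here is a rough model of why 
a download that never completes would keep a spider from going idle. It is a 
simplified sketch, not Scrapy's actual implementation: EngineState and its 
fields are hypothetical names, and it assumes the engine only treats a spider 
as idle when the scraper is idle and nothing is pending or downloading.

```python
from dataclasses import dataclass, field


@dataclass
class EngineState:
    """Hypothetical stand-in for the state an idle check inspects (not a Scrapy class)."""
    scraper_idle: bool = True      # no responses waiting to be parsed
    pending_requests: int = 0      # requests still queued in the scheduler
    active_downloads: set = field(default_factory=set)  # requests handed to the downloader


def spider_is_idle(state: EngineState) -> bool:
    """Assumed rule: idle only if the scraper is idle and nothing is pending or downloading."""
    downloading = bool(state.active_downloads)
    pending = state.pending_requests > 0
    return state.scraper_idle and not (pending or downloading)


# The situation from the debugger notes: the scraper is idle, but one
# download is never removed from the active set.
stuck = EngineState(active_downloads={"http://example.com/"})
print(spider_is_idle(stuck))

# Once the download completes and is removed, the same check reports idle.
stuck.active_downloads.clear()
print(spider_is_idle(stuck))
```

Under these assumptions, the first check reports not idle and the second 
reports idle, so a request that stays in the active set forever would keep the 
idle check from succeeding no matter how idle the scraper is.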

