Scrapyd sounds better to me.
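
For what it's worth, the Scrapyd route keeps the Twisted reactor out of your Django process entirely: you deploy the project to Scrapyd, then schedule jobs and poll their status over its JSON HTTP API. A rough sketch of what the Django side could look like — "myproject" and "myspider" are placeholder names for your own deployed project and spider, and Scrapyd is assumed to be on its default port 6800:

```python
# Sketch: driving Scrapyd from Django over its HTTP JSON API.
# Assumes Scrapyd is running on localhost:6800 and that a project named
# "myproject" with a spider "myspider" has been deployed to it.
import json
import time
import urllib.parse
import urllib.request

SCRAPYD = "http://localhost:6800"

def schedule_payload(project, spider, **spider_args):
    """Form-encoded body for Scrapyd's schedule.json endpoint."""
    params = dict(project=project, spider=spider, **spider_args)
    return urllib.parse.urlencode(params).encode()

def schedule(project, spider, **spider_args):
    """Start a crawl and return the job id Scrapyd assigns to it."""
    body = schedule_payload(project, spider, **spider_args)
    with urllib.request.urlopen(SCRAPYD + "/schedule.json", body) as resp:
        return json.load(resp)["jobid"]

def wait_until_finished(project, jobid, poll_seconds=1.0):
    """Poll listjobs.json until the job shows up in its 'finished' list."""
    query = urllib.parse.urlencode({"project": project})
    while True:
        with urllib.request.urlopen(SCRAPYD + "/listjobs.json?" + query) as resp:
            jobs = json.load(resp)
        if any(job["id"] == jobid for job in jobs.get("finished", [])):
            return
        time.sleep(poll_seconds)
```

A Django view would call schedule() and stash the job id, then a follow-up request (or a background task) calls wait_until_finished() and reads whatever the spider's pipeline wrote out — e.g. the JSON file you mentioned.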

On Monday, August 25, 2014 at 01:42:42 UTC-3, Steven Adams wrote:
>
> Update,
>
> I've managed to launch a Scrapy job from Django. In case anyone else is 
> trying to get it working, the code is below.
>
> from scrapy import log, signals
> from scrapy.crawler import Crawler
> from scrapy.utils.project import get_project_settings
> from twisted.internet import reactor
>
>         items = []
>
>         def add_item(item):
>             items.append(item)
>
>         spider = MySpider()
>         settings = get_project_settings()
>         crawler = Crawler(settings)
>         crawler.signals.connect(add_item, signal=signals.item_passed)
>         crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
>         crawler.configure()
>         crawler.crawl(spider)
>         crawler.start()
>         log.start(loglevel=log.DEBUG)
>         reactor.run(installSignalHandlers=0)
>         logger.debug(items)
>
> I'm getting this error, seemingly at random:
>   File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py", 
> line 1191, in run
>     self.startRunning(installSignalHandlers=installSignalHandlers)
>   File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py", 
> line 1171, in startRunning
>     ReactorBase.startRunning(self)
>   File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py", 
> line 683, in startRunning
>     raise error.ReactorNotRestartable()
> ReactorNotRestartable
>
> Anyone have any ideas what might be the issue here?
>
> On another note: I was thinking about running Scrapyd as a service. 
> Django calls the spider with certain arguments, collects the job id, then 
> checks every second for job completion. I can get the spider to store 
> the results in a JSON file that Django can pick up.
>
> So my question is: do you think it's a better design decision to use 
> Scrapyd, or to run the spider directly from Django itself?
>
> Thanks!
>
> On Saturday, August 23, 2014 11:37:42 PM UTC+10, Steven Adams wrote:
>>
>> Hi All,
>>
>> I'm trying to figure out how to run a spider directly from Django.
>>
>> I've done a lot of searching around, but no matter what I try I keep getting:
>>
>>           File 
>> "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py", line 
>> 1152, in _handleSignals
>>             signal.signal(signal.SIGINT, self.sigInt)
>>         exceptions.ValueError: signal only works in main thread
>>
>> Does anyone have any idea how I can achieve this? The code I'm using to 
>> launch Scrapy within Django is below.
>>
>> ===============
>> import multiprocessing
>> from multiprocessing import Queue
>>
>> from scrapy import project, signals
>> from scrapy.crawler import CrawlerProcess
>> from scrapy.utils.project import get_project_settings
>> from scrapy.xlib.pydispatch import dispatcher
>>
>> settings = get_project_settings()
>>
>> class CrawlerWorker(multiprocessing.Process):
>>
>>     def __init__(self, spider, result_queue):
>>         multiprocessing.Process.__init__(self)
>>         self.result_queue = result_queue
>>
>>         self.crawler = CrawlerProcess(settings)
>>         if not hasattr(project, 'crawler'):
>>             self.crawler.install()
>>         self.crawler.configure()
>>
>>         self.items = []
>>         self.spider = spider
>>         dispatcher.connect(self._item_passed, signals.item_passed)
>>
>>     def _item_passed(self, item):
>>         self.items.append(item)
>>
>>     def run(self):
>>         self.crawler.crawl(self.spider)
>>         self.crawler.start()
>>         self.crawler.stop()
>>         self.result_queue.put(self.items)
>>
>>
>> result_queue = Queue()
>> crawler = CrawlerWorker(MyCrawler(), result_queue)
>> crawler.start()
>> for item in result_queue.get():
>>     print item
>> ========
>>
>> Thanks
>> Steve
>>
>
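
If you do want to stay in-process rather than go through Scrapyd, the usual workaround for the ReactorNotRestartable above is to give each crawl its own child process, so the Twisted reactor is created and torn down fresh every time. A minimal sketch of the pattern — the Scrapy-specific part is left as a comment, since the exact calls depend on your Scrapy version, and MySpider is a placeholder for your own spider class:

```python
# Pattern: run each crawl in a throwaway child process so the reactor
# never has to restart inside the Django process.
import multiprocessing

def _run_crawl(queue):
    # In a real project the child process would do, roughly:
    #
    #   from scrapy.crawler import CrawlerProcess
    #   from scrapy.utils.project import get_project_settings
    #   process = CrawlerProcess(get_project_settings())
    #   process.crawl(MySpider)   # your spider class
    #   process.start()           # blocks; reactor lives and dies here
    #
    # A stub stands in for the crawl so the pattern itself is runnable.
    items = [{"url": "http://example.com", "title": "stub item"}]
    queue.put(items)

def crawl_once():
    """Run one crawl in a child process and return the collected items."""
    queue = multiprocessing.Queue()
    child = multiprocessing.Process(target=_run_crawl, args=(queue,))
    child.start()
    items = queue.get()  # drain the queue before join() to avoid a deadlock
    child.join()
    return items
```

Each call to crawl_once() gets a clean interpreter state, so Django can trigger as many crawls as it likes without ever touching reactor.run() itself.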

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
