Scrapyd sounds better to me.

On Monday, August 25, 2014 at 01:42:42 UTC-3, Steven Adams wrote:
>
> Update,
>
> I've managed to launch a Scrapy job from Django. In case anyone else is
> trying to get it to work, the code is below.
>
> spider = MySpider()
> settings = get_project_settings()
> crawler = Crawler(settings)
> crawler.signals.connect(add_item, signal=signals.item_passed)
> crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
> crawler.configure()
> crawler.crawl(spider)
> crawler.start()
> log.start(loglevel=log.DEBUG)
> reactor.run(installSignalHandlers=0)
> logger.debug(items)
>
> I'm getting random errors:
>
> File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py",
>   line 1191, in run
>     self.startRunning(installSignalHandlers=installSignalHandlers)
> File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py",
>   line 1171, in startRunning
>     ReactorBase.startRunning(self)
> File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py",
>   line 683, in startRunning
>     raise error.ReactorNotRestartable()
> ReactorNotRestartable
>
> Anyone have any ideas what might be the issue here?
>
> On another note: I was thinking about running Scrapyd as a service.
> Django calls the spider with certain arguments, collects the job id, then
> checks every second for job completion. I can get the spider to store
> the results in a JSON file that Django can pick up.
>
> So my question is: do you think it's a better design decision to use
> Scrapyd, or to run the spider directly from Django itself?
>
> Thanks!
>
> On Saturday, August 23, 2014 11:37:42 PM UTC+10, Steven Adams wrote:
>>
>> Hi All,
>>
>> I'm trying to figure out how to run a spider directly from Django.
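For the Scrapyd route, the workflow described above (schedule the spider, collect the job id, poll for completion, then read the JSON file the spider wrote) maps directly onto Scrapyd's HTTP API: `schedule.json` starts a run and returns a job id, and `listjobs.json` reports pending/running/finished jobs. A minimal stdlib-only sketch of the Django side, assuming Scrapyd on its default port 6800; the project name `myproject` and spider name `myspider` used with it are placeholders:

```python
import json
import time
from urllib import parse, request

SCRAPYD_URL = "http://localhost:6800"  # assumption: Scrapyd's default port


def schedule_spider(project, spider):
    """POST to Scrapyd's schedule.json and return the new job id."""
    data = parse.urlencode({"project": project, "spider": spider}).encode()
    with request.urlopen(SCRAPYD_URL + "/schedule.json", data=data) as resp:
        return json.load(resp)["jobid"]


def job_finished(listjobs_response, jobid):
    """True once the job id shows up in listjobs.json's 'finished' list."""
    return any(job["id"] == jobid
               for job in listjobs_response.get("finished", []))


def wait_for_job(project, jobid, poll_seconds=1):
    """Poll listjobs.json until the scheduled job completes."""
    url = SCRAPYD_URL + "/listjobs.json?" + parse.urlencode({"project": project})
    while True:
        with request.urlopen(url) as resp:
            if job_finished(json.load(resp), jobid):
                return
        time.sleep(poll_seconds)
```

A Django view could then call `schedule_spider("myproject", "myspider")`, stash the job id, and have a background task run `wait_for_job` before picking up the results file, exactly the design sketched above.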
>>
>> I've done a lot of searching around, but no matter what I try I keep
>> getting:
>>
>> File "/usr/local/lib/python2.7/site-packages/twisted/internet/base.py",
>>   line 1152, in _handleSignals
>>     signal.signal(signal.SIGINT, self.sigInt)
>> exceptions.ValueError: signal only works in main thread
>>
>> Does anyone have any idea how I can achieve this? Please see the code
>> I'm using to launch Scrapy within Django below.
>>
>> ===============
>> class CrawlerWorker(multiprocessing.Process):
>>
>>     def __init__(self, spider, result_queue):
>>         multiprocessing.Process.__init__(self)
>>         self.result_queue = result_queue
>>
>>         self.crawler = CrawlerProcess(settings)
>>         if not hasattr(project, 'crawler'):
>>             self.crawler.install()
>>         self.crawler.configure()
>>
>>         self.items = []
>>         self.spider = spider
>>         dispatcher.connect(self._item_passed, signals.item_passed)
>>
>>     def _item_passed(self, item):
>>         self.items.append(item)
>>
>>     def run(self):
>>         self.crawler.crawl(self.spider)
>>         self.crawler.start()
>>         self.crawler.stop()
>>         self.result_queue.put(self.items)
>>
>>
>> result_queue = Queue()
>> crawler = CrawlerWorker(MyCrawler(), result_queue)
>> crawler.start()
>> for item in result_queue.get():
>>     print item
>> ========
>>
>> Thanks
>> Steve
>>
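Both tracebacks in this thread come from the reactor's lifecycle: Twisted's reactor can only be started once per process (`ReactorNotRestartable`), and it only installs signal handlers from the main thread. One way to sidestep both from inside Django is to run each crawl in a brand-new child process, as the `CrawlerWorker` above attempts, so the reactor lives and dies with that process. A generic sketch of the pattern; the crawl itself is left as a placeholder function, since in practice it would build the crawler, run the reactor, and put the scraped items on the queue:

```python
import multiprocessing


def run_in_fresh_process(target, *args):
    """Run target(queue, *args) in a child process and return whatever it
    puts on the queue. Each call gets a brand-new process, so a Twisted
    reactor started inside `target` is created and torn down with that
    process, and the parent never sees ReactorNotRestartable."""
    ctx = multiprocessing.get_context("fork")  # Unix-only start method
    queue = ctx.Queue()
    proc = ctx.Process(target=target, args=(queue,) + args)
    proc.start()
    result = queue.get()  # blocks until the child reports back
    proc.join()
    return result


def demo_crawl(queue, items):
    # Placeholder for the real crawl: this is where the child process
    # would configure the crawler, run the reactor to completion, and
    # hand the collected items back to the parent.
    queue.put(list(items))
```

Calling `run_in_fresh_process(demo_crawl, [1, 2, 3])` returns `[1, 2, 3]` via the queue; swapping `demo_crawl` for a real crawl function gives each Django request its own disposable reactor.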
