Hello,
I wrote a few spiders which are working nicely so far.
Now I want to script them. I want them to start nightly, and depending on
the success of the previous spider (a Python function will look at the stats,
the database, and whether there were any exceptions to define "success"), I
want to start the next one.
So I want to do something like:
    spider1.start()
    if was_success(spider1):
        spider2.start()
    spider3.start()
    send_report_by_mail(spider1, spider2, spider3)
After all spiders are done, I want to send myself an email notification
about the complete run.
I looked at the different examples of how to start a spider from a script.
I am doing this in a wrapper class of my own:
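For illustration, the intended flow could be sketched roughly as below. This is only a sketch under assumptions: RunResult and run_chain are hypothetical names, the success rule (no exceptions and at least one scraped item, read from an "item_scraped_count" stats entry) is a placeholder for the real check, and each runner is assumed to be a plain callable that blocks until its spider has finished.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RunResult:
    """Outcome of one spider run (hypothetical structure, not a Scrapy type)."""
    name: str
    stats: Dict[str, int] = field(default_factory=dict)
    exceptions: List[str] = field(default_factory=list)


def was_success(result: RunResult) -> bool:
    """Assumed rule: no recorded exceptions and at least one item scraped."""
    return not result.exceptions and result.stats.get("item_scraped_count", 0) > 0


def run_chain(runners: List[Callable[[], RunResult]],
              send_report: Callable[[List[RunResult]], None]) -> List[RunResult]:
    """Run the spiders in order, stop at the first failure, always report."""
    results: List[RunResult] = []
    try:
        for run in runners:
            result = run()
            results.append(result)
            if not was_success(result):
                break  # do not start the remaining spiders
    finally:
        # The report is sent even if one of the runners raised.
        send_report(results)
    return results


if __name__ == "__main__":
    # Stand-in runners; real ones would start a spider and collect its stats.
    first = lambda: RunResult("spider1", {"item_scraped_count": 10})
    second = lambda: RunResult("spider2", {"item_scraped_count": 0})
    third = lambda: RunResult("spider3", {"item_scraped_count": 5})
    run_chain([first, second, third],
              send_report=lambda rs: print([r.name for r in rs]))
```

The try/finally is there so that the report step runs no matter how the chain ends, which is the behaviour wanted for the nightly run.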
    self.crawler = Crawler(self.settings)
    spider = self.spiderclass(**self.options)
    self.crawler.signals.connect(self.crawl_stoped,
                                 signal=signals.spider_closed)
    self.crawler.signals.connect(self.engine_closed,
                                 signal=signals.engine_stopped)
    self.crawler.signals.connect(self.crawl_error,
                                 signal=signals.spider_error)
    self.crawler.configure()
    self.crawler.crawl(spider)
    self.crawler.start()
    if not reactor.running:
        # the script will block here until the spider_closed signal was sent
        reactor.run()
Now my problem, for which I cannot find a solution (I tried looking in the
Scrapy source, but I cannot figure it out; I have only basic Twisted
knowledge):
In some cases, my spiders raise an exception in "spider.closed()". I
don't really know why, but it looks like in these cases none of the
three signals (spider_closed, engine_stopped or spider_error) is fired,
leaving me with no way to handle the error, a "hanging" reactor,
and a never-ending script.
How can I solve this?
Thanks,
Daniel