Hi all,
I've been trying to use the script shown at
http://blog.tryolabs.com/2011/09/27/calling-scrapy-python-script/ in order
to run my Scrapy spider as a separate process from within my Python script
and retrieve the scraped items directly in code, avoiding the whole "write
file, read file" round trip.
Here is the code from that link:
from scrapy import project, signals
from scrapy.conf import settings
from scrapy.crawler import CrawlerProcess
from scrapy.xlib.pydispatch import dispatcher
from multiprocessing.queues import Queue
import multiprocessing

class CrawlerWorker(multiprocessing.Process):

    def __init__(self, spider, result_queue):
        multiprocessing.Process.__init__(self)
        self.result_queue = result_queue
        self.crawler = CrawlerProcess(settings)
        if not hasattr(project, 'crawler'):
            self.crawler.install()
        self.crawler.configure()
        self.items = []
        self.spider = spider
        dispatcher.connect(self._item_passed, signals.item_passed)

    def _item_passed(self, item):
        self.items.append(item)

    def run(self):
        self.crawler.crawl(self.spider)
        self.crawler.start()
        self.crawler.stop()
        self.result_queue.put(self.items)
which, per the article in the aforementioned link, is supposed to be run as
such:
result_queue = Queue()
crawler = CrawlerWorker(MySpider(myArgs), result_queue)
crawler.start()
for item in result_queue.get():
    yield item
However, the above code was written for Scrapy v0.13 and I'm currently
using v0.24.6. The problem is that in the current version the
scrapy.crawler.CrawlerProcess class used above no longer has methods
like 'install', 'configure', or 'crawl', which makes the code unusable.
Does anyone know what the corresponding functionality in the newer
version of Scrapy is? How can I modify the above code to make it work
with recent versions of Scrapy?
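For what it's worth, the part I do have working is the multiprocessing/queue
plumbing itself. Below is a minimal sketch of the worker pattern with the
Scrapy-specific calls stubbed out by a hypothetical fake_crawl() generator
(since I don't know the new crawler API); the real version would set up the
crawler, connect the item signal, and run the reactor inside run():

```python
import multiprocessing
from multiprocessing import Queue


class CrawlerWorker(multiprocessing.Process):
    """Runs a crawl in a child process and ships items back via a queue."""

    def __init__(self, crawl_fn, result_queue):
        multiprocessing.Process.__init__(self)
        self.crawl_fn = crawl_fn          # stand-in for the Scrapy crawl
        self.result_queue = result_queue
        self.items = []

    def _item_passed(self, item):
        # In the Scrapy version this would be connected to the
        # item-passed/item-scraped signal instead of called directly.
        self.items.append(item)

    def run(self):
        # The real code would configure the crawler, connect signals,
        # start the reactor, and block here until the crawl finishes.
        for item in self.crawl_fn():
            self._item_passed(item)
        self.result_queue.put(self.items)


def fake_crawl():
    # Hypothetical stand-in for a spider yielding scraped items.
    yield {"title": "a"}
    yield {"title": "b"}


if __name__ == "__main__":
    result_queue = Queue()
    worker = CrawlerWorker(fake_crawl, result_queue)
    worker.start()
    items = result_queue.get()   # blocks until the child puts the list
    worker.join()
    print(items)
```

So the only missing piece is what to put inside run() with the current
Scrapy API.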
Any help is appreciated :D
Cheers!
Adam
--
You received this message because you are subscribed to the Google Groups
"scrapy-users" group.