Besides my PhantomJS Middleware post, I believe a download handler (the DOWNLOAD_HANDLERS setting) is different from a downloader middleware, which is why I have posted this separately.
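To illustrate the difference as I understand it: a middleware is registered in DOWNLOADER_MIDDLEWARES keyed by class path with an ordering number, while a download handler is registered in DOWNLOAD_HANDLERS keyed by URI scheme. The names below are hypothetical, just to show the two shapes:

    # Hypothetical names, only to show the two registration shapes.
    DOWNLOADER_MIDDLEWARES = {
        # key = dotted class path, value = ordering number
        'myproject.middlewares.SomeMiddleware': 543,
    }
    DOWNLOAD_HANDLERS = {
        # key = URI scheme, value = dotted class path
        'http': 'myproject.handlers.SomeDownloadHandler',
    }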
I did find this GitHub project by flisky:

    https://github.com/flisky/scrapy-phantomjs-downloader

It provides a single Python file: scrapy_phantomjs/downloader/handler.py

Given I am new to both Python and Scrapy, I am having a very hard time understanding where to put that file and how to reference it. Assume I have the following project structure:

    /
        scrapy.cfg
        sapui5api/
            __init__.py
            items.py
            pipelines.py
            settings.py
            spiders/
                sapui5api_spiders.py

1. Which directory should handler.py go in?
2. How do I reference it (what name do I use)?

This is what I have tried so far. I added /sapui5api/spiders/handler.py, which starts with:

    from __future__ import unicode_literals

    from scrapy import signals
    from scrapy.signalmanager import SignalManager
    from scrapy.responsetypes import responsetypes
    from scrapy.xlib.pydispatch import dispatcher
    from selenium import webdriver
    from six.moves import queue
    from twisted.internet import defer, threads
    from twisted.python.failure import Failure


    class PhantomJSDownloadHandler(object):
        # ... rest of the class as copied from the GitHub project above

In /sapui5api/settings.py I added:

    DOWNLOAD_HANDLERS = {
        'http': 'crawler.http.PhantomJSDownloadHandler',
        'https': 'crawler.https.PhantomJSDownloadHandler'
    }

I also tried:

    DOWNLOAD_HANDLERS = {
        'http': 'sapui5api.spiders.PhantomJSDownloadHandler',
        'https': 'sapui5api.spiders.PhantomJSDownloadHandler'
    }

Really quite guessing at this point.
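As far as I can tell, Scrapy resolves each DOWNLOAD_HANDLERS value with scrapy.utils.misc.load_object (the same function that appears in the traceback below): everything before the last dot is imported as a module, and the last segment is looked up as an attribute on that module. So the string has to mirror the file layout exactly. A minimal sketch, assuming handler.py stays at sapui5api/spiders/handler.py as above (not tested against Scrapy 0.24):

    # sapui5api/settings.py -- a sketch, assuming the file lives at
    # sapui5api/spiders/handler.py. 'sapui5api.spiders.handler' is the
    # module path; 'PhantomJSDownloadHandler' is the class inside it.
    DOWNLOAD_HANDLERS = {
        'http': 'sapui5api.spiders.handler.PhantomJSDownloadHandler',
        'https': 'sapui5api.spiders.handler.PhantomJSDownloadHandler',
    }

By that logic, the first attempt fails because there is no 'crawler' package in the project at all, and the second fails because 'sapui5api.spiders' is the package, not the module; the class lives in the handler module inside it.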
This is the output from the run (using the first DOWNLOAD_HANDLERS attempt):

    D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\settings\deprecated.py:26: ScrapyDeprecationWarning: You are using the following settings which are deprecated or obsolete (ask [email protected] for alternatives):
        BOT_VERSION: no longer used (user agent defaults to Scrapy now)
      warnings.warn(msg, ScrapyDeprecationWarning)
    2015-05-14 10:08:34-0400 [scrapy] INFO: Scrapy 0.24.6 started (bot: sapui5api)
    2015-05-14 10:08:34-0400 [scrapy] INFO: Optional features available: ssl, http11
    2015-05-14 10:08:34-0400 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'sapui5api.spiders', 'SPIDER_MODULES': ['sapui5api.spiders'], 'USER_AGENT': 'sapui5api/1.0', 'BOT_NAME': 'sapui5api'}
    2015-05-14 10:08:35-0400 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
    Traceback (most recent call last):
      File "d:\python27\scripts\scrapy-script.py", line 9, in <module>
        load_entry_point('scrapy==0.24.6', 'console_scripts', 'scrapy')()
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\cmdline.py", line 143, in execute
        _run_print_help(parser, _run_command, cmd, args, opts)
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\cmdline.py", line 89, in _run_print_help
        func(*a, **kw)
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\cmdline.py", line 150, in _run_command
        cmd.run(args, opts)
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\commands\crawl.py", line 60, in run
        self.crawler_process.start()
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\crawler.py", line 92, in start
        if self.start_crawling():
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\crawler.py", line 124, in start_crawling
        return self._start_crawler() is not None
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\crawler.py", line 139, in _start_crawler
        crawler.configure()
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\crawler.py", line 47, in configure
        self.engine = ExecutionEngine(self, self._spider_closed)
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\core\engine.py", line 64, in __init__
        self.downloader = downloader_cls(crawler)
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\core\downloader\__init__.py", line 73, in __init__
        self.handlers = DownloadHandlers(crawler)
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\core\downloader\handlers\__init__.py", line 22, in __init__
        cls = load_object(clspath)
      File "D:\Python27\lib\site-packages\scrapy-0.24.6-py2.7.egg\scrapy\utils\misc.py", line 42, in load_object
        raise ImportError("Error loading object '%s': %s" % (path, e))
    ImportError: Error loading object 'crawler.http.PhantomJSDownloadHandler': No module named crawler.http

I am not sure how the whole naming scheme works between Python and Scrapy. How do you know which directory to put handler.py in? The docs only talk about creating a handler; they never mention what directory these files have to go in or how to reference them properly after you create them.

Any help is greatly appreciated.

David
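P.S. One sanity check, assuming the file stays at sapui5api/spiders/handler.py and this is run from the directory containing scrapy.cfg: import the class by hand the same way load_object would, before handing the dotted path to Scrapy.

    # Mimic scrapy.utils.misc.load_object by hand; if this import fails,
    # the matching DOWNLOAD_HANDLERS entry will fail with the same error.
    import importlib

    module = importlib.import_module('sapui5api.spiders.handler')
    handler_cls = getattr(module, 'PhantomJSDownloadHandler')
    print(handler_cls)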
