Hi,
We have been using Scrapy for several months and recently started tackling
JavaScript-rendered pages. The problem we have now is that the spider callback
is never called once we add a downloader middleware.
Our source code is as follows:
> # our_spider.py
> import logging
> import pdb
>
> from scrapy.spiders import CrawlSpider
>
> logger = logging.getLogger(__name__)
>
> class OurSpider(CrawlSpider):
>     name = "our_spider"
>     ....
>
>     def parse_item(self, response):
>         logger.info('inside parse item: ####')
>         pdb.set_trace()  # never reached while the middleware is enabled
>
> # middlewares.py
> from scrapy.http import HtmlResponse
> from selenium import webdriver
>
> class PhantomJSMiddleware(object):
>     def process_request(self, request, spider):
>         # Render the page in PhantomJS instead of Scrapy's own downloader.
>         driver = webdriver.PhantomJS(
>             executable_path='/usr/local/bin/phantomjs',
>             service_args=['--ssl-protocol=any', '--web-security=no'])
>         driver.set_window_size(1120, 550)
>         driver.get(request.url)
>         content = driver.page_source.encode('utf-8')  # body must be bytes
>         url = driver.current_url  # keep the URL as text, not bytes
>         driver.quit()
>         # Returning a Response here makes Scrapy skip its own download.
>         return HtmlResponse(url, encoding='utf-8', status=200,
>                             body=content, request=request)
>
> # settings.py
> DOWNLOADER_MIDDLEWARES = {
>     'my_project.middlewares.PhantomJSMiddleware': 600,
> }
If we remove the downloader middleware from settings.py, the callback
parse_item is called. But once we enable the middleware in settings.py,
parse_item is never called.
We have checked `response.request.callback` at the end of process_request,
and it is set. So we are really stuck today.
Any suggestions or tips would be much appreciated.
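The callback check mentioned above can be sketched roughly as follows. This is only a minimal sketch, not our actual code: `CallbackLoggingMiddleware` and `callback_name` are hypothetical names, and it logs from process_response (which Scrapy calls for every response, including one returned by process_request) rather than at the end of process_request.

```python
import logging

logger = logging.getLogger(__name__)


def callback_name(request):
    """Return a readable name for the callback attached to a request."""
    callback = getattr(request, "callback", None)
    if callback is None:
        # With no explicit callback, Scrapy uses the spider's parse() method.
        return "None (spider.parse is used)"
    return getattr(callback, "__name__", repr(callback))


class CallbackLoggingMiddleware(object):
    """Hypothetical diagnostic middleware: logs each response's callback."""

    def process_response(self, request, response, spider):
        logger.info("response for %s -> callback %s",
                    response.url, callback_name(request))
        # Return the response unchanged so it continues on to the spider.
        return response
```

To try it, the class would be listed in DOWNLOADER_MIDDLEWARES next to PhantomJSMiddleware; the module path used there (for example 'my_project.middlewares.CallbackLoggingMiddleware') is an assumption about the project layout.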
wenlong