I am new to Scrapy and Python. I have a site I need to scrape, but it is entirely AJAX-driven, so I will need something like PhantomJS to produce the final rendered page.
I have searched in vain for a simple example of a downloader middleware that uses PhantomJS. It has been around long enough that I am sure someone has already written one. I can find complete projects for Splash and others, but I am on Windows.

It doesn't need to be fancy: just take the Scrapy request and return the page as rendered by PhantomJS (most likely using WaitFor.js, which the PhantomJS dev team wrote, so that the page is returned only after it has stopped making AJAX calls). I am completely lost trying to get started. The documentation (http://doc.scrapy.org/en/latest/topics/downloader-middleware.html) describes the APIs, but it doesn't give a basic working example that I could modify to plug in the PhantomJS calls shown below (which are very simple). Does anyone have something I can use?

This code does what I want when run in the Scrapy shell:

    D:\Python27\Scripts\scrapy.exe shell https://sapui5.netweaver.ondemand.com/sdk/#docs/api/symbols/sap.html
    >>> from selenium import webdriver
    >>> driver = webdriver.PhantomJS()
    >>> driver.set_window_size(1024, 768)
    >>> driver.get('https://sapui5.netweaver.ondemand.com/sdk/#docs/api/symbols/sap.html')
    >>> # Wait here for 30 seconds and let the AJAX calls finish
    >>> driver.save_screenshot('screen.png')
    >>> print driver.page_source
    >>> driver.quit()

The screenshot shows the page properly rendered.

Thanks for any advice you can give.

David
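
For illustration, here is a rough sketch of what such a downloader middleware might look like. It is not a tested or official implementation. It assumes an older Selenium release that still ships `webdriver.PhantomJS`, and it reuses the crude fixed wait from the shell session above rather than WaitFor.js. The class name `PhantomJSMiddleware`, the helper `render_page`, and the setting `PHANTOMJS_WAIT_SECONDS` are made up for this example. The Scrapy and Selenium imports are kept inside the methods so that the rendering helper can be tried on its own with a stand-in driver.

```python
import time


def render_page(driver, url, wait_seconds=30.0):
    """Load url in a Selenium-style driver and return (final_url, html)."""
    driver.get(url)
    # Crude fixed wait, as in the shell session; this blocks the caller.
    time.sleep(wait_seconds)
    return driver.current_url, driver.page_source


class PhantomJSMiddleware(object):
    """Hypothetical downloader middleware that renders pages with PhantomJS."""

    def __init__(self, wait_seconds=30.0, driver=None):
        self.wait_seconds = wait_seconds
        self.driver = driver  # created on first use if not supplied

    @classmethod
    def from_crawler(cls, crawler):
        from scrapy import signals
        # PHANTOMJS_WAIT_SECONDS is an invented setting name for this sketch.
        wait = crawler.settings.getfloat('PHANTOMJS_WAIT_SECONDS', 30.0)
        mw = cls(wait_seconds=wait)
        # Shut the browser down when the spider finishes.
        crawler.signals.connect(mw.spider_closed, signal=signals.spider_closed)
        return mw

    def process_request(self, request, spider):
        from scrapy.http import HtmlResponse
        if self.driver is None:
            from selenium import webdriver
            self.driver = webdriver.PhantomJS()
            self.driver.set_window_size(1024, 768)
        url, html = render_page(self.driver, request.url, self.wait_seconds)
        # Returning a Response here makes Scrapy skip its own download.
        return HtmlResponse(url, body=html, encoding='utf-8', request=request)

    def spider_closed(self, spider):
        if self.driver is not None:
            self.driver.quit()
            self.driver = None
```

To try it, the class would be listed in `DOWNLOADER_MIDDLEWARES` in the project's settings.py under whatever module path it is saved in. Note that the sleep blocks Scrapy's event loop, so pages would be fetched one at a time; that is probably acceptable for a first experiment but not for a large crawl.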
