PhantomJS Downloader Middleware

David Fishburn Wed, 13 May 2015 12:33:13 -0700

I am new to Scrapy and Python.

I have a site I need to scrape, but it is entirely AJAX driven, so I will need 
something like PhantomJS to produce the final rendered page.


I have searched in vain for a simple example of a downloader middleware 
that uses PhantomJS.  PhantomJS has been around long enough that I am sure 
someone has already written one.  I can find complete projects for Splash 
and others, but I am on Windows.

It doesn't need to be fancy. It just has to take the Scrapy request and 
return the PhantomJS page (most likely using the WaitFor.js, which the 
PhantomJS dev team wrote, to return the page only after it has stopped 
making AJAX calls).
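
To make the "only return the page after it has stopped changing" idea concrete, here is a rough sketch in plain Python. It is not WaitFor.js itself, just a simpler polling heuristic: call a function repeatedly and stop once its result has been identical several times in a row. The name wait_until_stable and all of its parameters are made up for illustration, and "page source unchanged" is only an approximation of "no more AJAX calls".

```python
import time


def wait_until_stable(get_snapshot, interval=0.5, stable_checks=3,
                      timeout=30.0, sleep=time.sleep, clock=time.time):
    """Poll get_snapshot() until its value stops changing.

    Hypothetical helper: returns the last snapshot once it has been
    identical for `stable_checks` consecutive polls, and raises
    RuntimeError if that does not happen within `timeout` seconds.
    `sleep` and `clock` are injectable only to make testing easy.
    """
    deadline = clock() + timeout
    last = get_snapshot()
    unchanged = 0
    while unchanged < stable_checks:
        if clock() >= deadline:
            raise RuntimeError(
                "page did not settle within %s seconds" % timeout)
        sleep(interval)
        current = get_snapshot()
        if current == last:
            unchanged += 1
        else:
            # The page changed again, so restart the count.
            unchanged = 0
            last = current
    return last
```

With a Selenium driver this would presumably be called as wait_until_stable(lambda: driver.page_source) right after driver.get(...).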

I am completely lost trying to get started.  The documentation 
(http://doc.scrapy.org/en/latest/topics/downloader-middleware.html) 
describes the APIs, but it doesn't give a basic application that I could 
begin modifying to plug in the PhantomJS calls shown below (which are very 
simple).

Anyone have something I can use?

This code does what I want when using the Scrapy shell:


D:\Python27\Scripts\scrapy.exe shell https://sapui5.netweaver.ondemand.com/sdk/#docs/api/symbols/sap.html

>>> import time
>>> from selenium import webdriver
>>> driver = webdriver.PhantomJS()
>>> driver.set_window_size(1024, 768)
>>> driver.get('https://sapui5.netweaver.ondemand.com/sdk/#docs/api/symbols/sap.html')
>>> time.sleep(30)  # wait 30 seconds to let the AJAX calls finish
>>> driver.save_screenshot('screen.png')
>>> print driver.page_source
>>> driver.quit()


The screenshot shows a properly rendered page.
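
For reference, this is roughly the shape I imagine such a middleware taking. It is an untested sketch, not a known working implementation. The class name PhantomJSMiddleware and the driver_factory and settle_seconds parameters are my own invention. It assumes a Selenium version that still provides webdriver.PhantomJS. Scrapy and Selenium are imported inside the methods only so the class can be exercised with stand-in objects. The fixed sleep also blocks Scrapy while it waits, so it is only a starting point.

```python
import time


class PhantomJSMiddleware(object):
    """Sketch of a downloader middleware that renders pages in PhantomJS."""

    def __init__(self, driver_factory=None, settle_seconds=30):
        # driver_factory and settle_seconds are hypothetical parameters.
        self.driver_factory = driver_factory or self._default_driver
        self.settle_seconds = settle_seconds
        self.driver = None

    @staticmethod
    def _default_driver():
        # Same calls as in the shell session above.
        from selenium import webdriver
        driver = webdriver.PhantomJS()
        driver.set_window_size(1024, 768)
        return driver

    @classmethod
    def from_crawler(cls, crawler):
        # Ask Scrapy to call spider_closed() so the browser gets shut down.
        from scrapy import signals
        middleware = cls()
        crawler.signals.connect(middleware.spider_closed,
                                signal=signals.spider_closed)
        return middleware

    def process_request(self, request, spider):
        from scrapy.http import HtmlResponse
        if self.driver is None:
            self.driver = self.driver_factory()
        self.driver.get(request.url)
        # Crude fixed wait for the AJAX calls; this blocks the crawler.
        time.sleep(self.settle_seconds)
        # Returning a response here makes Scrapy skip its own download.
        return HtmlResponse(url=self.driver.current_url,
                            body=self.driver.page_source,
                            encoding='utf-8',
                            request=request)

    def spider_closed(self, spider):
        if self.driver is not None:
            self.driver.quit()
            self.driver = None
```

I assume it would then be enabled through the DOWNLOADER_MIDDLEWARES setting, with an entry such as 'myproject.middlewares.PhantomJSMiddleware': 543 (the module path and the priority are placeholders).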


Thanks for any advice you can give.
David


