Thanks for the response, José. ScrapyJS integrates Splash as the JS renderer, though, and from the documentation I have read, it looks like Splash does not support Windows.
David

On Thursday, May 14, 2015 at 12:24:08 AM UTC-4, José Ricardo wrote:
>
> Hi David, have you given ScrapyJS a try?
>
> https://github.com/scrapinghub/scrapyjs
>
> Besides rendering the page, it can also take screenshots :)
>
> Regards,
>
> José
>
> On Wed, May 13, 2015 at 3:54 PM, Travis Leleu <[email protected]> wrote:
>
>> Hi David,
>>
>> Honestly, I have yet to find a good integration with scrapy / JS browser. The current methods seem to all download the basic page via urllib3, then send that HTML to render and fetch other resources.
>>
>> This causes a bottleneck -- the browser process, usually exposed via an API, takes a lot of CPU / time to render the page. It also doesn't easily use proxies, which means that all subsequent requests will be from one IP address.
>>
>> I think it would be a lot of work to build this into scrapy.
>>
>> In my work, I tend to just write my own (scaled down) scraping engine that works more directly with a headless js browser.
>>
>> On Wed, May 13, 2015 at 12:32 PM, David Fishburn <[email protected]> wrote:
>>
>>> I am new to Scrapy and Python.
>>>
>>> I have a site I need to scrape but it is all AJAX driven, so will need something like PhantomJS to yield the final page rendering.
>>>
>>> I have been searching in vain really for a simple example of a downloader middleware which uses PhantomJS. It has been around long enough that I am sure someone has already written one. I can find complete projects for Splash and others, but I am on Windows.
>>>
>>> It doesn't need to be fancy, just take the Scrapy request and return the PhantomJS page (most likely using the WaitFor.js, which the PhantomJS dev team wrote, to only return the page after it has stopped making AJAX calls).
>>>
>>> I am completely lost trying to get started. The documentation (http://doc.scrapy.org/en/latest/topics/downloader-middleware.html) talks about the APIs, but they don't give a basic application which I could begin modifying to plug in the PhantomJS calls which I have shown below (which are very simple).
>>>
>>> Anyone have something I can use?
>>>
>>> This code does what I want when using the Scrapy shell:
>>>
>>> D:\Python27\Scripts\scrapy.exe shell https://sapui5.netweaver.ondemand.com/sdk/#docs/api/symbols/sap.html
>>>
>>> >>> from selenium import webdriver
>>> >>> driver = webdriver.PhantomJS()
>>> >>> driver.set_window_size(1024, 768)
>>> >>> driver.get('https://sapui5.netweaver.ondemand.com/sdk/#docs/api/symbols/sap.html')
>>> -- Wait here for 30 seconds and let the AJAX calls finish
>>> >>> driver.save_screenshot('screen.png')
>>> >>> print driver.page_source
>>> >>> driver.quit()
>>>
>>> The screenshot contains a properly rendered browser.
>>>
>>> Thanks for any advice you can give.
>>> David
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups "scrapy-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>> For more options, visit https://groups.google.com/d/optout.
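For reference, here is a minimal sketch of the kind of downloader middleware asked about in the original post. It is only an illustration under several assumptions: the class name `PhantomJSMiddleware` and the helper `wait_until_stable` are made up for this example, it assumes a Selenium release that still ships `webdriver.PhantomJS` (as used in the shell session in the thread), and instead of WaitFor.js it uses a cruder heuristic: poll `page_source` until two consecutive reads are identical or a timeout expires. The Scrapy and Selenium imports are deferred into `process_request` so the polling helper can be tried on its own.

```python
import time


def wait_until_stable(get_source, interval=0.5, timeout=30.0):
    """Poll get_source() until two consecutive reads match or timeout elapses.

    Hypothetical helper: a rough stand-in for "the page has stopped making
    AJAX calls". It is not the WaitFor.js approach mentioned in the thread.
    """
    deadline = time.time() + timeout
    previous = get_source()
    while time.time() < deadline:
        time.sleep(interval)
        current = get_source()
        if current == previous:
            return current
        previous = current
    # Timed out: return the last snapshot rather than raising.
    return previous


class PhantomJSMiddleware(object):
    """Hypothetical downloader middleware that renders pages in PhantomJS."""

    def process_request(self, request, spider):
        # Deferred imports (an assumption of this sketch, see lead-in).
        from scrapy.http import HtmlResponse
        from selenium import webdriver

        # One browser per request keeps the sketch simple but is slow;
        # a real project would probably reuse the driver.
        driver = webdriver.PhantomJS()
        try:
            driver.set_window_size(1024, 768)
            driver.get(request.url)
            body = wait_until_stable(lambda: driver.page_source)
            # Returning a Response here makes Scrapy skip its own download.
            return HtmlResponse(driver.current_url, body=body,
                                encoding='utf-8', request=request)
        finally:
            driver.quit()
```

To try it, the class would be registered in the project's `DOWNLOADER_MIDDLEWARES` setting, for example `{'myproject.middlewares.PhantomJSMiddleware': 543}`, where the module path and the priority are placeholders. Note that the Selenium calls block inside `process_request`, so only one page renders at a time, which is the bottleneck Travis describes above.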
