Thanks for the response, José.

ScrapyJS integrates Splash as the JS renderer. From the documentation I have
read, it looks like Splash does not support Windows.

David


On Thursday, May 14, 2015 at 12:24:08 AM UTC-4, José Ricardo wrote:
>
> Hi David, have you given ScrapyJS a try?
>
> https://github.com/scrapinghub/scrapyjs
>
> Besides rendering the page, it can also take screenshots :)
>
> Regards,
>
> José
>
> On Wed, May 13, 2015 at 3:54 PM, Travis Leleu <[email protected]> wrote:
>
>> Hi David,
>>
>> Honestly, I have yet to find a good integration between Scrapy and a JS
>> browser. The current methods all seem to download the base page via
>> urllib3, then send that HTML to the renderer, which fetches the other
>> resources.
>>
>> This causes a bottleneck: the browser process, usually exposed via an
>> API, takes a lot of CPU and time to render each page. It also doesn't
>> easily support proxies, which means all subsequent requests come from a
>> single IP address.
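When PhantomJS is launched directly instead of through a rendering service, each browser instance can be pointed at its own proxy with PhantomJS's `--proxy` and `--proxy-type` command-line options, which Selenium 2.x/3.x forwards through the `service_args` argument of `webdriver.PhantomJS`. A minimal sketch, assuming one browser per proxy; `phantomjs_proxy_args` is a hypothetical helper and the addresses used below are placeholders:

```python
import itertools


def phantomjs_proxy_args(proxies, proxy_type="http"):
    """Yield PhantomJS command-line argument lists, cycling through proxies.

    Hypothetical helper: each yielded list is meant to be passed as
    webdriver.PhantomJS(service_args=...) so that every browser instance
    sends its traffic through a different proxy.
    """
    for proxy in itertools.cycle(proxies):
        yield ["--proxy=" + proxy, "--proxy-type=" + proxy_type]
```

With Selenium installed, `webdriver.PhantomJS(service_args=next(args))` would start a browser bound to the next proxy in the rotation. Starting PhantomJS is slow, so this sketch assumes rotation per browser instance rather than per request.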
>>
>> I think it would be a lot of work to build this into Scrapy.
>>
>> In my work, I tend to just write my own scaled-down scraping engine that
>> works more directly with a headless JS browser.
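As a rough illustration of that approach, here is a minimal sketch of a scaled-down, single-domain crawl loop that drives the browser directly instead of going through Scrapy's downloader. It assumes any driver object with Selenium-style `get(url)` and `page_source` (for example `webdriver.PhantomJS()`); `crawl` and `_LinkParser` are hypothetical names, and link extraction uses the standard library's `HTMLParser` rather than Scrapy selectors.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkParser(HTMLParser):
    """Collect href values from <a> tags in rendered HTML."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(driver, start_url, max_pages=10):
    """Breadth-first crawl of one domain; returns {url: rendered_html}.

    `driver` is assumed to expose Selenium-style get(url) and page_source.
    URL fragments are kept because hash-routed AJAX sites use them to
    select content.
    """
    domain = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        driver.get(url)  # the browser fetches and renders the page itself
        html = driver.page_source
        pages[url] = html
        parser = _LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)
            # Stay on the starting domain; mailto:/javascript: links have no netloc.
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages
```

With Selenium installed, `crawl(webdriver.PhantomJS(), 'http://example.com/')` would return the rendered HTML of up to ten pages; pages that keep loading data after the initial load would still need an explicit wait after each `get`.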
>>
>> On Wed, May 13, 2015 at 12:32 PM, David Fishburn <[email protected]> wrote:
>>
>>> I am new to Scrapy and Python.
>>>
>>> I have a site I need to scrape, but it is all AJAX-driven, so I will need
>>> something like PhantomJS to produce the final rendered page.
>>>
>>> I have been searching, really in vain, for a simple example of a
>>> downloader middleware that uses PhantomJS. It has been around long enough
>>> that I am sure someone has already written one. I can find complete
>>> projects for Splash and others, but I am on Windows.
>>>
>>> It doesn't need to be fancy: just take the Scrapy request and return the
>>> page as rendered by PhantomJS (most likely using WaitFor.js, which the
>>> PhantomJS dev team wrote, so that the page is only returned after it has
>>> stopped making AJAX calls).
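The WaitFor.js idea is to poll a test condition at a fixed interval until it becomes true or a timeout expires, rather than sleeping for a fixed time. A minimal Python sketch of the same polling idea, usable alongside a Selenium driver; `wait_for` is a hypothetical helper, and its default timeout and interval are this sketch's own choices, not values taken from WaitFor.js:

```python
import time


def wait_for(condition, timeout=3.0, interval=0.25):
    """Poll condition() until it returns a truthy value or timeout expires.

    Returns True if the condition was met and False on timeout.
    """
    deadline = time.time() + timeout
    while True:
        if condition():
            return True
        if time.time() >= deadline:
            return False
        time.sleep(interval)  # back off before checking again
```

With a Selenium driver, something like `wait_for(lambda: driver.execute_script('return jQuery.active == 0'), timeout=30)` would wait until jQuery reports no outstanding AJAX requests, assuming the page issues them through jQuery; other pages need a different condition, such as the presence of a known element.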
>>>
>>> I am completely lost trying to get started. The documentation
>>> (http://doc.scrapy.org/en/latest/topics/downloader-middleware.html)
>>> describes the APIs, but it doesn't give a basic application that I could
>>> start modifying to plug in the PhantomJS calls shown below (which are
>>> very simple).
>>>
>>> Anyone have something I can use?
>>>
>>> This code does what I want when using the Scrapy shell:
>>>
>>>
>>> D:\Python27\Scripts\scrapy.exe shell https://sapui5.netweaver.ondemand.com/sdk/#docs/api/symbols/sap.html
>>>
>>> >>> from selenium import webdriver
>>> >>> import time
>>> >>> driver = webdriver.PhantomJS()
>>> >>> driver.set_window_size(1024, 768)
>>> >>> driver.get('https://sapui5.netweaver.ondemand.com/sdk/#docs/api/symbols/sap.html')
>>> >>> time.sleep(30)  # wait 30 seconds for the AJAX calls to finish
>>> >>> driver.save_screenshot('screen.png')
>>> >>> print(driver.page_source)
>>> >>> driver.quit()
>>>
>>>
>>> The screenshot shows the properly rendered page.
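The shell session above could be wrapped in a downloader middleware roughly as follows. This is a minimal sketch, not a tested implementation: it assumes Selenium 2.x or 3.x (newer Selenium releases have dropped `webdriver.PhantomJS`) with `phantomjs` on the PATH, and the class name, the `phantomjs` meta key and the `PHANTOMJS_WAIT` setting are hypothetical. Returning a `Response` from `process_request` makes Scrapy skip its own download for that request.

```python
import time


class PhantomJSMiddleware(object):
    """Sketch: render requests flagged with meta['phantomjs'] in PhantomJS.

    The 'phantomjs' meta key and PHANTOMJS_WAIT setting are hypothetical.
    """

    def __init__(self, wait_seconds=30, driver=None):
        if driver is None:
            # Imported only when no driver is injected, so another WebDriver
            # (or a test double) can be passed in instead.
            from selenium import webdriver
            driver = webdriver.PhantomJS()
            driver.set_window_size(1024, 768)
        self.driver = driver
        self.wait_seconds = wait_seconds

    @classmethod
    def from_crawler(cls, crawler):
        from scrapy import signals
        mw = cls(wait_seconds=crawler.settings.getfloat('PHANTOMJS_WAIT', 30))
        # Shut the browser down when the spider finishes.
        crawler.signals.connect(mw.spider_closed, signal=signals.spider_closed)
        return mw

    def render(self, url):
        """Load url in the browser, wait, and return (final_url, html)."""
        self.driver.get(url)
        # Crude fixed wait for AJAX calls, mirroring the shell session.
        time.sleep(self.wait_seconds)
        return self.driver.current_url, self.driver.page_source

    def process_request(self, request, spider):
        # Unflagged requests return None, so Scrapy downloads them normally.
        if not request.meta.get('phantomjs'):
            return None
        from scrapy.http import HtmlResponse
        url, html = self.render(request.url)
        # Returning a Response short-circuits Scrapy's own download.
        return HtmlResponse(url, body=html.encode('utf-8'),
                            encoding='utf-8', request=request)

    def spider_closed(self, spider):
        self.driver.quit()
```

It would be enabled in `settings.py` with something like `DOWNLOADER_MIDDLEWARES = {'myproject.middlewares.PhantomJSMiddleware': 543}` (the module path is hypothetical), and a spider would opt in per request with `scrapy.Request(url, meta={'phantomjs': True})`. Because `time.sleep` and the Selenium calls block Scrapy's event loop, other requests pause while a page renders, which is the bottleneck Travis describes.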
>>>
>>>
>>> Thanks for any advice you can give.
>>> David
>>>
>>>
>>>
>>>  -- 
>>> You received this message because you are subscribed to the Google 
>>> Groups "scrapy-users" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To post to this group, send email to [email protected].
>>> Visit this group at http://groups.google.com/group/scrapy-users.
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>
>
