Late reply... but your selector for names is using hxs, which parses the original response, not the page rendered by Selenium. You need to build the selector from the HTML that Selenium hands back after the JavaScript has run.
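
Roughly, something like the sketch below: let Selenium load the page, then feed its rendered source into a fresh HtmlXPathSelector instead of the original response. This assumes the Selenium RC client from the snippet you linked (open() / get_html_source()); theItem and the XPaths are taken from your code, and the sleep is just a crude placeholder for a real wait condition on whatever the JavaScript populates.

    import time
    from scrapy.selector import HtmlXPathSelector

    def parse(self, response):
        sel = self.selenium
        sel.open(response.url)   # Selenium RC uses open(), not get()
        time.sleep(2)            # crude wait for the JS to fill the page

        # Build the selector from the rendered page, not from `response`
        hxs = HtmlXPathSelector(text=sel.get_html_source())

        for name in hxs.select('//div[@id="divThumbViewContents"]'):
            item = theItem()
            item['A'] = "T-Mobile"
            item['B'] = ""
            # relative XPath (.//) so the match stays inside this div
            item['C'] = name.select('.//div[@id="Name"]/text()').extract()
            yield item

If you're on WebDriver rather than RC, driver.page_source gives you the same rendered HTML to pass into the selector.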
On Thursday, July 26, 2012 11:45:56 AM UTC-7, kchen wrote:
>
> So these past few days I've been reading about using Scrapy and Selenium
> in order to scrape pages that have data that can only be seen by Scrapy
> once the JavaScript has loaded. Selenium seems to be a plausible solution
> because it will emulate the functions of a browser. In my code, I have
> followed the outline http://snippets.scrapy.org/snippets/21/ on how to
> implement Selenium in Scrapy. However, when I do in fact go and run my
> spider, my spider can't seem to find the XPath that I am looking for. I
> feel that this is the case because the HtmlXPathSelector is still pointing
> towards the page without the JavaScript and not acting on the
> Selenium-created browser. However, I am not sure. Does anyone have a
> solution?
>
> Here is the code for reference:
>
>     def __init__(self):
>         CrawlSpider.__init__(self)
>         self.verificationErrors = []
>         self.selenium = selenium("localhost", 4444, "*firefox",
>                                  "http://www.example.com")
>         self.selenium.start()
>
>     def __del__(self):
>         self.selenium.quit()
>         print self.verificationErrors
>         CrawlSpider.__del__(self)
>
>     def parse(self, response):
>         return CrawlSpider.parse(self, response)
>
>     def parse(self, response):
>         hxs = HtmlXPathSelector(response)
>         sel = self.selenium
>         sel.get(response.url)
>         names = hxs('//div[@id="divThumbViewContents"]')
>         for name in names:
>             item = theItem()
>             item['A'] = "T-Mobile"
>             item['B'] = ""
>             item['C'] = name.find_element_by_xpath('//div[@id="Name"]/text()')
