Late reply... but your selector for names is using hxs, which parses the original response, not the page rendered by Selenium. You need to build the selector from the HTML that Selenium hands back after the JavaScript has run.
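
Roughly, something like the sketch below: let Selenium load the page, then feed its rendered source into a fresh HtmlXPathSelector instead of the original response. This assumes the Selenium RC client from the snippet you linked (open() / get_html_source()); theItem and the XPaths are taken from your code, and the sleep is just a crude placeholder for a real wait condition on whatever the JavaScript populates.

    import time
    from scrapy.selector import HtmlXPathSelector

    def parse(self, response):
        sel = self.selenium
        sel.open(response.url)   # Selenium RC uses open(), not get()
        time.sleep(2)            # crude wait for the JS to fill the page

        # Build the selector from the rendered page, not from `response`
        hxs = HtmlXPathSelector(text=sel.get_html_source())

        for name in hxs.select('//div[@id="divThumbViewContents"]'):
            item = theItem()
            item['A'] = "T-Mobile"
            item['B'] = ""
            # relative XPath (.//) so the match stays inside this div
            item['C'] = name.select('.//div[@id="Name"]/text()').extract()
            yield item

If you're on WebDriver rather than RC, driver.page_source gives you the same rendered HTML to pass into the selector.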
On Thursday, July 26, 2012 11:45:56 AM UTC-7, kchen wrote:
>
> So these past few days I've been reading about using Scrapy and Selenium
> in order to scrape pages that have data that can only be seen by Scrapy
> once the JavaScript has loaded. Selenium seems to be a plausible solution
> because it will emulate the functions of a browser. In my code, I have
> followed the outline http://snippets.scrapy.org/snippets/21/ on how to
> implement Selenium in Scrapy. However, when I do in fact go and run my
> spider, my spider can't seem to find the XPath that I am looking for. I
> feel that this is the case because the HtmlXPathSelector is still pointing
> towards the page without the JavaScript and not acting on the
> Selenium-created browser. However, I am not sure. Does anyone have a
> solution?
>
> Here is the code for reference:
>
>     def __init__(self):
>         CrawlSpider.__init__(self)
>         self.verificationErrors = []
>         self.selenium = selenium("localhost", 4444, "*firefox",
>                                  "http://www.example.com")
>         self.selenium.start()
>
>     def __del__(self):
>         self.selenium.quit()
>         print self.verificationErrors
>         CrawlSpider.__del__(self)
>
>     def parse(self, response):
>         return CrawlSpider.parse(self, response)
>
>     def parse(self, response):
>         hxs = HtmlXPathSelector(response)
>         sel = self.selenium
>         sel.get(response.url)
>         names = hxs('//div[@id="divThumbViewContents"]')
>         for name in names:
>             item = theItem()
>             item['A'] = "T-Mobile"
>             item['B'] = ""
>             item['C'] = name.find_element_by_xpath('//div[@id="Name"]/text()')
