Hi, I want to extract the final page data (title and description). I made a spider for the craigslist website. Every index page has 100 links; I already get the next-page URL and all the listing links, but now I want to follow every link and scrape the data from the final page. Can anyone help me sort out my problem? My spider code is:
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request
from pagination.items import PaginationItem

class InfojobsSpider(CrawlSpider):
    name = "check"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/npo/"]

    rules = (
        Rule(SgmlLinkExtractor(allow=(r'index.*?html'),
                               restrict_xpaths=('//a[@class="button next"]')),
             callback='parse_item', follow=True),
    )

    fname = 1

    def parse_start_url(self, response):
        return self.parse_item(response)

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        titles = hxs.select('//span[@class="pl"]')
        for title in titles:
            title = title.select('a/@href').extract()

NOTE: after that loop, the title variable holds the link URL. Now I want to go to every URL one by one and extract the data from the final page. Please help me.
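Roughly, what I am trying to end up with is something like the sketch below (not a working answer: the detail-page XPaths and the title/description field names on PaginationItem are guesses on my part, and parse_listing is just a name I made up for the second callback):

from urlparse import urljoin

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request
from pagination.items import PaginationItem

class InfojobsSpider(CrawlSpider):
    name = "check"
    allowed_domains = ["craigslist.org"]
    start_urls = ["http://sfbay.craigslist.org/npo/"]

    # Follow the "next page" button through the index pages
    rules = (
        Rule(SgmlLinkExtractor(allow=(r'index.*?html',),
                               restrict_xpaths=('//a[@class="button next"]',)),
             callback='parse_item', follow=True),
    )

    def parse_start_url(self, response):
        return self.parse_item(response)

    def parse_item(self, response):
        # On each index page, yield a Request for every listing link,
        # handled by a second callback that parses the detail page
        hxs = HtmlXPathSelector(response)
        for href in hxs.select('//span[@class="pl"]/a/@href').extract():
            yield Request(urljoin(response.url, href), callback=self.parse_listing)

    def parse_listing(self, response):
        # Parse the final (detail) page. These XPaths are guesses and the
        # 'title'/'description' fields are assumed to exist on PaginationItem
        hxs = HtmlXPathSelector(response)
        item = PaginationItem()
        item['title'] = hxs.select('//h2[@class="postingtitle"]/text()').extract()
        item['description'] = hxs.select('//section[@id="postingbody"]//text()').extract()
        yield item

Is yielding Request objects from parse_item with a second callback like this the right way to do it in a CrawlSpider, or is there a better approach?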
