Hey,

My task is to take stock updates from one of my suppliers, www.sportsshoes.com. The issue I am facing is that although the CrawlSpider visits each page of a category, it only returns data from the first page. The same happens if I try to scrape each page independently, i.e. even if I assign it to scrape the third page of the category, it still only returns results from the first page.

My code (unused imports dropped and the redundant `//html` loop collapsed, since that selection yields a single node per page):

```python
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from sportshoes.items import SportshoesItem


class MySpider(CrawlSpider):
    name = "tennis"
    allowed_domains = ["sportsshoes.com"]
    start_urls = [
        "http://www.sportsshoes.com/products/shoe/tennis/",
        "http://www.sportsshoes.com/products/shoe/tennis#page=2",
        "http://www.sportsshoes.com/products/shoe/tennis#page=3",
    ]
    rules = (
        Rule(
            SgmlLinkExtractor(restrict_xpaths=('//div[@class="product-detail"]',)),
            callback="parse_items",
            follow=True,
        ),
    )

    def parse_items(self, response):
        hxs = HtmlXPathSelector(response)
        # Each product page describes one product, so build one item per response.
        item = SportshoesItem()
        item["productname"] = hxs.select(
            "//h1[@id='product_title']/span/text()").extract()
        item["Size"] = hxs.select(
            '//option[@class="sizeOption"]/text()').extract()
        item["SKU"] = hxs.select(
            "//div[@id='product_ref']/strong/text()").extract()
        return item
```

PS: I had also tried this set of rules, following the paginator links as well as the product links:

```python
    rules = (
        Rule(
            SgmlLinkExtractor(restrict_xpaths=('//div[@class="paginator"]',)),
            follow=True,
        ),
        Rule(
            SgmlLinkExtractor(restrict_xpaths=('//div[@class="hproduct product"]',)),
            callback="parse_items",
            follow=True,
        ),
    )
```

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
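One detail worth checking in the symptom above: the `#page=2` and `#page=3` parts of those start URLs are URL fragments, which HTTP clients (Scrapy included) do not send to the server, so all three start_urls fetch the same document; that would be consistent with only first-page results coming back. A minimal sketch showing the fragment being stripped (the product URLs are the ones from the spider above):

```python
# URL fragments (everything after '#') are handled client-side and are
# dropped before an HTTP request is made, so all three of these URLs
# resolve to the same resource on the server.
try:
    from urllib.parse import urldefrag  # Python 3
except ImportError:
    from urlparse import urldefrag      # Python 2, as used by the spider above

urls = [
    "http://www.sportsshoes.com/products/shoe/tennis/",
    "http://www.sportsshoes.com/products/shoe/tennis#page=2",
    "http://www.sportsshoes.com/products/shoe/tennis#page=3",
]
for url in urls:
    base, fragment = urldefrag(url)
    print("%s -> fragment %r" % (base, fragment))
```

If the site paginates this way, the later pages are most likely loaded by JavaScript, and the crawl would need to target whatever request the page makes behind the scenes rather than the fragment URL.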
