Hey,

My task is to take stock updates from one of my suppliers, www.sportsshoes.com. The issue I am facing is that although the CrawlSpider visits each page of a category, it only returns data from the first page. The same happens if I try to scrape each page independently, i.e. even if I assign it to scrape the third page of the category, it still only returns results from the first page.

My code (unused imports dropped and the redundant `//html` loop collapsed, since that selection yields a single node per page):

```python
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from sportshoes.items import SportshoesItem


class MySpider(CrawlSpider):
    name = "tennis"
    allowed_domains = ["sportsshoes.com"]
    start_urls = [
        "http://www.sportsshoes.com/products/shoe/tennis/",
        "http://www.sportsshoes.com/products/shoe/tennis#page=2",
        "http://www.sportsshoes.com/products/shoe/tennis#page=3",
    ]
    rules = (
        Rule(
            SgmlLinkExtractor(restrict_xpaths=('//div[@class="product-detail"]',)),
            callback="parse_items",
            follow=True,
        ),
    )

    def parse_items(self, response):
        hxs = HtmlXPathSelector(response)
        # Each product page describes one product, so build one item per response.
        item = SportshoesItem()
        item["productname"] = hxs.select(
            "//h1[@id='product_title']/span/text()").extract()
        item["Size"] = hxs.select(
            '//option[@class="sizeOption"]/text()').extract()
        item["SKU"] = hxs.select(
            "//div[@id='product_ref']/strong/text()").extract()
        return item
```

PS: I had also tried this set of rules, following the paginator links as well as the product links:

```python
    rules = (
        Rule(
            SgmlLinkExtractor(restrict_xpaths=('//div[@class="paginator"]',)),
            follow=True,
        ),
        Rule(
            SgmlLinkExtractor(restrict_xpaths=('//div[@class="hproduct product"]',)),
            callback="parse_items",
            follow=True,
        ),
    )
```

--
You received this message because you are subscribed to the Google Groups "scrapy-users" group.
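One detail worth checking in the symptom above: the `#page=2` and `#page=3` parts of those start URLs are URL fragments, which HTTP clients (Scrapy included) do not send to the server, so all three start_urls fetch the same document; that would be consistent with only first-page results coming back. A minimal sketch showing the fragment being stripped (the product URLs are the ones from the spider above):

```python
# URL fragments (everything after '#') are handled client-side and are
# dropped before an HTTP request is made, so all three of these URLs
# resolve to the same resource on the server.
try:
    from urllib.parse import urldefrag  # Python 3
except ImportError:
    from urlparse import urldefrag      # Python 2, as used by the spider above

urls = [
    "http://www.sportsshoes.com/products/shoe/tennis/",
    "http://www.sportsshoes.com/products/shoe/tennis#page=2",
    "http://www.sportsshoes.com/products/shoe/tennis#page=3",
]
for url in urls:
    base, fragment = urldefrag(url)
    print("%s -> fragment %r" % (base, fragment))
```

If the site paginates this way, the later pages are most likely loaded by JavaScript, and the crawl would need to target whatever request the page makes behind the scenes rather than the fragment URL.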
