Scrape multiple pages in Amazon

Kaushik Ramachandran Thu, 19 Jun 2014 06:30:08 -0700

This is my seed page:

http://www.amazon.com/Kindle-eBooks/b?ie=UTF8&node=154606011

I want to follow the book links on this page and fetch the pricing details.

Below is my code:

import re

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.selector import Selector

# AmazonScraper is the project's Item subclass; import it from the
# project's own items module.


class MySpider(CrawlSpider):
    name = "scraper"
    allowed_domains = ["amazon.com"]
    start_urls = ["http://www.amazon.com/Kindle-eBooks/b?ie=UTF8&node=154606011"]

    # Follow only links whose URL contains /gp/product
    rules = [Rule(SgmlLinkExtractor(allow=(r'/gp/product',)), callback='parse_items')]

    def parse_items(self, response):
        sel = Selector(response)
        items = []
        item = AmazonScraper()
        print 'inside'
        print sel.css('#btAsinTitle::text').extract()

        # Join the extracted text fragments into a single string per field
        item["title"] = ''.join(sel.css('#btAsinTitle::text').extract())
        print '-----', item["title"]

        # Collapse runs of whitespace in each price field
        item["digitalprice"] = ''.join(sel.css('.digitalListPrice>.listprice::text').extract())
        item["digitalprice"] = re.sub(r'\s+', ' ', item["digitalprice"])
        item["listprice"] = ''.join(sel.css('.listPrice::text').extract())
        item["listprice"] = re.sub(r'\s+', ' ', item["listprice"])
        item["kindleprice"] = ''.join(sel.css('.priceLarge::text').extract())
        item["kindleprice"] = re.sub(r'\s+', ' ', item["kindleprice"])

        items.append(item)
        print items
        return items
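
The three price fields above all go through the same join-then-collapse-whitespace step. A minimal sketch of that step as a standalone helper, using only the standard library, might look like this. The name clean_text and the sample fragments are hypothetical and not part of the spider, and the trailing strip() is an addition the original code does not make:

```python
import re


def clean_text(fragments):
    """Join extracted text fragments and collapse whitespace runs to one space."""
    joined = ''.join(fragments)
    # Same substitution the spider applies to each price field,
    # followed by strip() to drop leading/trailing spaces (an addition).
    return re.sub(r'\s+', ' ', joined).strip()


# Hypothetical fragments, shaped like the list that .extract() returns
sample = ['\n   $9.99', '\n  ']
cleaned = clean_text(sample)
```

With a helper like this, each field assignment in parse_items could shrink to a single call, which also makes the cleanup easy to test without running a crawl.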

My code is not returning the expected results. What is the solution for
this?
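
One thing that may be worth checking when a CrawlSpider callback never seems to fire is whether the allow pattern matches any link on the seed page at all. The sketch below tests a pattern against a list of hrefs with the re module alone. The sample hrefs are invented for illustration, and the idea that book links might use a /dp/ path instead of /gp/product is only an assumption; the real links on the page would need to be inspected:

```python
import re

# Pattern equivalent to the spider's allow rule
ALLOW = re.compile(r'/gp/product')


def matching_links(hrefs, pattern=ALLOW):
    """Return the hrefs the pattern accepts (regex search, not a full match)."""
    return [h for h in hrefs if pattern.search(h)]


# Hypothetical hrefs, invented for illustration only
sample_hrefs = [
    'http://www.amazon.com/gp/product/B00EXAMPLE1',
    'http://www.amazon.com/Some-Book-Title/dp/B00EXAMPLE2',
    'http://www.amazon.com/gp/help/customer/display.html',
]
followed = matching_links(sample_hrefs)
```

If most of the real hrefs turn out to be rejected, widening the pattern (for example, to also accept /dp/) is one option to try.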

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
Visit this group at http://groups.google.com/group/scrapy-users.
