This is my seed page:
http://www.amazon.com/Kindle-eBooks/b?ie=UTF8&node=154606011
I want to follow the book links on this page and fetch the pricing details.
Below is my code:
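import re

# Import paths as used in the Scrapy 0.2x series.
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import Selector

# AmazonScraper is my Item subclass, defined in the project's items module.


class MySpider(CrawlSpider):
    name = "scraper"
    allowed_domains = ["amazon.com"]
    start_urls = ["http://www.amazon.com/Kindle-eBooks/b?ie=UTF8&node=154606011"]
    # Follow only product links. Raw string, without the stray backslashes
    # and the redundant leading/trailing '.*?' of my first version.
    rules = [Rule(SgmlLinkExtractor(allow=(r'/gp/product',)), callback='parse_items')]

    def parse_items(self, response):
        sel = Selector(response)
        items = []
        item = AmazonScraper()
        print 'inside'
        print sel.css('#btAsinTitle::text').extract()
        item["title"] = ''.join(sel.css('#btAsinTitle::text').extract())
        print '-----', item["title"]
        # For each price: join the text nodes, then collapse whitespace runs.
        item["digitalprice"] = ''.join(sel.css('.digitalListPrice>.listprice::text').extract())
        item["digitalprice"] = re.sub(r'\s+', ' ', item["digitalprice"])
        item["listprice"] = ''.join(sel.css('.listPrice::text').extract())
        item["listprice"] = re.sub(r'\s+', ' ', item["listprice"])
        item["kindleprice"] = ''.join(sel.css('.priceLarge::text').extract())
        item["kindleprice"] = re.sub(r'\s+', ' ', item["kindleprice"])
        items.append(item)
        print items
        return items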
My code is not returning the expected results. What is the solution for
this?
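
To rule out the link pattern and the whitespace clean-up as the cause, both can be checked outside Scrapy. The following is only a minimal sketch: the helper names are hypothetical and not part of my spider, and the sample URLs and price string are made up for illustration.

```python
import re

# Same pattern as the rule's allow= argument, written as a raw string.
PRODUCT_LINK = re.compile(r'/gp/product')


def is_product_link(url):
    """Return True if the allow pattern is found anywhere in the URL."""
    return PRODUCT_LINK.search(url) is not None


def normalize_price(text):
    """Collapse each run of whitespace to one space, as parse_items does."""
    return re.sub(r'\s+', ' ', text)


# Made-up inputs, for illustration only.
sample_urls = [
    "http://www.amazon.com/gp/product/B000000000",
    "http://www.amazon.com/Some-Title/dp/B000000000",
]
matched = [u for u in sample_urls if is_product_link(u)]
cleaned = normalize_price("\n   $9.99  \n")
```

With these inputs only the first URL matches, so if the book links on the seed page use a different path form, the rule would skip them and parse_items would never be called.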