Hi,

The Amazon server detects that the request comes from a bot, so it returns HTTP status code 403 (access forbidden). Try changing the User-Agent.
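For example, in Scrapy you can set a browser-like User-Agent globally in your project's settings.py. This is a minimal sketch; the UA string below is only an example value, not a recommendation:

```python
# settings.py -- project-wide User-Agent override (example value only;
# any current browser User-Agent string works here)
USER_AGENT = ("Mozilla/5.0 (Windows NT 6.1; WOW64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/35.0.1916.153 Safari/537.36")
```

Alternatively, you can override it per request by passing a `headers` dict to `scrapy.http.Request` (e.g. `headers={"User-Agent": "..."}`).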
Regards.

---------
Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
Email/Gtalk: [email protected] - Skype: baazzilhassan
Blog: http://blog.jbinfo.io/

2014-06-24 15:13 GMT+01:00 Abhijeet Raj <[email protected]>:

> I have the following code to crawl some data, but when I run the spider it
> never enters the parse function. The code is as follows:
>
>     from scrapy.item import Item, Field
>     from scrapy.selector import Selector
>     from scrapy.spider import BaseSpider
>     from scrapy.selector import HtmlXPathSelector
>
>
>     class MyItem(Item):
>         reviewer_ranking = Field()
>         print "asdadsa"
>
>
>     class MySpider(BaseSpider):
>         name = 'myspider'
>         domain_name = ["amazon.com/gp/pdp/profile"]
>         start_urls = ["http://www.amazon.com/gp/pdp/profile/A28XDLTGHPIWE1"]
>         print "*****"
>
>         def parse(self, response):
>             print "fggfggftgtr"
>             sel = Selector(response)
>             hxs = HtmlXPathSelector(response)
>             item = MyItem()
>             item["reviewer_ranking"] = hxs.select(
>                 '//span[@class="a-size-small a-color-secondary"]/text()'
>             ).extract()
>             return item
>
> The output screen looks like this:
>
>     asdadsa
>     *****
>     /home/raj/Documents/IIM A/Daily sales rank/Daily reviews/Reviews_scripts/Scripts_review/Reviews/Reviewer/crawler_reviewers_data.py:18:
>     ScrapyDeprecationWarning: crawler_reviewers_data.MySpider inherits from
>     deprecated class scrapy.spider.BaseSpider, please inherit from
>     scrapy.spider.Spider.
>     (warning only on first subclass, there may be others)
>       class MySpider(BaseSpider):
>
>     2014-06-24 19:41:38+0530 [scrapy] INFO: Scrapy 0.22.2 started (bot: scrapybot)
>     2014-06-24 19:41:38+0530 [scrapy] INFO: Optional features available: ssl, http11
>     2014-06-24 19:41:38+0530 [scrapy] INFO: Overridden settings: {}
>     2014-06-24 19:41:38+0530 [scrapy] INFO: Enabled extensions: LogStats,
>     TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
>     2014-06-24 19:41:38+0530 [scrapy] INFO: Enabled downloader middlewares:
>     HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware,
>     RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware,
>     HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware,
>     HttpProxyMiddleware, ChunkedTransferMiddleware, DownloaderStats
>     2014-06-24 19:41:38+0530 [scrapy] INFO: Enabled spider middlewares:
>     HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware,
>     UrlLengthMiddleware, DepthMiddleware
>     2014-06-24 19:41:38+0530 [scrapy] INFO: Enabled item pipelines:
>     2014-06-24 19:41:38+0530 [myspider] INFO: Spider opened
>     2014-06-24 19:41:38+0530 [myspider] INFO: Crawled 0 pages (at 0
>     pages/min), scraped 0 items (at 0 items/min)
>     2014-06-24 19:41:38+0530 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6027
>     2014-06-24 19:41:38+0530 [scrapy] DEBUG: Web service listening on 0.0.0.0:6084
>     2014-06-24 19:41:38+0530 [myspider] DEBUG: Crawled (403) <GET
>     http://www.amazon.com/gp/pdp/profile/A28XDLTGHPIWE1> (referer: None) ['partial']
>     2014-06-24 19:41:38+0530 [myspider] INFO: Closing spider (finished)
>     2014-06-24 19:41:38+0530 [myspider] INFO: Dumping Scrapy stats:
>         {'downloader/request_bytes': 242,
>          'downloader/request_count': 1,
>          'downloader/request_method_count/GET': 1,
>          'downloader/response_bytes': 28486,
>          'downloader/response_count': 1,
>          'downloader/response_status_count/403': 1,
>          'finish_reason': 'finished',
>          'finish_time': datetime.datetime(2014, 6, 24, 14, 11, 38, 696574),
>          'log_count/DEBUG': 3,
>          'log_count/INFO': 7,
>          'response_received_count': 1,
>          'scheduler/dequeued': 1,
>          'scheduler/dequeued/memory': 1,
>          'scheduler/enqueued': 1,
>          'scheduler/enqueued/memory': 1,
>          'start_time': datetime.datetime(2014, 6, 24, 14, 11, 38, 513615)}
>     2014-06-24 19:41:38+0530 [myspider] INFO: Spider closed (finished)
>
> Please help me out; I am stuck.
>
> --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
