Hi,

The Amazon server detects that the request comes from a bot, so it returns
HTTP status code 403 => Execute access forbidden.
Try changing the User-Agent.
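As a minimal sketch, one way to do that is to override Scrapy's default
User-Agent in your project's settings.py (the UA string below is just an
example browser string, not a required value):

```python
# settings.py -- minimal sketch; the UA string is only an example
# browser User-Agent, not a required value.
USER_AGENT = ('Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/35.0.1916.153 Safari/537.36')
```

Alternatively, you can send the header on individual requests, e.g.
`Request(url, headers={'User-Agent': '...'})` from `start_requests()`.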


Regards.
---------
Lhassan Baazzi | Web Developer PHP - Symfony - JS - Scrapy
Email/Gtalk: [email protected] - Skype: baazzilhassan
Blog: http://blog.jbinfo.io/


2014-06-24 15:13 GMT+01:00 Abhijeet Raj <[email protected]>:

> I have the following code to crawl some data, but when I run the spider it
> never enters the parse function.
> The code is as follows:
>       from scrapy.item import Item, Field
>       from scrapy.selector import Selector
>       from scrapy.spider import BaseSpider
>       from scrapy.selector import HtmlXPathSelector
>
>
>       class MyItem(Item):
>           reviewer_ranking = Field()
>           print "asdadsa"
>
>
>       class MySpider(BaseSpider):
>           name = 'myspider'
>           domain_name = ["amazon.com/gp/pdp/profile"]
>           start_urls = ["http://www.amazon.com/gp/pdp/profile/A28XDLTGHPIWE1"]
>           print "*****"
>
>           def parse(self, response):
>               print "fggfggftgtr"
>               sel = Selector(response)
>               hxs = HtmlXPathSelector(response)
>               item = MyItem()
>               item["reviewer_ranking"] = hxs.select(
>                   '//span[@class="a-size-small a-color-secondary"]/text()'
>               ).extract()
>               return item
>
> The output screen looks like this.
>
> asdadsa
> *****
> /home/raj/Documents/IIM A/Daily sales rank/Daily
> reviews/Reviews_scripts/Scripts_review/Reviews/Reviewer/crawler_reviewers_data.py:18:
> ScrapyDeprecationWarning: crawler_reviewers_data.MySpider inherits from
> deprecated class scrapy.spider.BaseSpider, please inherit from
> scrapy.spider.Spider. (warning only on first subclass, there may be others)
>   class MySpider(BaseSpider):
> 2014-06-24 19:41:38+0530 [scrapy] INFO: Scrapy 0.22.2 started (bot:
> scrapybot)
> 2014-06-24 19:41:38+0530 [scrapy] INFO: Optional features available: ssl,
> http11
> 2014-06-24 19:41:38+0530 [scrapy] INFO: Overridden settings: {}
> 2014-06-24 19:41:38+0530 [scrapy] INFO: Enabled extensions: LogStats,
> TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
> 2014-06-24 19:41:38+0530 [scrapy] INFO: Enabled downloader middlewares:
> HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware,
> RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware,
> HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware,
> HttpProxyMiddleware, ChunkedTransferMiddleware, DownloaderStats
> 2014-06-24 19:41:38+0530 [scrapy] INFO: Enabled spider middlewares:
> HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware,
> UrlLengthMiddleware, DepthMiddleware
> 2014-06-24 19:41:38+0530 [scrapy] INFO: Enabled item pipelines:
> 2014-06-24 19:41:38+0530 [myspider] INFO: Spider opened
> 2014-06-24 19:41:38+0530 [myspider] INFO: Crawled 0 pages (at 0
> pages/min), scraped 0 items (at 0 items/min)
> 2014-06-24 19:41:38+0530 [scrapy] DEBUG: Telnet console listening on
> 0.0.0.0:6027
> 2014-06-24 19:41:38+0530 [scrapy] DEBUG: Web service listening on
> 0.0.0.0:6084
> 2014-06-24 19:41:38+0530 [myspider] DEBUG: Crawled (403) <GET
> http://www.amazon.com/gp/pdp/profile/A28XDLTGHPIWE1> (referer: None)
> ['partial']
> 2014-06-24 19:41:38+0530 [myspider] INFO: Closing spider (finished)
> 2014-06-24 19:41:38+0530 [myspider] INFO: Dumping Scrapy stats:
> {'downloader/request_bytes': 242,
>  'downloader/request_count': 1,
>  'downloader/request_method_count/GET': 1,
>  'downloader/response_bytes': 28486,
>  'downloader/response_count': 1,
>  'downloader/response_status_count/403': 1,
>  'finish_reason': 'finished',
>  'finish_time': datetime.datetime(2014, 6, 24, 14, 11, 38, 696574),
>  'log_count/DEBUG': 3,
>  'log_count/INFO': 7,
>  'response_received_count': 1,
>  'scheduler/dequeued': 1,
>  'scheduler/dequeued/memory': 1,
>  'scheduler/enqueued': 1,
>  'scheduler/enqueued/memory': 1,
>  'start_time': datetime.datetime(2014, 6, 24, 14, 11, 38, 513615)}
> 2014-06-24 19:41:38+0530 [myspider] INFO: Spider closed (finished)
>
>
> Please help me out; I am stuck.
>
>
>  --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>

