Sounds like the site is detecting that you're scraping and trying to prevent it. I'd suggest looking into user agent middlewares to mimic a browser UA string.

On Mar 5, 2015 1:41 AM, "Gaurang shah" <[email protected]> wrote:
> Hi Guys,
>
> I am trying to scrape a website; however, whenever I try to visit the page
> from which I have to scrape data, it redirects to some other page. If I
> visit that page manually in the browser, it is not redirected. I checked
> the response code as well, and it shows 200.
>
> However, with Scrapy it is being redirected and I see the code 302.
>
> Following is the website I am trying to scrape:
> http://www.lonmark.org/membership/directory/partners
>
> In the Scrapy logs I see the following entries:
> 2015-03-05 15:08:36+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/membership/directory/partners>
> 2015-03-05 15:08:37+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/sitemap>
> 2015-03-05 15:08:37+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/sitemap>
> 2015-03-05 15:08:41+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/sitemap>
>
> Following is the code:
> class Spider(BaseSpider):
>     name = "lonamrk"
>     allowed_domains = ["lonmark.org"]
>     # Request.meta = {'dont_redirect': True,
>     #                 'handle_httpstatus_list': [302]}
>
>     start_urls = ["http://www.lonmark.org/membership/directory/partners"]
>
>     def parse(self, response):
>         print response.url
>         hxs = HtmlXPathSelector(response)
>         company_links = hxs.select("//*[@id='page_content']/table/tbody/tr[1]/td[1]/a/@href")
>         for link in company_links:
>             yield Request("http://www.lonmark.org/membership/directory/"+link._root,
>                           callback=self.parse_company_info)
>
> If I uncomment the code and stop the redirection, then I am not getting
> anything in the response body.
>
> Would someone please help me with what to do?
>
> --
> You received this message because you are subscribed to the Google Groups "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
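
To make the user agent suggestion at the top of this reply concrete, here is a minimal sketch of a downloader middleware that sets a browser-like User-Agent header on every outgoing request. The class name, the UA strings, and the `myproject.middlewares` path are placeholders made up for illustration, and it is only a guess that the User-Agent is what triggers the 302 on this site.

```python
import random

# Hypothetical pool of browser-like User-Agent strings (illustrative values,
# not taken from the thread). Replace with strings from a real browser.
BROWSER_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0",
    "Mozilla/5.0 (X11; Linux x86_64; rv:36.0) Gecko/20100101 Firefox/36.0",
]


class BrowserUserAgentMiddleware(object):
    """Sketch of a Scrapy downloader middleware that overrides User-Agent.

    Scrapy calls process_request() for each request passing through the
    downloader middleware chain; no Scrapy base class is required.
    """

    def __init__(self, user_agents=None, rng=None):
        self.user_agents = list(user_agents or BROWSER_USER_AGENTS)
        self.rng = rng or random

    def process_request(self, request, spider):
        # Overwrite whatever User-Agent was set earlier in the chain.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing the request normally.
        return None


# Enabling it in settings.py (module path is an assumption about your project):
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.BrowserUserAgentMiddleware": 543,
# }
```

If a single fixed string is enough, setting `USER_AGENT = "..."` in settings.py should have a similar effect without a custom middleware. If the redirect persists, the site may be checking something else (cookies or a Referer header, for example).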
