Sounds like the site is detecting that you're scraping and is trying to prevent it.
I'd suggest looking into user-agent middlewares so your requests send a browser-like UA string.
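
To make that concrete, here is a minimal sketch of such a middleware. It relies only on the fact that Scrapy downloader middlewares are plain classes with a process_request(request, spider) method. The class name, the UA strings, and the constructor arguments are my own placeholders, not anything from your project, and I have not tested this against lonmark.org.

```python
import random

# Placeholder browser-like User-Agent strings (assumed values; swap in
# whatever a current browser actually sends).
BROWSER_USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]


class BrowserUserAgentMiddleware(object):
    """Hypothetical downloader middleware that sets a browser-like UA."""

    def __init__(self, user_agents=None, rng=None):
        # Both arguments are optional so Scrapy can build the class with
        # no arguments; they exist mainly to make the class easy to test.
        self.user_agents = list(user_agents or BROWSER_USER_AGENTS)
        self.rng = rng or random

    def process_request(self, request, spider):
        # Overwrite whatever User-Agent the request currently carries.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Returning None lets Scrapy continue handling the request as usual.
        return None
```

To try it, you would list the class in the DOWNLOADER_MIDDLEWARES setting of your project (for example under a hypothetical path like myproject.middlewares.BrowserUserAgentMiddleware). If a single fixed string is enough, setting USER_AGENT in settings.py is the simpler route. If the redirect persists with a browser UA, the site is probably keying on something else, such as cookies or the Referer header.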
On Mar 5, 2015 1:41 AM, "Gaurang shah" <[email protected]> wrote:

> Hi Guys,
>
> I am trying to scrape a website with Scrapy. The problem is that whenever
> I request the page I need to scrape data from, it redirects to some other
> page. If I visit that page manually in the browser, it is not
> redirected, and the response code is 200.
>
> With Scrapy, however, it is redirected and I see the code
> 302.
>
> The following is the website I am trying to scrape.
> http://www.lonmark.org/membership/directory/partners
>
> In the Scrapy logs I see the following entries.
> 2015-03-05 15:08:36+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/membership/directory/partners>
> 2015-03-05 15:08:37+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/sitemap>
> 2015-03-05 15:08:37+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/sitemap>
> 2015-03-05 15:08:41+0530 [lonamrk] DEBUG: Redirecting (302) to <GET http://www.lonmark.org/sitemap> from <GET http://www.lonmark.org/sitemap>
>
> The following is the code.
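> # Imports assumed for the Scrapy 0.x API used below (not in the original post)
> from scrapy.http import Request
> from scrapy.selector import HtmlXPathSelector
> from scrapy.spider import BaseSpider
>
>
> class Spider(BaseSpider):
>     name = "lonamrk"
>     allowed_domains = ["lonmark.org"]
>     # Request.meta = {'dont_redirect': True,
>     #                 'handle_httpstatus_list': [302]}
>
>     start_urls = ["http://www.lonmark.org/membership/directory/partners"]
>
>     def parse(self, response):
>         print(response.url)
>         hxs = HtmlXPathSelector(response)
>         company_links = hxs.select(
>             "//*[@id='page_content']/table/tbody/tr[1]/td[1]/a/@href")
>         for link in company_links:
>             # extract() returns the href text of the matched attribute
>             # (parse_company_info is not shown in this post)
>             yield Request(
>                 "http://www.lonmark.org/membership/directory/" + link.extract(),
>                 callback=self.parse_company_info)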
>
>
>
> If I uncomment the code and stop the redirection, I do not get
> anything in the response body.
>
> Would someone please help me figure out what to do?
>
>  --
> You received this message because you are subscribed to the Google Groups
> "scrapy-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/scrapy-users.
> For more options, visit https://groups.google.com/d/optout.
>

