Found the problem. I have a custom middleware that checks the MIME type of each response. A 302 response wasn't matching any allowed type, so the request was being discarded. I added a check to my middleware: if the response status is 302, do nothing and pass the response through. Now it works.
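For anyone hitting the same issue, here is a minimal sketch of what that fix could look like. The names (`MimeFilterMiddleware`, `ALLOWED_TYPES`) are illustrative, not from the original post; the `try`/`except` on the import just lets the sketch run without Scrapy installed:

```python
# Hypothetical downloader middleware that filters responses by MIME type
# but lets redirect responses through untouched, as described above.
try:
    from scrapy.exceptions import IgnoreRequest
except ImportError:  # stand-in so the sketch runs without Scrapy installed
    class IgnoreRequest(Exception):
        pass

# Illustrative whitelist; adjust to whatever your crawl actually needs.
ALLOWED_TYPES = (b"text/html", b"application/xhtml+xml")

class MimeFilterMiddleware:
    def process_response(self, request, response, spider):
        # Redirects carry no useful Content-Type for filtering; pass them
        # through so RedirectMiddleware can follow them to the target page.
        if response.status in (301, 302, 303, 307, 308):
            return response
        content_type = response.headers.get(b"Content-Type", b"")
        if not any(t in content_type for t in ALLOWED_TYPES):
            raise IgnoreRequest(f"Unwanted MIME type: {content_type!r}")
        return response
```

The key point is the early `return response` for redirect statuses: the MIME check only makes sense once the final, non-redirect response arrives.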
Thanks! Michele C

On Tuesday, November 18, 2014 at 14:34:16 UTC-5, Michele Coscia wrote:
> My code already contains that argument. For some reason, my original
> message was cut. Here's the rest of it:
>
> However, in my crawler this does not happen. The spider never enters the
> parse method. I overrode the start_requests method as
>
>     def start_requests(self):
>         for url in self.start_urls:
>             yield Request(url, dont_filter=True, callback=self.parse,
>                           errback=self.handle_errors)
>
> and handle_errors gets called, where I can see that Scrapy raised an
> IgnoreRequest.
>
> My spider is a simple extension of scrapy.Spider. It is invoked as advised in
> http://doc.scrapy.org/en/latest/topics/practices.html#run-scrapy-from-a-script.
>
> What I think is going on:
> - The spider gets the 302 redirect to http://mhs.mt.gov/
> - Puts http://mhs.mt.gov/ into the queue
> - Puts http://mhs.mt.gov/ into the set of visited pages
> - Asks for the next page
> - Sees http://mhs.mt.gov/
> - Raises IgnoreRequest because it has already seen the URL
> - Ends
>
> However, this does not happen in the shell.
>
> How can I get the shell's behavior into my spider?
> Can I tell the crawler to revisit the page once (but not more, otherwise
> I'll be stuck forever)?
>
> I searched the web, but people are more interested in avoiding
> following a 302 than in actually following one.