Hi Hakim,
I'm not sure how you end up with this "instance" object with attributes related to errors.
Are you catching these through an errback?
The HttpError spider middleware (enabled by default) filters out non-2xx responses,
so they never reach your callback. To receive them there, define a
handle_httpstatus_list attribute on your spider.
Example:
from scrapy.spider import Spider


class ErrorSpider(Spider):
    name = "testerror"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/",
        "http://www.dmoz.org/rererere/",
    ]
    handle_httpstatus_list = [404]

    def parse(self, response):
        self.log("type: %s; status %d" % (type(response), response.status))
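
In case it helps to see the rule spelled out, here is a rough sketch in plain Python of the decision the HttpError middleware makes for each response. It is only a simplified model, not Scrapy's actual code: the function name reaches_callback is made up for illustration, and it only considers the spider-level handle_httpstatus_list.

```python
def reaches_callback(status, handle_httpstatus_list=()):
    """Simplified model (an assumption, not Scrapy's source) of whether a
    response with the given HTTP status is passed on to the spider callback."""
    # Successful (2xx) responses always reach the callback.
    if 200 <= status < 300:
        return True
    # Other statuses only get through if the spider opted in to them.
    return status in handle_httpstatus_list


if __name__ == "__main__":
    # Without handle_httpstatus_list, a 404 is filtered out.
    print(reaches_callback(404))
    # With handle_httpstatus_list = [404], the callback sees it.
    print(reaches_callback(404, [404]))
```

So with handle_httpstatus_list = [404] on the spider, parse() should be called for the 404 page too, and you can branch on response.status there.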
On Tuesday, April 15, 2014 4:51:23 PM UTC+2, Hakim Benoudjit wrote:
>
> Hi guys,
>
> I have a little issue with the response object inside a request callback when
> the page returns a 404:
> - If the page exists (HTTP code *200*), response is of type
> *HtmlResponse*.
> - If the page returns 404, response is of type *instance*, which
> contains some attributes related to error messages, and in this latter case
> *status* isn't an attribute of the *response* object.
>
> So I can tell whether the response *status* is *404* only by checking the
> class of the *response* object (*HtmlResponse* or *instance*).
>
> How do we know that a page returned *404* if *response.status* isn't
> available as an attribute of the *response* object?
>