It mostly gives me sane responses such as 200, 302, 404, 502, etc., but this particular error doesn't come with any proper server response and can only be caught in the middleware's process_exception(). When it occurs, it usually keeps happening continuously until the proxy changes.
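
In case it helps, below is a minimal sketch of the downloader middleware I have in mind. I believe the failure in that log line is a twisted.web._newclient.ResponseFailed wrapping the ConnectionDone and _DataLoss failures, so process_exception() can unwrap it and reissue the request through a different proxy instead of dropping the URL. PROXY_POOL, pick_proxy(), the example proxy URLs and the class name are just placeholders for whatever rotation logic you already have.

import logging
import random

from twisted.internet.error import ConnectionDone
from twisted.web._newclient import ResponseFailed  # private module, but this is where it is defined
from twisted.web.http import _DataLoss

logger = logging.getLogger(__name__)

# Placeholder pool -- swap in however you already manage proxies.
PROXY_POOL = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]


def pick_proxy(exclude=None):
    """Hypothetical helper: pick a proxy other than the one that just failed."""
    candidates = [p for p in PROXY_POOL if p != exclude] or PROXY_POOL
    return random.choice(candidates)


class DataLossRetryMiddleware(object):
    """Sketch: when the download dies with ConnectionDone/_DataLoss,
    reissue the request through another proxy instead of dropping it."""

    def process_exception(self, request, exception, spider):
        # The retry log in question shows a ResponseFailed carrying a list of
        # Failures (ConnectionDone + _DataLoss), so unwrap and inspect them.
        if isinstance(exception, ResponseFailed):
            reasons = getattr(exception, "reasons", [])
            if any(f.check(_DataLoss, ConnectionDone) for f in reasons):
                bad_proxy = request.meta.get("proxy")
                retry_req = request.replace(dont_filter=True)
                retry_req.meta["proxy"] = pick_proxy(exclude=bad_proxy)
                logger.debug("Data loss on %s via %s, retrying with another proxy",
                             request.url, bad_proxy)
                return retry_req  # the returned Request gets rescheduled
        # Returning None lets the other middlewares (e.g. the built-in
        # RetryMiddleware) handle every other exception as usual.
        return None

I enabled it in DOWNLOADER_MIDDLEWARES with a priority a bit above the built-in RetryMiddleware (550), e.g. {'myproject.middlewares.DataLossRetryMiddleware': 560}, so that, if I have the ordering right, its process_exception() runs before the stock retry logic; the module path is of course a placeholder.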
On Thursday, November 27, 2014 12:30:38 AM UTC+9, Nicolás Alejandro Ramírez Quiros wrote:
>
> Are your spiders getting stalled as well?
>
> On Tuesday, November 25, 2014 04:56:43 UTC-2, Sungmin Lee wrote:
>>
>> Hi all,
>>
>> I'm using a proxy to crawl a site, and it randomly gives me a bunch of
>> error messages like:
>>
>> 2014-10-20 05:26:10-0800 [foo.bar] DEBUG: Retrying <GET
>> http://foo.bar/foobar> (failed 1 times):
>> [<twisted.python.failure.Failure <class
>> 'twisted.internet.error.ConnectionDone'>>, <twisted.python.failure.Failure
>> <class 'twisted.web.http._DataLoss'>>]
>>
>> I think this mostly happens with a bad proxy, but it sometimes occurs with
>> a healthy proxy as well.
>> The thing is that this not only skips the URL entry to crawl.
>>
>> I implemented a middleware (especially for the proxy and retry middlewares),
>> but it's really hard to catch this exception at the Scrapy level.
>>
>> Has anyone had the same issue?
>>
>> Thanks!
