The site mostly gives me sane responses such as 200, 302, 404, 502, etc.,
but this error doesn't come with any proper server response, and it can only
be caught in def process_exception() in the middleware.
And when this error occurs, it generally keeps happening continuously until
the spider switches to a different proxy.
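
In case it helps, this is roughly what I mean -- just a rough sketch, not a
tested solution: a downloader middleware whose process_exception() catches
these Twisted failures and re-schedules the request through another proxy.
The ProxySwitchMiddleware name and the PROXY_LIST setting are placeholders
I made up for illustration, not built-in Scrapy pieces.

import logging
import random

from twisted.internet.error import ConnectionDone
from twisted.web._newclient import ResponseFailed
from twisted.web.http import _DataLoss

logger = logging.getLogger(__name__)


class ProxySwitchMiddleware(object):
    """Re-schedule requests whose connection died mid-response on another proxy."""

    def __init__(self, proxies):
        # e.g. ['http://1.2.3.4:8080', 'http://5.6.7.8:3128', ...]
        self.proxies = proxies

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is an assumed custom setting, not a built-in Scrapy one.
        return cls(crawler.settings.getlist('PROXY_LIST'))

    def process_exception(self, request, exception, spider):
        # ResponseFailed wraps the [ConnectionDone, _DataLoss] failure pair
        # from the log; ConnectionDone can also surface on its own.
        if isinstance(exception, (ResponseFailed, ConnectionDone, _DataLoss)):
            logger.debug('Connection dropped for %s, retrying via another proxy',
                         request.url)
            retry_req = request.replace(dont_filter=True)
            retry_req.meta['proxy'] = random.choice(self.proxies)
            # Returning a Request tells Scrapy to schedule it again
            # instead of dropping the URL.
            return retry_req
        return None

It still has to be enabled in DOWNLOADER_MIDDLEWARES in settings.py like any
other downloader middleware.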

On Thursday, November 27, 2014 12:30:38 AM UTC+9, Nicolás Alejandro Ramírez 
Quiros wrote:
>
> Are your spiders getting stalled as well?
>
> On Tuesday, November 25, 2014 04:56:43 UTC-2, Sungmin Lee wrote:
>>
>> Hi all,
>>
>> I'm using a proxy to crawl a site, and it randomly gives me a bunch of 
>> error messages like:
>>
>> 2014-10-20 05:26:10-0800 [foo.bar] DEBUG: Retrying <GET 
>> http://foo.bar/foobar> (failed 1 times): 
>> [<twisted.python.failure.Failure <class 
>> 'twisted.internet.error.ConnectionDone'>>, <twisted.python.failure.Failure 
>> <class 'twisted.web.http._DataLoss'>>]
>>
>> I think this mostly happens with a bad proxy, but it sometimes occurs with 
>> a healthy proxy as well. 
>> The thing is that this doesn't just skip the URL entry that was supposed to be crawled.
>>  
>> I implemented my own middleware (specifically proxy and retry middlewares), 
>> but it's really hard to catch this exception at the Scrapy level.
>>
>> Has anyone had the same issue?
>>
>> Thanks!
>>
>
