It mostly gives me sane responses such as 200, 302, 404, 502, etc., but this particular error doesn't come with any proper server response and can only be caught in the middleware's process_exception(). When it occurs, it usually keeps happening continuously until the proxy changes.
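
In case it helps, below is a minimal sketch of the downloader middleware I have in mind. I believe the failure in that log line is a twisted.web._newclient.ResponseFailed wrapping the ConnectionDone and _DataLoss failures, so process_exception() can unwrap it and reissue the request through a different proxy instead of dropping the URL. PROXY_POOL, pick_proxy(), the example proxy URLs and the class name are just placeholders for whatever rotation logic you already have.

import logging
import random

from twisted.internet.error import ConnectionDone
from twisted.web._newclient import ResponseFailed  # private module, but this is where it is defined
from twisted.web.http import _DataLoss

logger = logging.getLogger(__name__)

# Placeholder pool -- swap in however you already manage proxies.
PROXY_POOL = [
    "http://proxy1.example.com:8080",
    "http://proxy2.example.com:8080",
]


def pick_proxy(exclude=None):
    """Hypothetical helper: pick a proxy other than the one that just failed."""
    candidates = [p for p in PROXY_POOL if p != exclude] or PROXY_POOL
    return random.choice(candidates)


class DataLossRetryMiddleware(object):
    """Sketch: when the download dies with ConnectionDone/_DataLoss,
    reissue the request through another proxy instead of dropping it."""

    def process_exception(self, request, exception, spider):
        # The retry log in question shows a ResponseFailed carrying a list of
        # Failures (ConnectionDone + _DataLoss), so unwrap and inspect them.
        if isinstance(exception, ResponseFailed):
            reasons = getattr(exception, "reasons", [])
            if any(f.check(_DataLoss, ConnectionDone) for f in reasons):
                bad_proxy = request.meta.get("proxy")
                retry_req = request.replace(dont_filter=True)
                retry_req.meta["proxy"] = pick_proxy(exclude=bad_proxy)
                logger.debug("Data loss on %s via %s, retrying with another proxy",
                             request.url, bad_proxy)
                return retry_req  # the returned Request gets rescheduled
        # Returning None lets the other middlewares (e.g. the built-in
        # RetryMiddleware) handle every other exception as usual.
        return None

I enabled it in DOWNLOADER_MIDDLEWARES with a priority a bit above the built-in RetryMiddleware (550), e.g. {'myproject.middlewares.DataLossRetryMiddleware': 560}, so that, if I have the ordering right, its process_exception() runs before the stock retry logic; the module path is of course a placeholder.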
On Thursday, November 27, 2014 12:30:38 AM UTC+9, Nicolás Alejandro Ramírez Quiros wrote:
>
> Are your spiders getting stalled as well?
>
> On Tuesday, November 25, 2014 04:56:43 UTC-2, Sungmin Lee wrote:
>>
>> Hi all,
>>
>> I'm using a proxy to crawl a site, and it randomly gives me a bunch of
>> error messages like:
>>
>> 2014-10-20 05:26:10-0800 [foo.bar] DEBUG: Retrying <GET
>> http://foo.bar/foobar> (failed 1 times):
>> [<twisted.python.failure.Failure <class
>> 'twisted.internet.error.ConnectionDone'>>, <twisted.python.failure.Failure
>> <class 'twisted.web.http._DataLoss'>>]
>>
>> I think this mostly happens with a bad proxy, but it sometimes occurs with
>> a healthy proxy as well.
>> The thing is that this not only skips the URL entry to crawl.
>>
>> I implemented a middleware (especially for the proxy and retry middlewares),
>> but it's really hard to catch this exception at the Scrapy level.
>>
>> Has anyone had the same issue?
>>
>> Thanks!
