Hi all,

I'm implementing a spider that works through a proxy, so I've overridden
the proxy middleware. So far it works fine.

What I ultimately want to achieve is this:

1) assign a proxy
2) start scraping
3) when the proxy address is outdated, broken, etc., switch to a new, healthy proxy
4) continue scraping
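Steps 1 to 4 amount to a pool that hands out proxies and retires the ones that fail. Below is a minimal, framework-independent sketch of such a pool. The ProxyPool class and its methods are hypothetical names (not part of Scrapy), and the proxy URLs are placeholders.

```python
import random


class ProxyPool:
    """Hypothetical helper: hands out proxies and retires broken ones."""

    def __init__(self, proxies):
        self._healthy = list(proxies)  # proxies believed to work
        self._dead = set()             # proxies that failed and were retired

    def get(self):
        """Return a random healthy proxy URL (steps 1 and 3)."""
        if not self._healthy:
            raise RuntimeError("no healthy proxies left")
        return random.choice(self._healthy)

    def mark_dead(self, proxy):
        """Retire a proxy that timed out or otherwise broke."""
        if proxy in self._healthy:
            self._healthy.remove(proxy)
        self._dead.add(proxy)

    def __len__(self):
        # Number of proxies still considered healthy.
        return len(self._healthy)


pool = ProxyPool(["http://10.0.0.1:8080", "http://10.0.0.2:8080"])
pool.mark_dead("http://10.0.0.1:8080")
print(pool.get())  # only one healthy proxy is left, so this is deterministic
```

A proxy middleware's process_request could then set request.meta['proxy'] = pool.get(), since request.meta['proxy'] is the key Scrapy's proxy support reads.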


The problem is that whenever a proxy address stops working, Scrapy
just hangs waiting for a TCP response.
I wanted to use the retry middleware (RetryMiddleware), but it doesn't help,
because there is no response and therefore no response.status to check.

2014-10-13 16:46:22-0700 [proxy_test] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-10-13 16:46:53-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.
2014-10-13 16:46:54-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.
2014-10-13 16:46:54-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.
2014-10-13 16:46:55-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.
2014-10-13 16:46:56-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.
2014-10-13 16:46:57-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.
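The entries above are the OS reporting a TCP connect timeout (errno 60 appears to be ETIMEDOUT on macOS/BSD), and the first one shows up about 30 seconds after the INFO line. One knob worth trying is Scrapy's DOWNLOAD_TIMEOUT setting, which makes the downloader give up on a request sooner. Below is a settings.py sketch. The values are guesses to tune, and the middleware path is a hypothetical placeholder for your own proxy-switching middleware.

```python
# settings.py (sketch): fail fast on dead proxies.

# Seconds the downloader waits before giving up on a request
# (Scrapy's default is 180).
DOWNLOAD_TIMEOUT = 15

# How many times the built-in RetryMiddleware retries a failed request
# after the first attempt (Scrapy's default is 2).
RETRY_TIMES = 2

DOWNLOADER_MIDDLEWARES = {
    # Hypothetical module path. 650 is assumed to be above the built-in
    # RetryMiddleware's number in DOWNLOADER_MIDDLEWARES_BASE (check your
    # Scrapy version), so this middleware's process_exception runs first.
    "myproject.middlewares.ProxySwitchMiddleware": 650,
}
```

A single request can also override the timeout through request.meta['download_timeout'].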


Is there any way to handle these timeouts, i.e. detect them and switch to a new proxy?
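Because a dead proxy surfaces as an exception rather than a response, a status-based retry has no response.status to inspect. Scrapy's downloader middlewares also have a process_exception(request, exception, spider) hook, and returning a Request from it reschedules that request. Below is a minimal sketch built on that hook, with several assumptions. RotatingProxyMiddleware, max_swaps and the proxy_swaps meta key are hypothetical names. The sketch does not import Scrapy, so the timeout exception classes are passed in; in a real project they would be, for example, twisted.internet.error.TimeoutError and TCPTimedOutError. It also picks the first healthy proxy rather than a random one so its behaviour is predictable.

```python
class RotatingProxyMiddleware:
    """Hypothetical downloader middleware: swap the proxy on a timeout."""

    def __init__(self, proxies, timeout_errors, max_swaps=5):
        self.proxies = list(proxies)                # candidate proxy URLs
        self.dead = set()                           # proxies that timed out
        self.timeout_errors = tuple(timeout_errors)
        self.max_swaps = max_swaps                  # per-request swap limit

    def _pick(self):
        # First proxy not yet marked dead, or None if all are dead.
        for proxy in self.proxies:
            if proxy not in self.dead:
                return proxy
        return None

    def process_request(self, request, spider):
        # Step 1: assign a proxy if the request doesn't have one yet.
        if "proxy" not in request.meta:
            proxy = self._pick()
            if proxy is not None:
                request.meta["proxy"] = proxy
        return None  # let the request continue through the chain

    def process_exception(self, request, exception, spider):
        if not isinstance(exception, self.timeout_errors):
            return None  # not a timeout: leave it to other middlewares
        # Step 3: retire the proxy that timed out and pick another one.
        self.dead.add(request.meta.get("proxy"))
        swaps = request.meta.get("proxy_swaps", 0) + 1
        new_proxy = self._pick()
        if new_proxy is None or swaps > self.max_swaps:
            return None  # out of proxies or swaps: fall through
        meta = dict(request.meta, proxy=new_proxy, proxy_swaps=swaps)
        # Step 4: returning a Request makes Scrapy reschedule it;
        # dont_filter stops the duplicate filter from dropping it.
        return request.replace(meta=meta, dont_filter=True)
```

The built-in RetryMiddleware also catches these timeouts (that is what the "Retrying" lines in the log are). process_exception hooks run in decreasing order of their DOWNLOADER_MIDDLEWARES number, so a middleware like this needs a number above RetryMiddleware's to see the exception first. A from_crawler classmethod could build it from a proxy list kept in the settings.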

Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"scrapy-users" group.
