Hi all, I'm implementing a spider that works through a proxy, so I've overridden the proxy middleware. So far it works fine.
What I ultimately want to achieve is:

1) assign a proxy
2) start scraping
3) when the proxy address is outdated, broken, etc., switch to a new healthy proxy
4) continue scraping

The problem is that whenever a proxy address goes bad, Scrapy just hangs there waiting for a TCP response. I wanted to use RetryMiddleware, but it doesn't help because Scrapy never returns a response, so there is no response.status to check.

2014-10-13 16:46:22-0700 [proxy_test] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2014-10-13 16:46:53-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.
2014-10-13 16:46:54-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.
2014-10-13 16:46:54-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.
2014-10-13 16:46:55-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.
2014-10-13 16:46:56-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.
2014-10-13 16:46:57-0700 [proxy_test] DEBUG: Retrying <GET http://some/website> (failed 1 times): TCP connection timed out: 60: Operation timed out.

Is there any way I can handle this timeout issue?

Thanks!
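
For reference, steps 1) to 4) could be sketched as a downloader middleware along the lines below. This is only a minimal sketch under stated assumptions, not a tested solution: the class name RotatingProxyMiddleware and the PROXY_LIST setting are made up for illustration, and the set of exceptions treated as "dead proxy" is a guess that would need tuning. It relies on Scrapy calling process_exception when the download raises (as in the timeouts logged above) and on request.meta["proxy"] being the key the proxy middleware reads.

```python
import random

try:
    # Twisted is installed alongside Scrapy; these are errors a dead proxy
    # typically produces. Which ones to include is an assumption.
    from twisted.internet.error import (
        ConnectionRefusedError,
        TCPTimedOutError,
        TimeoutError as TwistedTimeoutError,
    )
    PROXY_ERRORS = (TCPTimedOutError, TwistedTimeoutError, ConnectionRefusedError)
except ImportError:
    # Fallback so the sketch can be imported without Twisted installed.
    PROXY_ERRORS = (OSError,)


class RotatingProxyMiddleware:
    """Hypothetical middleware: assign a proxy, replace it when it fails."""

    def __init__(self, proxies):
        self.healthy = list(proxies)  # proxies not yet seen failing

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting, not a built-in one.
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        # Steps 1 and 2: assign a proxy unless the request already carries one
        # (for example a replacement chosen in process_exception).
        if "proxy" not in request.meta and self.healthy:
            request.meta["proxy"] = random.choice(self.healthy)
        return None

    def process_exception(self, request, exception, spider):
        if not isinstance(exception, PROXY_ERRORS):
            return None  # not a connection problem; leave it to other middlewares
        # Step 3: drop the proxy that just failed.
        bad = request.meta.get("proxy")
        if bad in self.healthy:
            self.healthy.remove(bad)
        if not self.healthy:
            return None  # nothing left to switch to; fall back to default handling
        # Step 4: reschedule the same request through another proxy.
        retry = request.copy()
        retry.meta["proxy"] = random.choice(self.healthy)
        retry.dont_filter = True  # keep the dupe filter from dropping the retry
        return retry
```

To try it, the class would be enabled through DOWNLOADER_MIDDLEWARES (under whatever module path it lives in), and its position relative to the retry middleware decides which of the two sees the exception first, so the ordering is worth checking. Lowering DOWNLOAD_TIMEOUT (180 seconds by default) should also make a dead proxy fail sooner instead of hanging.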
