Read time out protocol-http
---------------------------

                 Key: NUTCH-1342
                 URL: https://issues.apache.org/jira/browse/NUTCH-1342
             Project: Nutch
          Issue Type: Bug
          Components: fetcher
    Affects Versions: 1.4, 1.5
            Reporter: Markus Jelsma
            Priority: Critical
             Fix For: 1.6

For some reason, some URLs always time out with protocol-http but not with protocol-httpclient. The stack trace is always the same:

{code}
2012-04-20 11:25:44,275 ERROR http.Http - Failed to get protocol output
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at java.io.PushbackInputStream.read(PushbackInputStream.java:169)
        at java.io.FilterInputStream.read(FilterInputStream.java:90)
        at org.apache.nutch.protocol.http.HttpResponse.readPlainContent(HttpResponse.java:228)
        at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:157)
        at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
        at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:138)
{code}

Some example URLs:

* 404 http://www.fcgroningen.nl/tribunenamen/stemmen/
* 301 http://shop.fcgroningen.nl/aanbieding
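
The trace shows the fetch blocked in a socket read inside `readPlainContent` until the read timeout fired. As a minimal, self-contained sketch of one way that exception can arise (this is not Nutch code, and it is not a confirmed diagnosis for the URLs above), the Java program below starts a throwaway local server that sends a complete response and then either closes the connection or holds it open, while a client with `setSoTimeout` keeps reading until EOF. The class name `ReadTimeoutSketch`, the `fetch` helper, the canned 404 response, and the 500 ms timeout are all illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; nothing here is taken from the Nutch code base.
class ReadTimeoutSketch {

    // Canned response (assumption): a complete message with a Content-Length header.
    static final byte[] RESPONSE =
        "HTTP/1.1 404 Not Found\r\nContent-Length: 9\r\n\r\nnot found"
            .getBytes(StandardCharsets.US_ASCII);

    // Outcome of one fetch: bytes received and whether the read timed out.
    static final class Result {
        final int bytesRead;
        final boolean timedOut;

        Result(int bytesRead, boolean timedOut) {
            this.bytesRead = bytesRead;
            this.timedOut = timedOut;
        }
    }

    // Fetches from a local one-shot server, reading until EOF or read timeout.
    static Result fetch(boolean serverClosesConnection, int timeoutMs) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept()) {
                    BufferedReader req = new BufferedReader(
                        new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
                    String line;
                    while ((line = req.readLine()) != null && !line.isEmpty()) {
                        // Consume the request line and headers up to the blank line.
                    }
                    OutputStream out = s.getOutputStream();
                    out.write(RESPONSE);
                    out.flush();
                    if (!serverClosesConnection) {
                        // Hold the connection open longer than the client's read timeout.
                        Thread.sleep(timeoutMs * 3L);
                    }
                } catch (Exception ignored) {
                    // Demo server: errors are not interesting here.
                }
            });
            serverThread.setDaemon(true);
            serverThread.start();

            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setSoTimeout(timeoutMs); // read timeout, in milliseconds
                OutputStream out = client.getOutputStream();
                out.write("GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
                out.flush();

                InputStream in = client.getInputStream();
                byte[] buf = new byte[1024];
                int total = 0;
                try {
                    int n;
                    // Read until EOF, ignoring Content-Length (assumed reader behaviour).
                    while ((n = in.read(buf)) != -1) {
                        total += n;
                    }
                    return new Result(total, false);
                } catch (SocketTimeoutException e) {
                    // Same exception type as in the trace: "Read timed out".
                    return new Result(total, true);
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Result closed = fetch(true, 500);
        System.out.println("server closes:     timedOut=" + closed.timedOut);
        Result held = fetch(false, 500);
        System.out.println("server holds open: timedOut=" + held.timedOut);
    }
}
```

When the server closes the connection, the client sees EOF and the loop ends normally. When the server keeps it open, the same loop ends in `SocketTimeoutException` even though the whole response has already arrived. Whether the example URLs behave this way is not established by this report.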