[ https://issues.apache.org/jira/browse/NUTCH-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13648785#comment-13648785 ]
Tejas Patil commented on NUTCH-1039:
------------------------------------

I am not able to reproduce this issue:
{noformat}
http://www.nu.nl/internet/2565955/internet-beinvloedt-geheugen.html Version: 7
Status: 2 (db_fetched)
Fetch time: Sun Jun 02 13:23:05 PDT 2013
Modified time: Wed Dec 31 16:00:00 PST 1969
Retries since fetch: 0
Retry interval: 2592000 seconds (30 days)
Score: 1.01
Signature: f7e46cab66e5e703efe554ec129bc6ba
Metadata: Content-Type: application/xhtml+xml_pst_: success(1), lastModified=0
{noformat}
I'm not sure whether a later check-in fixed it or whether the server now returns a Content-Length header with the response.

> Fetcher fails for pages without content-length header
> -----------------------------------------------------
>
> Key: NUTCH-1039
> URL: https://issues.apache.org/jira/browse/NUTCH-1039
> Project: Nutch
> Issue Type: Bug
> Affects Versions: 1.4
> Reporter: Markus Jelsma
> Assignee: Markus Jelsma
> Fix For: 1.7
>
>
> Fetcher fails:
> 2011-07-11 14:45:34,764 ERROR http.Http - org.apache.nutch.protocol.http.api.HttpException: bad content length:
> 2011-07-11 14:45:34,765 ERROR http.Http - at org.apache.nutch.protocol.http.HttpResponse.readPlainContent(HttpResponse.java:218)
> 2011-07-11 14:45:34,765 ERROR http.Http - at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:158)
> 2011-07-11 14:45:34,765 ERROR http.Http - at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
> 2011-07-11 14:45:34,765 ERROR http.Http - at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:138)
> 2011-07-11 14:45:34,765 ERROR http.Http - at org.apache.nutch.parse.ParserChecker.main(ParserChecker.java:79)
> Both the fetcher and the indexing filter checker sometimes fail. I'm unsure whether this is something in Nutch or whether the remote server only returns content-length incidentally.
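The stack trace shows `HttpResponse.readPlainContent` rejecting an empty Content-Length value. For a response that is not chunked and carries no Content-Length, HTTP/1.1 (RFC 7230, section 3.3.3) defines the body as everything received until the server closes the connection. Below is a minimal sketch of that read-to-EOF fallback, not Nutch's actual fix: the class `PlainContentReader` and its methods are hypothetical, `maxBytes` stands in for a content cap such as Nutch's `http.content.limit` (assumed non-negative here), and chunked transfer encoding is assumed to be handled elsewhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of Nutch): reads a non-chunked HTTP response
 * body, tolerating a missing, empty or malformed Content-Length value.
 */
public final class PlainContentReader {

  private PlainContentReader() {}

  /**
   * Parses a Content-Length header value. Returns -1 ("length unknown") when
   * the value is null, blank, negative or not a valid long.
   */
  public static long parseContentLength(String headerValue) {
    if (headerValue == null) return -1;
    String v = headerValue.trim();
    if (v.isEmpty()) return -1;
    try {
      long n = Long.parseLong(v);
      return n >= 0 ? n : -1;
    } catch (NumberFormatException e) {
      return -1;
    }
  }

  /**
   * Reads the body. With a known length, reads at most that many bytes; with
   * an unknown length, reads until EOF (server closed the connection).
   * Either way, never reads more than maxBytes (assumed >= 0).
   */
  public static byte[] readBody(InputStream in, String contentLengthHeader, int maxBytes)
      throws IOException {
    long declared = parseContentLength(contentLengthHeader);
    long limit = declared >= 0 ? Math.min(declared, maxBytes) : maxBytes;
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    long total = 0;
    while (total < limit) {
      int want = (int) Math.min(buf.length, limit - total);
      int n = in.read(buf, 0, want);
      if (n == -1) break; // EOF: no more bytes from the server
      out.write(buf, 0, n);
      total += n;
    }
    return out.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] page = "<html>no length</html>".getBytes(StandardCharsets.US_ASCII);
    // Empty header value, as in the "bad content length: " error above.
    byte[] body = readBody(new ByteArrayInputStream(page), "", 65536);
    System.out.println(new String(body, StandardCharsets.US_ASCII));
  }
}
```

Running `main` prints `<html>no length</html>`: the empty header value is treated as "length unknown" and the body is read to EOF instead of raising an exception. Note that RFC 7230 treats an invalid Content-Length in a response as an unrecoverable error, so lumping malformed values in with missing ones is a deliberate leniency that trades spec conformance for crawl robustness; a stricter client might reject such responses instead.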