[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936103#comment-13936103 ]
ysc commented on NUTCH-1736:
----------------------------

Problem:

fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null

Detail:

2014-03-12 16:48:38,031 ERROR http.Http - Failed to get protocol output
java.io.IOException: unzipBestEffort returned null
    at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:317)
    at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:164)
    at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:140)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:703)
2014-03-12 16:48:38,031 INFO fetcher.Fetcher - fetch of http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html failed with: java.io.IOException: unzipBestEffort returned null
2014-03-12 16:48:38,031 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0

Solution:

The attached patch handles the HTTP response header Transfer-Encoding: chunked.

Important:

The property http.content.limit in nutch-site.xml must be greater than 0.

> can't fetch page if http response header contains Transfer-Encoding:chunked
> ---------------------------------------------------------------------------
>
>                 Key: NUTCH-1736
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1736
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>    Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
>            Reporter: ysc
>            Priority: Critical
>         Attachments: nutch-2.2.1.patch, nutch1.7.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA
(v6.2#6252)
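
For readers unfamiliar with the Transfer-Encoding: chunked framing mentioned in the comment, here is a minimal Java sketch of decoding a chunked body before any gzip step sees it. It is not the attached patch and not Nutch code: the class ChunkedBodySketch, the method decode, and the parameter maxContent (standing in for http.content.limit, assumed to be greater than 0) are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, not part of Nutch: decodes a chunked HTTP body. */
class ChunkedBodySketch {

  /**
   * Reads chunks until the zero-length terminator chunk.
   * maxContent plays the role of http.content.limit (assumed > 0);
   * decoding stops early once that many bytes have been collected.
   */
  static byte[] decode(InputStream in, int maxContent) throws IOException {
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    while (true) {
      String sizeLine = readLine(in);
      // Drop optional chunk extensions: "1a;name=value" -> "1a"
      int semi = sizeLine.indexOf(';');
      if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
      int chunkSize;
      try {
        chunkSize = Integer.parseInt(sizeLine.trim(), 16); // size is hexadecimal
      } catch (NumberFormatException e) {
        throw new IOException("Bad chunk size line: " + sizeLine, e);
      }
      if (chunkSize < 0) throw new IOException("Negative chunk size");
      if (chunkSize == 0) break; // last chunk; trailers are ignored in this sketch

      // Read the whole chunk, but keep only what still fits under the limit.
      int toKeep = Math.min(chunkSize, maxContent - body.size());
      byte[] buf = new byte[chunkSize];
      int read = 0;
      while (read < chunkSize) {
        int n = in.read(buf, read, chunkSize - read);
        if (n < 0) throw new IOException("Stream ended inside a chunk");
        read += n;
      }
      body.write(buf, 0, toKeep);
      if (body.size() >= maxContent) break; // truncated at the limit
      readLine(in); // consume the CRLF that follows the chunk data
    }
    return body.toByteArray();
  }

  /** Reads one line terminated by LF, discarding any CR. */
  private static String readLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1) {
      if (c == '\n') break;
      if (c != '\r') sb.append((char) c);
    }
    if (c == -1 && sb.length() == 0) throw new IOException("Unexpected end of stream");
    return sb.toString();
  }
}
```

With a limit of 0 this sketch would keep no bytes at all, which shows why a positive limit matters for an implementation that truncates this way. The decoded bytes, not the raw chunk-framed stream, are what a gzip decoder would then need to receive.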