[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936103#comment-13936103 ]

ysc commented on NUTCH-1736:
----------------------------

problem:
 
fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null

detail:

2014-03-12 16:48:38,031 ERROR http.Http - Failed to get protocol output
java.io.IOException: unzipBestEffort returned null
    at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:317)
    at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:164)
    at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:140)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:703)
2014-03-12 16:48:38,031 INFO  fetcher.Fetcher - fetch of http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html failed with: java.io.IOException: unzipBestEffort returned null
2014-03-12 16:48:38,031 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0

solution:

This patch handles HTTP responses whose header contains Transfer-Encoding: chunked.
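
For readers unfamiliar with the chunked transfer coding, here is a minimal sketch of how such a body can be decoded before any gzip handling. It is not the attached patch: the class ChunkedBodySketch, its methods, and the limit parameter are hypothetical names made up for illustration, and the sample input is invented. Each chunk is a hexadecimal size line, CRLF, that many bytes of data, and CRLF; a zero-size chunk ends the body.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Nutch: decodes a chunked HTTP/1.1 body. */
final class ChunkedBodySketch {

  /** Reads one line as ASCII and returns it without the CR/LF terminator. */
  static String readLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1) {
      if (c == '\n') break;
      if (c != '\r') sb.append((char) c);
    }
    if (c == -1 && sb.length() == 0) {
      throw new IOException("unexpected end of stream");
    }
    return sb.toString();
  }

  /**
   * Decodes a chunked body, keeping at most 'limit' bytes.
   * 'limit' is an assumed parameter of this sketch and must be positive.
   */
  static byte[] decode(InputStream in, int limit) throws IOException {
    if (limit <= 0) throw new IllegalArgumentException("limit must be > 0");
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    while (true) {
      // Size line: hex digits, optionally followed by ";extension".
      String sizeLine = readLine(in);
      int semi = sizeLine.indexOf(';');
      if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
      int size;
      try {
        size = Integer.parseInt(sizeLine.trim(), 16);
      } catch (NumberFormatException e) {
        throw new IOException("bad chunk size: " + sizeLine, e);
      }
      if (size < 0) throw new IOException("negative chunk size");
      if (size == 0) break; // last chunk

      // Read exactly 'size' bytes of chunk data.
      byte[] chunk = new byte[size];
      int off = 0;
      while (off < size) {
        int n = in.read(chunk, off, size - off);
        if (n == -1) throw new IOException("truncated chunk");
        off += n;
      }
      int room = limit - body.size();
      body.write(chunk, 0, Math.min(size, room));
      if (body.size() >= limit) return body.toByteArray(); // stop at the limit

      readLine(in); // consume the CRLF that follows the chunk data
    }
    // Skip optional trailer fields up to the terminating blank line.
    while (!readLine(in).isEmpty()) { /* ignore trailers */ }
    return body.toByteArray();
  }

  public static void main(String[] args) throws IOException {
    byte[] raw = "4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
        .getBytes(StandardCharsets.US_ASCII);
    byte[] decoded = decode(new ByteArrayInputStream(raw), 1024);
    System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // prints "Wikipedia"
  }
}
```

If the response is also gzip-encoded, the decoded bytes would be the input to decompression; feeding the raw chunk framing to a gzip decoder is one plausible way to end up with an unzip failure like the one logged above.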

Important note:

The property http.content.limit in nutch-site.xml must be greater than 0.
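
As an illustration, the setting could look like this in nutch-site.xml. The value 65536 is only an example of a positive limit in bytes, not a recommendation from this comment; pick whatever size suits the crawl.

```xml
<!-- Example only: any value greater than 0 satisfies the requirement above -->
<property>
  <name>http.content.limit</name>
  <value>65536</value>
</property>
```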

> can't fetch page if http response header contains Transfer-Encoding:chunked
> ---------------------------------------------------------------------------
>
>                 Key: NUTCH-1736
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1736
>             Project: Nutch
>          Issue Type: Bug
>          Components: protocol
>    Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
>            Reporter: ysc
>            Priority: Critical
>         Attachments: nutch-2.2.1.patch, nutch1.7.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null



--
This message was sent by Atlassian JIRA
(v6.2#6252)
