[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028007#comment-14028007 ]

Hudson commented on NUTCH-1736:
---

SUCCESS: Integrated in Nutch-trunk #2652 (See [https://builds.apache.org/job/Nutch-trunk/2652/])
NUTCH-1736 Can't fetch page if http response header contains Transfer-Encoding:chunked (jnioche: http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1601935)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/metadata/HttpHeaders.java
* /nutch/trunk/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 2.3, 1.8
> Reporter: ysc
> Priority: Critical
> Fix For: 2.4, 1.9
>
> Attachments: nutch1.7.patch, nutch2.2.1.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027994#comment-14027994 ]

Hudson commented on NUTCH-1736:
---

FAILURE: Integrated in Nutch-nutchgora #1039 (See [https://builds.apache.org/job/Nutch-nutchgora/1039/])
NUTCH-1736 Can't fetch page if http response header contains Transfer-Encoding:chunked (jnioche: http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1601937)
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/java/org/apache/nutch/metadata/HttpHeaders.java
* /nutch/branches/2.x/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 2.3, 1.8
> Reporter: ysc
> Priority: Critical
> Fix For: 2.4, 1.9
>
> Attachments: nutch1.7.patch, nutch2.2.1.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027945#comment-14027945 ]

Julien Nioche commented on NUTCH-1736:
--

2.x: Committed revision 1601937.

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 2.3, 1.8
> Reporter: ysc
> Priority: Critical
> Fix For: 2.4, 1.9
>
> Attachments: nutch1.7.patch, nutch2.2.1.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027922#comment-14027922 ]

Julien Nioche commented on NUTCH-1736:
--

Trunk: Committed revision 1601935.

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 2.3, 1.8
> Reporter: ysc
> Priority: Critical
> Fix For: 2.4, 1.9
>
> Attachments: nutch1.7.patch, nutch2.2.1.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949513#comment-13949513 ]

Julien Nioche commented on NUTCH-1736:
--

Looks good and seems to have fixed the issue I was encountering on "http://www.imones.lt/v-amonto-individuali-veikla-filialas-832833". Needs to be tested a bit more before committing. Thanks for this contribution.

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
> Reporter: ysc
> Priority: Critical
> Fix For: 2.3, 1.9
>
> Attachments: nutch1.7.patch, nutch2.2.1.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937438#comment-13937438 ]

ysc commented on NUTCH-1736:

Hi lufeng, thanks, this is a good idea. I have modified the patch files.

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
> Reporter: ysc
> Priority: Critical
> Fix For: 2.3, 1.9
>
> Attachments: nutch1.7.patch, nutch2.2.1.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937426#comment-13937426 ]

lufeng commented on NUTCH-1736:
---

Hi ysc, you can fix this issue by checking the content size, like this:

{code:java}
// Clamp the chunk so the total never exceeds the configured limit;
// a negative limit skips the check.
if (http.getMaxContent() >= 0
    && (contentBytesRead + chunkLen) > http.getMaxContent())
  chunkLen = http.getMaxContent() - contentBytesRead;
{code}

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
> Reporter: ysc
> Priority: Critical
> Fix For: 2.3, 1.9
>
> Attachments: nutch-2.2.1.patch, nutch1.7.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)
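The size check suggested in this comment can be sketched in a standalone form. This is a minimal illustration under stated assumptions, not the code from the attached patches or from Nutch's `HttpResponse`: the class `ChunkedBodySketch` and the helpers `readLine`, `readChunked` and `decode` are hypothetical names, and `maxContent` stands in for the value returned by `http.getMaxContent()`. Only the `chunkLen` and `contentBytesRead` names and the clamp itself come from the comment.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: decodes a chunked HTTP body with an optional size limit.
class ChunkedBodySketch {

  /** Reads one line, dropping the CR/LF terminator. */
  static String readLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1 && c != '\n') {
      if (c != '\r') {
        sb.append((char) c);
      }
    }
    return sb.toString();
  }

  /** Decodes chunk framing; keeps at most maxContent bytes (negative = no limit). */
  static byte[] readChunked(InputStream in, int maxContent) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int contentBytesRead = 0;
    while (true) {
      String sizeLine = readLine(in);
      int semi = sizeLine.indexOf(';');          // ignore chunk extensions
      if (semi >= 0) {
        sizeLine = sizeLine.substring(0, semi);
      }
      sizeLine = sizeLine.trim();
      if (sizeLine.isEmpty()) {
        throw new IOException("missing chunk size");
      }
      int chunkLen = Integer.parseInt(sizeLine, 16);
      if (chunkLen == 0) {
        break;                                   // zero-length chunk ends the body
      }

      // The clamp from the comment: applied only when a limit is set.
      boolean truncated = false;
      if (maxContent >= 0 && (contentBytesRead + chunkLen) > maxContent) {
        chunkLen = maxContent - contentBytesRead;
        truncated = true;
      }

      byte[] buf = new byte[chunkLen];
      int off = 0;
      while (off < chunkLen) {
        int n = in.read(buf, off, chunkLen - off);
        if (n == -1) {
          throw new IOException("chunk ended early");
        }
        off += n;
      }
      out.write(buf, 0, chunkLen);
      contentBytesRead += chunkLen;

      if (truncated) {
        break;                                   // limit reached, stop reading
      }
      readLine(in);                              // consume CRLF after chunk data
    }
    return out.toByteArray();
  }

  /** Convenience wrapper for in-memory input; rethrows I/O errors unchecked. */
  static byte[] decode(byte[] framed, int maxContent) {
    try {
      return readChunked(new ByteArrayInputStream(framed), maxContent);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] framed = "5\r\nHello\r\n6\r\n World\r\n0\r\n\r\n"
        .getBytes(StandardCharsets.US_ASCII);
    System.out.println(new String(decode(framed, -1), StandardCharsets.US_ASCII)); // Hello World
    System.out.println(new String(decode(framed, 8), StandardCharsets.US_ASCII));  // Hello Wo
  }
}
```

In this sketch the `maxContent >= 0` guard means a negative limit is treated as "no limit" and the clamp is skipped, while a limit of 8 truncates the second chunk to 3 bytes.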
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937425#comment-13937425 ]

ysc commented on NUTCH-1736:

Hi Sebastian, I have modified the previous comment and added some explanation.

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
> Reporter: ysc
> Priority: Critical
> Fix For: 2.3, 1.9
>
> Attachments: nutch-2.2.1.patch, nutch1.7.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937418#comment-13937418 ]

lufeng commented on NUTCH-1736:
---

Hi Sebastian, I don't think this patch is related to NUTCH-1647; the two issues may simply end in the same exception. NUTCH-1647 is about a URL redirection issue.

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
> Reporter: ysc
> Priority: Critical
> Fix For: 2.3, 1.9
>
> Attachments: nutch-2.2.1.patch, nutch1.7.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937269#comment-13937269 ]

Sebastian Nagel commented on NUTCH-1736:

Thanks, [~yangshangchuan], for taking the time to analyze the problem. The patch also fixes NUTCH-1647. Any idea why {{http.content.limit}} must not be -1?

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
> Reporter: ysc
> Priority: Critical
> Fix For: 2.3, 1.9
>
> Attachments: nutch-2.2.1.patch, nutch1.7.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1736) can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936103#comment-13936103 ]

ysc commented on NUTCH-1736:

problem:
fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null

detail:
2014-03-12 16:48:38,031 ERROR http.Http - Failed to get protocol output
java.io.IOException: unzipBestEffort returned null
    at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:317)
    at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:164)
    at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:140)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:703)
2014-03-12 16:48:38,031 INFO fetcher.Fetcher - fetch of http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html failed with: java.io.IOException: unzipBestEffort returned null
2014-03-12 16:48:38,031 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0

solution:
This patch handles the HTTP response header Transfer-Encoding:chunked.

important tip:
The property http.content.limit in nutch-site.xml must be greater than 0.

> can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
> Reporter: ysc
> Priority: Critical
> Attachments: nutch-2.2.1.patch, nutch1.7.patch
>
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)
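One plausible reading of this report, offered as an assumption rather than something the thread states, is that a gzip-compressed body still wrapped in chunk framing cannot be gunzipped, because the first bytes are a hexadecimal chunk size instead of the gzip magic number. The sketch below only illustrates that effect with the JDK's `java.util.zip` classes; `GzipOverChunkedSketch`, `gzip`, `frameAsOneChunk` and `canGunzip` are hypothetical names and none of this is Nutch code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: gunzipping a body with and without chunk framing.
class GzipOverChunkedSketch {

  /** Compresses raw bytes with gzip. */
  static byte[] gzip(byte[] raw) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(raw);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  /** Wraps a payload in one chunk followed by the terminating zero-length chunk. */
  static byte[] frameAsOneChunk(byte[] payload) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    byte[] head = (Integer.toHexString(payload.length) + "\r\n")
        .getBytes(StandardCharsets.US_ASCII);
    byte[] tail = "\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
    bos.write(head, 0, head.length);
    bos.write(payload, 0, payload.length);
    bos.write(tail, 0, tail.length);
    return bos.toByteArray();
  }

  /** Returns true if the bytes gunzip cleanly, false if decompression throws. */
  static boolean canGunzip(byte[] data) {
    try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
      byte[] buf = new byte[1024];
      while (in.read(buf) != -1) {
        // drain the stream; only success or failure matters here
      }
      return true;
    } catch (IOException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    byte[] gz = gzip("hello".getBytes(StandardCharsets.US_ASCII));
    System.out.println(canGunzip(gz));                  // true
    System.out.println(canGunzip(frameAsOneChunk(gz))); // false
  }
}
```

If that reading is right, the chunk framing has to be removed before the gzip step, which is consistent with the description of the patch as handling `Transfer-Encoding:chunked`.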
[jira] [Commented] (NUTCH-1736) can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936063#comment-13936063 ]

ysc commented on NUTCH-1736:

For Nutch 1.x, use the patch nutch1.7.patch.
For Nutch 2.x, use the patch nutch-2.2.1.patch.

> can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
> Issue Type: Bug
> Components: protocol
> Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
> Reporter: ysc
> Priority: Critical
> Original Estimate: 24h
> Remaining Estimate: 24h
>
> fetching:
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException:
> unzipBestEffort returned null

--
This message was sent by Atlassian JIRA (v6.2#6252)