Re: [VOTE] Release Apache Nutch 1.8RC#2
+1 from me. Thanks everyone

On Sunday, 16 March 2014, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote:

+1 from me!

SIGS pass, CHECKSUMS pass:

[chipotle:~/tmp/apache-nutch-1.8] mattmann% $HOME/bin/stage_apache_rc apache-nutch 1.8-bin https://dist.apache.org/repos/dist/dev/nutch/
  % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
                                 Dload  Upload   Total    Spent    Left  Speed
100 79.7M  100 79.7M    0     0   894k      0  0:01:31  0:01:31 --:--:--  926k
100   836  100   836    0     0   2291      0 --:--:-- --:--:-- --:--:--  2902
100    78  100    78    0     0    214      0 --:--:-- --:--:-- --:--:--   268
100 81.0M  100 81.0M    0     0   828k      0  0:01:40  0:01:40 --:--:--  809k
100   836  100   836    0     0   2399      0 --:--:-- --:--:-- --:--:--  3051
100    75  100    75    0     0    201      0 --:--:-- --:--:-- --:--:--   255

[chipotle:~/tmp/apache-nutch-1.8] mattmann% $HOME/bin/stage_apache_rc apache-nutch 1.8-src https://dist.apache.org/repos/dist/dev/nutch/
  % Total    % Received % Xferd  Average Speed   Time     Time     Time  Current
                                 Dload  Upload   Total    Spent    Left  Speed
100 2692k  100 2692k    0     0   602k      0  0:00:04  0:00:04 --:--:--  646k
100   836  100   836    0     0   2306      0 --:--:-- --:--:-- --:--:--  2912
100    78  100    78    0     0    204      0 --:--:-- --:--:-- --:--:--   255
100 4547k  100 4547k    0     0   564k      0  0:00:08  0:00:08 --:--:--  671k
100   836  100   836    0     0   2182      0 --:--:-- --:--:-- --:--:--  2814
100    75  100    75    0     0    203      0 --:--:-- --:--:-- --:--:--   268

[chipotle:~/tmp/apache-nutch-1.8] mattmann% $HOME/bin/verify_gpg_sigs
Verifying Signature for file apache-nutch-1.8-bin.tar.gz.asc
gpg: Signature made Tue Mar 11 14:23:44 2014 PDT using RSA key ID 48BAEBF6
gpg: Good signature from Lewis John McGibbney (CODE SIGNING KEY) lewi...@apache.org
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: DB7B 5199 121C 08A5 C8F4 052B 3A47 17F0 48BA EBF6
Verifying Signature for file apache-nutch-1.8-bin.zip.asc
gpg: Signature made Tue Mar 11 14:25:56 2014 PDT using RSA key ID 48BAEBF6
gpg: Good signature from Lewis John McGibbney (CODE SIGNING KEY) lewi...@apache.org
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: DB7B 5199 121C 08A5 C8F4 052B 3A47 17F0 48BA EBF6
Verifying Signature for file apache-nutch-1.8-src.tar.gz.asc
gpg: Signature made Tue Mar 11 14:26:28 2014 PDT using RSA key ID 48BAEBF6
gpg: Good signature from Lewis John McGibbney (CODE SIGNING KEY) lewi...@apache.org
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: DB7B 5199 121C 08A5 C8F4 052B 3A47 17F0 48BA EBF6
Verifying Signature for file apache-nutch-1.8-src.zip.asc
gpg: Signature made Tue Mar 11 14:26:44 2014 PDT using RSA key ID 48BAEBF6
gpg: Good signature from Lewis John McGibbney (CODE SIGNING KEY) lewi...@apache.org
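
The "CHECKSUMS pass" step above can also be reproduced without the helper scripts. What follows is only a minimal sketch: the class `ChecksumCheck` and its methods are hypothetical names, not part of Nutch or of `stage_apache_rc`, and the digest algorithm is left as a parameter because the thread does not say which one the published checksum files use.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper for illustration; not part of Nutch or the release scripts.
final class ChecksumCheck {

    // Returns the lowercase hex digest of the data for a JCA algorithm name
    // such as "SHA-1" or "SHA-256" (the algorithm choice is an assumption).
    static String hexDigest(byte[] data, String algorithm) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(data)) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Unsupported digest: " + algorithm, e);
        }
    }

    // Compares the computed digest with a published value,
    // ignoring whitespace and letter case in the published string.
    static boolean matches(byte[] data, String algorithm, String published) {
        String normalized = published.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
        return hexDigest(data, algorithm).equals(normalized);
    }

    public static void main(String[] args) {
        byte[] sample = "abc".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexDigest(sample, "SHA-256"));
    }
}
```

For a real artifact the bytes would come from something like `java.nio.file.Files.readAllBytes(...)` on the downloaded file, and the published value from the checksum file staged next to it.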
[jira] [Commented] (NUTCH-1737) Upgrade to recent JUnit 4.x
[ https://issues.apache.org/jira/browse/NUTCH-1737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937115#comment-13937115 ]

Lewis John McGibbney commented on NUTCH-1737:
---------------------------------------------

I'll be happy to take this on later Seb. It is a PITA but it is time well invested.

Upgrade to recent JUnit 4.x
---------------------------

Key: NUTCH-1737
URL: https://issues.apache.org/jira/browse/NUTCH-1737
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.8
Reporter: Sebastian Nagel
Priority: Minor
Fix For: 1.9
Attachments: NUTCH-1737-trivial.patch

While trunk still remains on JUnit 3.8.1, 2.x uses JUnit 4.11 and has already upgraded tests (NUTCH-1573). This makes it difficult to port tests which use JUnit 4 features from 2.x to trunk. There are two solutions:
# (lightweight, trivial patch attached) upgrade only ivy dependency, upgrade tests later
# upgrade also all tests to use JUnit 4 annotations and setup(), cf. NUTCH-1573

--
This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (NUTCH-1734) Make SolrIndexWriter more intelligent
[ https://issues.apache.org/jira/browse/NUTCH-1734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937137#comment-13937137 ]

Lewis John McGibbney commented on NUTCH-1734:
---------------------------------------------

Excellent to see you log this issue [~la...@protulae.com]. We can keep discussion of the issue here.

Make SolrIndexWriter more intelligent
-------------------------------------

Key: NUTCH-1734
URL: https://issues.apache.org/jira/browse/NUTCH-1734
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.7, 2.2.1
Reporter: Lajos Moczar
Priority: Minor

The current mapping of the NutchDocument to SolrDocument is based on the fields in the former, which can cause problems when you are using an existing Solr schema:
1) the existing logic requires Solr to support all Nutch fields, which might not be the case (like segment).
2) you can map a Nutch field to at most 2 Solr fields (i.e. one via a field and one via a copy tag), because the source attribute is the Map key and therefore you can only have one.

Additionally, it would be nice to support some level of transformations, literals, etc, like those used in Solr DIH.

I propose to make the code more intelligent so that, while supporting the existing strict mapping that people are used to, it allows more flexible and intelligent mapping. It will also include a transformation architecture that can be expanded over time.

The general approach is to reverse the building of the SolrDocument, and populate the doc based on the Solr destination fields as defined in solrindex-mapping.xml, i.e., it populates the doc based on what the target Solr wants to receive, not just what Nutch wants to send. The Map of fields in solrindex-mapping.xml will be keyed by dest, i.e. the Solr field name, not source. That way one can map a source to multiple destinations. A mapping type attribute (defaults to just a simple copy from Nutch to Solr) will support literals and transformations.

Note that a default strict mapping (i.e. the Solr schema by default MUST support all NutchDocument fields) will be supported for backwards compatibility. I assume this will be what people want.

I will submit patches in due course.
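
The destination-keyed approach proposed in NUTCH-1734 could look roughly like the sketch below. It is an illustration only, not the promised patch: `DestKeyedMapping`, `Rule`, `Type` and the method names are hypothetical, plain `Map`/`List` types stand in for NutchDocument and SolrDocument, and only the default copy type and literals are modelled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a mapping keyed by destination (Solr) field.
final class DestKeyedMapping {

    enum Type { COPY, LITERAL }

    // One rule per Solr destination field.
    static final class Rule {
        final Type type;
        final String value; // Nutch source field for COPY, fixed text for LITERAL

        Rule(Type type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    // Keyed by dest, so the same source may appear under several destinations.
    private final Map<String, Rule> rulesByDest = new LinkedHashMap<>();

    void copy(String source, String dest) {
        rulesByDest.put(dest, new Rule(Type.COPY, source));
    }

    void literal(String text, String dest) {
        rulesByDest.put(dest, new Rule(Type.LITERAL, text));
    }

    // Builds the outgoing fields by walking the destinations, so Nutch fields
    // that no destination asks for are never sent.
    Map<String, List<Object>> toSolrFields(Map<String, List<Object>> nutchDoc) {
        Map<String, List<Object>> out = new LinkedHashMap<>();
        for (Map.Entry<String, Rule> entry : rulesByDest.entrySet()) {
            Rule rule = entry.getValue();
            if (rule.type == Type.LITERAL) {
                List<Object> values = new ArrayList<>();
                values.add(rule.value);
                out.put(entry.getKey(), values);
            } else {
                List<Object> values = nutchDoc.get(rule.value);
                if (values != null) {
                    out.put(entry.getKey(), new ArrayList<>(values));
                }
            }
        }
        return out;
    }
}
```

With rules such as `copy("title", "title")` and `copy("title", "title_s")`, one Nutch field feeds two Solr fields, and a Nutch field such as segment is dropped unless a destination asks for it. The strict backwards-compatible mode mentioned in the issue is not modelled here.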
[jira] [Updated] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lewis John McGibbney updated NUTCH-1736:
---------------------------------------

Fix Version/s: 1.9, 2.3

Can't fetch page if http response header contains Transfer-Encoding:chunked
---------------------------------------------------------------------------

Key: NUTCH-1736
URL: https://issues.apache.org/jira/browse/NUTCH-1736
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
Reporter: ysc
Priority: Critical
Fix For: 2.3, 1.9
Attachments: nutch-2.2.1.patch, nutch1.7.patch
Original Estimate: 24h
Remaining Estimate: 24h

fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null
[jira] [Commented] (NUTCH-1647) protocol-http throws unzipBestEffort returned null for some pages
[ https://issues.apache.org/jira/browse/NUTCH-1647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937257#comment-13937257 ]

Sebastian Nagel commented on NUTCH-1647:
----------------------------------------

Seems to be a duplicate of NUTCH-1736: {{nutch1.7.patch}} fixes fetch of {{http://www.provinciegroningen.nl/actueel/dossiers/rwe-centrale/}}

protocol-http throws unzipBestEffort returned null for some pages
-----------------------------------------------------------------

Key: NUTCH-1647
URL: https://issues.apache.org/jira/browse/NUTCH-1647
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.7
Reporter: Markus Jelsma
Fix For: 1.9

bin/nutch indexchecker http://www.provinciegroningen.nl/actueel/dossiers/rwe-centrale
Fetch failed with protocol status: exception(16), lastModified=0: java.io.IOException: unzipBestEffort returned null

{code}
2013-10-01 13:44:55,612 INFO http.Http - http.proxy.host = null
2013-10-01 13:44:55,612 INFO http.Http - http.proxy.port = 8080
2013-10-01 13:44:55,612 INFO http.Http - http.timeout = 12000
2013-10-01 13:44:55,612 INFO http.Http - http.content.limit = 5242880
2013-10-01 13:44:55,612 INFO http.Http - http.agent = Mozilla/5.0 (compatible; OpenindexSpider; +http://www.openindex.io/en/webmasters/spider.html)
2013-10-01 13:44:55,612 INFO http.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2013-10-01 13:44:55,613 INFO http.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2013-10-01 13:44:55,925 ERROR http.Http - Failed to get protocol output
java.io.IOException: unzipBestEffort returned null
    at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:317)
    at org.apache.nutch.protocol.http.HttpResponse.init(HttpResponse.java:164)
    at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:140)
    at org.apache.nutch.indexer.IndexingFiltersChecker.run(IndexingFiltersChecker.java:86)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.IndexingFiltersChecker.main(IndexingFiltersChecker.java:150)
{code}

Haven't got a clue yet as to what the exact issue is.
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937269#comment-13937269 ]

Sebastian Nagel commented on NUTCH-1736:
----------------------------------------

Thanks, [~yangshangchuan] for taking the time to analyze the problem. The patch also fixes NUTCH-1647. Any ideas why {{http.content.limit}} must not be -1?

Can't fetch page if http response header contains Transfer-Encoding:chunked
---------------------------------------------------------------------------

Key: NUTCH-1736
URL: https://issues.apache.org/jira/browse/NUTCH-1736
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
Reporter: ysc
Priority: Critical
Fix For: 2.3, 1.9
Attachments: nutch-2.2.1.patch, nutch1.7.patch
Original Estimate: 24h
Remaining Estimate: 24h

fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null
[Nutch Wiki] Trivial Update of Release_HOWTO by LewisJohnMcgibbney
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification.

The Release_HOWTO page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/Release_HOWTO?action=diff&rev1=29&rev2=30

  -rw-rw-r-- 1 mary mary 222 Mar 1 14:23 apache-nutch-1.8-src.zip.sha
  }}}
   1. Make sure that '''all artifacts are editable by fellow committers''' e.g. chmod 775
-  1. Check out the release management area at https://dist.apache.org/repos/dist/dev/nutch/ and copy all artifacts to here then commit this.
+  1. Check out the release management area at https://dist.apache.org/repos/dist/dev/nutch/{release.version} and copy all artifacts to here then commit this.
   1. Make sure your pgp key is listed in the Nutch KEYS file located at http://www.apache.org/dist/nutch/KEYS
-  1. Create and open a VOTE thread on user@ and dev@nutch.apache.org. The VOTE must pass with 3 +1 binding VOTE's before any release can take place.
+  1. Create and open a VOTE thread on user@ and dev@nutch.apache.org. The VOTE must pass with 3 +1 binding VOTE's before any release can take place. A VOTE thread usually takes the form
+ {{{
+ Hi user@ dev@,
+
+ This thread is a VOTE for releasing Apache Nutch 1.8 RC#2. The release candidate comprises the following components.
+
+ * A staging repository [0] containing various Maven artifacts
+ * A branch-1.8 of the trunk code [1]
+ * The tagged source upon which we are VOTE'ing [2]
+ * Finally, the release artifacts [3] which I would encourage you to verify for signatures and test.
+
+ You should use the following KEYS [4] file to verify the signatures of all release artifacts.
+
+ Please VOTE as follows
+
+ [ ] +1 Push the release, I am happy :)
+ [ ] +0 I am not bothered either way
+ [ ] -1 I am not happy with this release candidate (please state why)
+
+ Firstly thank you to everyone that contributed to Nutch. Secondly, thank you to everyone that VOTE's. It is appreciated.
+
+ Thanks
+ Lewis
+ (on behalf of Nutch PMC)
+
+ p.s. Here's my +1
+
+ [0] https://repository.apache.org/content/repositories/orgapachenutch-1001/
+ [1] https://svn.apache.org/repos/asf/nutch/branches/branch-1.8
+ [2] https://svn.apache.org/repos/asf/nutch/tags/release-1.8RC%232/
+ [3] https://dist.apache.org/repos/dist/dev/nutch/
+ [4] https://dist.apache.org/repos/dist/dev/nutch/KEYS
+ }}}
+  1. Once the 72 hour period expires it is time to close the VOTE thread with a RESULT thread. This should simply state the outcome of VOTE'ing (including how many binding VOTE's were received). Finally it should include whether the VOTE passed and if the release can be made.
+  1. In the instance where the VOTE does not pass, the release manager should roll back all of the work above as well as '''DROP''' the staging artifacts.

  == Making the Release ==
+  1. Head back over to the [[https://repository.apache.org/|staging repos]] and '''RELEASE''' them into the wild.
+  1. Move the artifacts from the release management area at https://dist.apache.org/repos/dist/dev/nutch/{release.version} to https://dist.apache.org/repos/dist/release/nutch/{release.version} as follows {{{svn mv https://dist.apache.org/repos/dist/dev/nutch/{release.version} https://dist.apache.org/repos/dist/release/nutch/{release.version} --message "Release Apache Nutch 1.X" }}}
   1. Wait 24 hours for release to propagate to mirrors.
   1. Add the new release info to the [[https://svn.apache.org/repos/asf/nutch/site/publish/doap.rdf|doap.rdf]] file, and double check for any other updates that should be made to the doap file as well if it hasn't been updated in a while. If this is the case please see [[http://projects.apache.org/doap.html|here]]
   1. Deploy new Nutch site (according to [[Website_Update_HOWTO]]).
[Nutch Wiki] Trivial Update of Release_HOWTO by LewisJohnMcgibbney
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification.

The Release_HOWTO page has been changed by LewisJohnMcgibbney:
https://wiki.apache.org/nutch/Release_HOWTO?action=diff&rev1=31&rev2=32

  == Making the Release ==
   1. Head back over to the [[https://repository.apache.org/|staging repos]] and '''RELEASE''' them into the wild.
-  1. Move the artifacts from the release management area at https://dist.apache.org/repos/dist/dev/nutch/{release.version} to https://dist.apache.org/repos/dist/release/nutch/{release.version} as follows {{{svn mv https://dist.apache.org/repos/dist/dev/nutch/$release.version https://dist.apache.org/repos/dist/release/nutch/$release.version --message "Release Apache Nutch $release.version" }}}
+  1. Move the artifacts from the release management area at https://dist.apache.org/repos/dist/dev/nutch/$release.version to https://dist.apache.org/repos/dist/release/nutch/$release.version as follows {{{svn mv https://dist.apache.org/repos/dist/dev/nutch/$release.version https://dist.apache.org/repos/dist/release/nutch/$release.version --message "Release Apache Nutch $release.version" }}}
   1. Wait 24 hours for release to propagate to mirrors.
   1. Add the new release info to the [[https://svn.apache.org/repos/asf/nutch/site/publish/doap.rdf|doap.rdf]] file, and double check for any other updates that should be made to the doap file as well if it hasn't been updated in a while. If this is the case please see [[http://projects.apache.org/doap.html|here]]
   1. Deploy new Nutch site (according to [[Website_Update_HOWTO]]).
[jira] [Comment Edited] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936103#comment-13936103 ]

ysc edited comment on NUTCH-1736 at 3/17/14 3:05 AM:
-----------------------------------------------------

problem:
fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null

detail:
2014-03-12 16:48:38,031 ERROR http.Http - Failed to get protocol output
java.io.IOException: unzipBestEffort returned null
    at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:317)
    at org.apache.nutch.protocol.http.HttpResponse.init(HttpResponse.java:164)
    at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:140)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:703)
2014-03-12 16:48:38,031 INFO fetcher.Fetcher - fetch of http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html failed with: java.io.IOException: unzipBestEffort returned null
2014-03-12 16:48:38,031 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0

solution:
This patch deals with the HTTP response header Transfer-Encoding:chunked.

important tips:
The property http.content.limit in nutch-site.xml must be greater than 0.

Why must it be greater than 0?
If the property http.content.limit in nutch-site.xml is negative or 0, chunkLen becomes negative or 0 too. See the code below, which is at line 277 of the Java source file http://svn.apache.org/repos/asf/nutch/tags/release-1.7/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java

    if ( (contentBytesRead + chunkLen) > http.getMaxContent() )
      chunkLen= http.getMaxContent() - contentBytesRead;

Reading one chunk is subject to the condition:

    while (chunkBytesRead < chunkLen)

With a chunkLen that is negative or 0 this loop never runs and no chunk data is read, so the property http.content.limit in nutch-site.xml must be greater than 0.

was (Author: yangshangchuan):
problem:
fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null

detail:
2014-03-12 16:48:38,031 ERROR http.Http - Failed to get protocol output
java.io.IOException: unzipBestEffort returned null
    at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:317)
    at org.apache.nutch.protocol.http.HttpResponse.init(HttpResponse.java:164)
    at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:140)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:703)
2014-03-12 16:48:38,031 INFO fetcher.Fetcher - fetch of http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html failed with: java.io.IOException: unzipBestEffort returned null
2014-03-12 16:48:38,031 INFO fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0

solution:
This patch deals with the HTTP response header Transfer-Encoding:chunked.

important tips:
The property http.content.limit in nutch-site.xml must be greater than 0.

Can't fetch page if http response header contains Transfer-Encoding:chunked
---------------------------------------------------------------------------

Key: NUTCH-1736
URL: https://issues.apache.org/jira/browse/NUTCH-1736
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
Reporter: ysc
Priority: Critical
Fix For: 2.3, 1.9
Attachments: nutch-2.2.1.patch, nutch1.7.patch
Original Estimate: 24h
Remaining Estimate: 24h

fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937425#comment-13937425 ]

ysc commented on NUTCH-1736:
----------------------------

Hi Sebastian, I have modified the previous comment and added some explanation.

Can't fetch page if http response header contains Transfer-Encoding:chunked
---------------------------------------------------------------------------

Key: NUTCH-1736
URL: https://issues.apache.org/jira/browse/NUTCH-1736
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
Reporter: ysc
Priority: Critical
Fix For: 2.3, 1.9
Attachments: nutch-2.2.1.patch, nutch1.7.patch
Original Estimate: 24h
Remaining Estimate: 24h

fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937426#comment-13937426 ]

lufeng commented on NUTCH-1736:
-------------------------------

Hi ysc, you can fix this issue by checking the content size limit, like this:
{code:java}
if (http.getMaxContent() >= 0 && (contentBytesRead + chunkLen) > http.getMaxContent() )
  chunkLen= http.getMaxContent() - contentBytesRead;
{code}

Can't fetch page if http response header contains Transfer-Encoding:chunked
---------------------------------------------------------------------------

Key: NUTCH-1736
URL: https://issues.apache.org/jira/browse/NUTCH-1736
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
Reporter: ysc
Priority: Critical
Fix For: 2.3, 1.9
Attachments: nutch-2.2.1.patch, nutch1.7.patch
Original Estimate: 24h
Remaining Estimate: 24h

fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null
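
To make the effect of the guard concrete, here is a small standalone sketch of the two clamping variants discussed in this thread. It is not the attached patch: `ChunkClamp` and its method names are hypothetical, and plain `int` parameters stand in for `http.getMaxContent()` and the reader state. It assumes, as the question about -1 suggests, that a negative http.content.limit is meant to disable truncation.

```java
// Hypothetical standalone model of the chunk-length clamp; not Nutch code.
final class ChunkClamp {

    // Unguarded clamp, modelled on the condition quoted from release-1.7:
    // a negative limit turns chunkLen negative, so "while (chunkBytesRead < chunkLen)"
    // would read nothing.
    static int clampUnguarded(int chunkLen, int contentBytesRead, int maxContent) {
        if ((contentBytesRead + chunkLen) > maxContent) {
            chunkLen = maxContent - contentBytesRead;
        }
        return chunkLen;
    }

    // Guarded clamp, modelled on lufeng's suggestion: a negative limit is
    // treated as "no limit" and the chunk length is left untouched.
    static int clampGuarded(int chunkLen, int contentBytesRead, int maxContent) {
        if (maxContent >= 0 && (contentBytesRead + chunkLen) > maxContent) {
            chunkLen = maxContent - contentBytesRead;
        }
        return chunkLen;
    }

    public static void main(String[] args) {
        // A 4096-byte chunk at the start of the body, with the limit set to -1.
        System.out.println(clampUnguarded(4096, 0, -1));
        System.out.println(clampGuarded(4096, 0, -1));
    }
}
```

For a non-negative limit both variants behave the same; they differ only when the limit is negative.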
[jira] [Updated] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ysc updated NUTCH-1736:
----------------------

Attachment: (was: nutch1.7.patch)

Can't fetch page if http response header contains Transfer-Encoding:chunked
---------------------------------------------------------------------------

Key: NUTCH-1736
URL: https://issues.apache.org/jira/browse/NUTCH-1736
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
Reporter: ysc
Priority: Critical
Fix For: 2.3, 1.9
Attachments: nutch-2.2.1.patch
Original Estimate: 24h
Remaining Estimate: 24h

fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null
[jira] [Updated] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ysc updated NUTCH-1736:
----------------------

Attachment: nutch1.7.patch

Use nutch1.7.patch to fix Nutch 1.x.

Can't fetch page if http response header contains Transfer-Encoding:chunked
---------------------------------------------------------------------------

Key: NUTCH-1736
URL: https://issues.apache.org/jira/browse/NUTCH-1736
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
Reporter: ysc
Priority: Critical
Fix For: 2.3, 1.9
Attachments: nutch1.7.patch
Original Estimate: 24h
Remaining Estimate: 24h

fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null
[jira] [Updated] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ysc updated NUTCH-1736:
----------------------

Attachment: nutch2.2.1.patch

Use nutch2.2.1.patch to fix Nutch 2.x.

Can't fetch page if http response header contains Transfer-Encoding:chunked
---------------------------------------------------------------------------

Key: NUTCH-1736
URL: https://issues.apache.org/jira/browse/NUTCH-1736
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
Reporter: ysc
Priority: Critical
Fix For: 2.3, 1.9
Attachments: nutch1.7.patch, nutch2.2.1.patch
Original Estimate: 24h
Remaining Estimate: 24h

fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null
[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked
[ https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937438#comment-13937438 ]

ysc commented on NUTCH-1736:
----------------------------

Hi lufeng, thanks, this is a good idea. I have modified the patch files.

Can't fetch page if http response header contains Transfer-Encoding:chunked
---------------------------------------------------------------------------

Key: NUTCH-1736
URL: https://issues.apache.org/jira/browse/NUTCH-1736
Project: Nutch
Issue Type: Bug
Components: protocol
Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
Reporter: ysc
Priority: Critical
Fix For: 2.3, 1.9
Attachments: nutch1.7.patch, nutch2.2.1.patch
Original Estimate: 24h
Remaining Estimate: 24h

fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null