[jira] Commented: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing.
[ http://issues.apache.org/jira/browse/NUTCH-374?page=comments#action_12438722 ]

Meghna Kukreja commented on NUTCH-374:
--

I have experienced the same problem, and I fixed it by making this change to the function unzipBestEffort() in GZIPUtils.java. I changed this if statement:

    if ((written + size) > sizeLimit) {
      outStream.write(buf, 0, sizeLimit - written);
      break;
    }

to:

    if ((written + size) > sizeLimit && sizeLimit >= 0) {
      outStream.write(buf, 0, sizeLimit - written);
      break;
    }

when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing.

Key: NUTCH-374
URL: http://issues.apache.org/jira/browse/NUTCH-374
Project: Nutch
Issue Type: Bug
Affects Versions: 0.8, 0.8.1
Reporter: King Kong

I set http.content.limit to -1 so that fetched content is not truncated. However, if the response uses gzip or x-gzip encoding, it cannot be uncompressed. I found the problem is in HttpBase.processGzipEncoded (plugin lib-http):

    ...
    byte[] content = GZIPUtils.unzipBestEffort(compressed, getMaxContent());
    ...

Because this does not treat -1 as "no limit", the code must be modified to fix it:

    byte[] content;
    if (getMaxContent() >= 0) {
      content = GZIPUtils.unzipBestEffort(compressed, getMaxContent());
    } else {
      content = GZIPUtils.unzipBestEffort(compressed);
    }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
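Why -1 yields nothing: with sizeLimit = -1, the test (written + size) > sizeLimit is true for the very first chunk, so write() is called with a length of -1 and throws; presumably the best-effort error handling then swallows the exception, which matches the empty result reported in the issue. Below is a minimal, self-contained sketch of the guarded loop, assuming the surrounding method behaves roughly like the snippet quoted in the comment. The class name GzipLimitSketch, the 4096-byte buffer and the gzip() helper are hypothetical, not Nutch code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for a "best effort" gunzip with a size limit (not the Nutch source). */
public class GzipLimitSketch {

  private static final int BUF_SIZE = 4096; // assumed buffer size

  /** Decompresses as much as possible; a negative sizeLimit means "no limit". */
  public static byte[] unzipBestEffort(byte[] in, int sizeLimit) {
    ByteArrayOutputStream outStream = new ByteArrayOutputStream();
    try (GZIPInputStream inStream = new GZIPInputStream(new ByteArrayInputStream(in))) {
      byte[] buf = new byte[BUF_SIZE];
      int written = 0;
      int size;
      while ((size = inStream.read(buf)) > 0) {
        // Guarded check: truncate only when a non-negative limit is set.
        // Without "sizeLimit >= 0", a limit of -1 reaches write(buf, 0, -1).
        if ((written + size) > sizeLimit && sizeLimit >= 0) {
          outStream.write(buf, 0, sizeLimit - written);
          break;
        }
        outStream.write(buf, 0, size);
        written += size;
      }
    } catch (IOException e) {
      // Best effort: a corrupt or truncated stream keeps whatever was decoded so far.
    }
    return outStream.toByteArray();
  }

  /** Demo helper: gzip a byte array in memory. */
  public static byte[] gzip(byte[] data) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) {
    byte[] page = new byte[20000];
    Arrays.fill(page, (byte) 'a');
    byte[] compressed = gzip(page);
    System.out.println(unzipBestEffort(compressed, 1000).length); // 1000
    System.out.println(unzipBestEffort(compressed, -1).length);   // 20000
  }
}
```

With the guard, a negative limit behaves like having no limit at all, while a non-negative limit still truncates at exactly sizeLimit bytes.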
[jira] Commented: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing.
[ http://issues.apache.org/jira/browse/NUTCH-374?page=comments#action_12438855 ]

King Kong commented on NUTCH-374:
--

Meghna Kukreja, your approach works, but I think it is better to modify the code in HttpBase, because http.content.limit is an HTTP configuration property, and GZIPUtils already provides an overloaded unzipBestEffort(byte[] in) method with no limit:

    public static final byte[] unzipBestEffort(byte[] in) {
      return unzipBestEffort(in, Integer.MAX_VALUE);
    }

So I suggest that we modify the code in HttpBase. What do you think?
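The caller-side alternative leaves the limited GZIPUtils overload unchanged and maps a negative http.content.limit to the no-limit overload before calling it. A minimal sketch, assuming only what is quoted in this thread: HttpGzipSketch, its maxContent field and the two stand-in unzipBestEffort overloads are hypothetical substitutes for HttpBase and GZIPUtils, and only the decompression step of processGzipEncoded is modelled.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical stand-in for the HttpBase-side fix (not the Nutch source). */
public class HttpGzipSketch {

  // Stand-in for the no-limit overload: "no limit" is Integer.MAX_VALUE.
  public static byte[] unzipBestEffort(byte[] in) {
    return unzipBestEffort(in, Integer.MAX_VALUE);
  }

  // Stand-in for the limited overload WITHOUT a "sizeLimit >= 0" guard:
  // callers must never pass a negative limit.
  public static byte[] unzipBestEffort(byte[] in, int sizeLimit) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPInputStream gin = new GZIPInputStream(new ByteArrayInputStream(in))) {
      byte[] buf = new byte[4096];
      int written = 0;
      int size;
      while ((size = gin.read(buf)) > 0) {
        if (written + size > sizeLimit) {
          out.write(buf, 0, sizeLimit - written);
          break;
        }
        out.write(buf, 0, size);
        written += size;
      }
    } catch (IOException e) {
      // Best effort: keep whatever was decoded before the error.
    }
    return out.toByteArray();
  }

  private final int maxContent; // hypothetical: the configured http.content.limit

  public HttpGzipSketch(int maxContent) {
    this.maxContent = maxContent;
  }

  public int getMaxContent() {
    return maxContent;
  }

  /** Decompression step of processGzipEncoded, choosing the overload by limit. */
  public byte[] processGzipEncoded(byte[] compressed) {
    byte[] content;
    if (getMaxContent() >= 0) {
      content = unzipBestEffort(compressed, getMaxContent());
    } else {
      content = unzipBestEffort(compressed); // negative limit: no truncation
    }
    return content;
  }

  /** Demo helper: gzip a byte array in memory. */
  public static byte[] gzip(byte[] data) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
      gz.write(data);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bos.toByteArray();
  }

  public static void main(String[] args) {
    byte[] page = new byte[20000];
    Arrays.fill(page, (byte) 'a');
    byte[] compressed = gzip(page);
    System.out.println(new HttpGzipSketch(1000).processGzipEncoded(compressed).length); // 1000
    System.out.println(new HttpGzipSketch(-1).processGzipEncoded(compressed).length);   // 20000
  }
}
```

Handling the limit at the call site keeps the limited overload's contract (a non-negative limit) unchanged; guarding inside GZIPUtils instead protects every caller. The two fixes are not mutually exclusive.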