[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-06-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028007#comment-14028007
 ] 

Hudson commented on NUTCH-1736:
---

SUCCESS: Integrated in Nutch-trunk #2652 (See 
[https://builds.apache.org/job/Nutch-trunk/2652/])
NUTCH-1736 Can't fetch page if http response header contains 
Transfer-Encoding:chunked (jnioche: 
http://svn.apache.org/viewvc/nutch/trunk/?view=rev&rev=1601935)
* /nutch/trunk/CHANGES.txt
* /nutch/trunk/src/java/org/apache/nutch/metadata/HttpHeaders.java
* /nutch/trunk/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java


> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
>  Issue Type: Bug
>  Components: protocol
>Affects Versions: 2.3, 1.8
>Reporter: ysc
>Priority: Critical
> Fix For: 2.4, 1.9
>
> Attachments: nutch1.7.patch, nutch2.2.1.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> fetching: 
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException: 
> unzipBestEffort returned null



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-06-11 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027994#comment-14027994
 ] 

Hudson commented on NUTCH-1736:
---

FAILURE: Integrated in Nutch-nutchgora #1039 (See 
[https://builds.apache.org/job/Nutch-nutchgora/1039/])
NUTCH-1736 Can't fetch page if http response header contains 
Transfer-Encoding:chunked (jnioche: 
http://svn.apache.org/viewvc/nutch/branches/2.x/?view=rev&rev=1601937)
* /nutch/branches/2.x/CHANGES.txt
* /nutch/branches/2.x/src/java/org/apache/nutch/metadata/HttpHeaders.java
* /nutch/branches/2.x/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java


[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-06-11 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027945#comment-14027945
 ] 

Julien Nioche commented on NUTCH-1736:
--

2.x: Committed revision 1601937.


[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-06-11 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14027922#comment-14027922
 ] 

Julien Nioche commented on NUTCH-1736:
--

Trunk: Committed revision 1601935.


[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-03-27 Thread Julien Nioche (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13949513#comment-13949513
 ] 

Julien Nioche commented on NUTCH-1736:
--

Looks good and seems to have fixed the issue I was encountering on
"http://www.imones.lt/v-amonto-individuali-veikla-filialas-832833". It needs to be
tested a bit more before committing. Thanks for this contribution.

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
>  Issue Type: Bug
>  Components: protocol
>Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
>Reporter: ysc
>Priority: Critical
> Fix For: 2.3, 1.9
>
> Attachments: nutch1.7.patch, nutch2.2.1.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> fetching: 
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException: 
> unzipBestEffort returned null



[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-03-16 Thread ysc (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937438#comment-13937438
 ] 

ysc commented on NUTCH-1736:


Hi lufeng, thanks, this is a good idea. I have modified the patch files.


[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-03-16 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937426#comment-13937426
 ] 

lufeng commented on NUTCH-1736:
---

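Hi ysc,

You can fix this issue by checking the content size, like this:

{code:java}
// Truncate the chunk so the total never exceeds the configured content limit
// (http.content.limit); a negative limit (-1) means "no limit", so skip the check.
if (http.getMaxContent() >= 0
    && (contentBytesRead + chunkLen) > http.getMaxContent()) {
  chunkLen = http.getMaxContent() - contentBytesRead;
}
{code}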
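To make the effect of the check above concrete, here is a minimal, self-contained sketch that pulls it into a pure function. ContentLimitSketch and clampChunkLen are hypothetical names, and maxContent stands in for http.getMaxContent(), assumed here to return the http.content.limit value. The Math.max(0, ...) guard is an addition that is not in the snippet.

{code:java}
/** Hypothetical illustration of the content-limit check; not Nutch code. */
public class ContentLimitSketch {

  /**
   * Returns how many bytes of the next chunk to keep. maxContent plays the
   * role of http.getMaxContent(); a negative value (e.g. -1) means "no limit".
   */
  static int clampChunkLen(int maxContent, int contentBytesRead, int chunkLen) {
    if (maxContent >= 0 && contentBytesRead + chunkLen > maxContent) {
      // Math.max avoids a negative length if more than maxContent was already read
      return Math.max(0, maxContent - contentBytesRead);
    }
    return chunkLen;
  }

  public static void main(String[] args) {
    System.out.println(clampChunkLen(-1, 100_000, 4096));   // 4096 (no limit)
    System.out.println(clampChunkLen(65536, 64_000, 4096)); // 1536 (truncated)
    System.out.println(clampChunkLen(65536, 65536, 4096));  // 0 (limit reached)
  }
}
{code}

Without the {{maxContent >= 0}} test, a limit of -1 would make the condition always true and produce a negative chunk length, so the guard is what lets -1 keep its "no limit" meaning.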

> Can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
>  Issue Type: Bug
>  Components: protocol
>Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
>Reporter: ysc
>Priority: Critical
> Fix For: 2.3, 1.9
>
> Attachments: nutch-2.2.1.patch, nutch1.7.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> fetching: 
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException: 
> unzipBestEffort returned null



[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-03-16 Thread ysc (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937425#comment-13937425
 ] 

ysc commented on NUTCH-1736:


Hi Sebastian, I have modified the previous comment and added some explanation.


[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-03-16 Thread lufeng (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937418#comment-13937418
 ] 

lufeng commented on NUTCH-1736:
---

Hi Sebastian, I think this patch is not related to NUTCH-1647; the two issues may
just end in the same exception. NUTCH-1647 is about a URL redirection issue.


[jira] [Commented] (NUTCH-1736) Can't fetch page if http response header contains Transfer-Encoding:chunked

2014-03-16 Thread Sebastian Nagel (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13937269#comment-13937269
 ] 

Sebastian Nagel commented on NUTCH-1736:


Thanks, [~yangshangchuan], for taking the time to analyze the problem. The patch
also fixes NUTCH-1647.

Any idea why {{http.content.limit}} must not be -1?


[jira] [Commented] (NUTCH-1736) can't fetch page if http response header contains Transfer-Encoding:chunked

2014-03-15 Thread ysc (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936103#comment-13936103
 ] 

ysc commented on NUTCH-1736:


problem:

fetching: http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
Fetch failed with protocol status: EXCEPTION: java.io.IOException: unzipBestEffort returned null

detail:

2014-03-12 16:48:38,031 ERROR http.Http - Failed to get protocol output
java.io.IOException: unzipBestEffort returned null
    at org.apache.nutch.protocol.http.api.HttpBase.processGzipEncoded(HttpBase.java:317)
    at org.apache.nutch.protocol.http.HttpResponse.<init>(HttpResponse.java:164)
    at org.apache.nutch.protocol.http.Http.getResponse(Http.java:64)
    at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:140)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:703)
2014-03-12 16:48:38,031 INFO  fetcher.Fetcher - fetch of http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html failed with: java.io.IOException: unzipBestEffort returned null
2014-03-12 16:48:38,031 INFO  fetcher.Fetcher - -finishing thread FetcherThread, activeThreads=0

solution:

This patch handles the HTTP response header Transfer-Encoding: chunked.

important note:

The property http.content.limit in nutch-site.xml must be greater than 0.
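The patch itself is not reproduced in this thread. As a minimal sketch, assuming the gzip failure above came from decompressing a body that still contained the chunk-size lines, handling Transfer-Encoding: chunked means stripping the HTTP/1.1 chunk framing first and only then gunzipping. ChunkedGzipSketch and its methods are hypothetical names; this is not Nutch's HttpResponse code and it ignores http.content.limit.

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper, not Nutch code: de-chunk first, then gunzip. */
public class ChunkedGzipSketch {

  /** Reads one LF-terminated line as ASCII, dropping any CR. */
  static String readLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int c;
    while ((c = in.read()) != -1 && c != '\n') {
      if (c != '\r') sb.append((char) c);
    }
    return sb.toString();
  }

  /** Removes framing: "<hex size>[;ext] CRLF <data> CRLF" ... "0 CRLF [trailers] CRLF". */
  static byte[] decodeChunked(InputStream in) throws IOException {
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    while (true) {
      String sizeLine = readLine(in);
      int semi = sizeLine.indexOf(';');          // ignore chunk extensions
      if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
      int chunkLen = Integer.parseInt(sizeLine.trim(), 16);
      if (chunkLen == 0) break;                  // last-chunk marker
      byte[] buf = new byte[chunkLen];
      int off = 0;
      while (off < chunkLen) {                   // read exactly chunkLen bytes
        int n = in.read(buf, off, chunkLen - off);
        if (n < 0) throw new IOException("truncated chunk");
        off += n;
      }
      body.write(buf, 0, chunkLen);
      readLine(in);                              // consume CRLF after the data
    }
    while (!readLine(in).isEmpty()) {
      // skip optional trailer headers up to the terminating empty line
    }
    return body.toByteArray();
  }

  static byte[] gunzip(byte[] data) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = gz.read(buf)) != -1) out.write(buf, 0, n);
    }
    return out.toByteArray();
  }

  /** Demo helper: frames b[off..off+len) as a single chunk. */
  static void writeChunk(ByteArrayOutputStream out, byte[] b, int off, int len) {
    byte[] header = (Integer.toHexString(len) + "\r\n").getBytes(StandardCharsets.US_ASCII);
    out.write(header, 0, header.length);
    out.write(b, off, len);
    out.write('\r');
    out.write('\n');
  }

  public static void main(String[] args) throws IOException {
    byte[] html = "<html>hello</html>".getBytes(StandardCharsets.US_ASCII);
    ByteArrayOutputStream gzBytes = new ByteArrayOutputStream();
    try (GZIPOutputStream gz = new GZIPOutputStream(gzBytes)) {
      gz.write(html);
    }
    byte[] gzipped = gzBytes.toByteArray();

    // Split the gzip stream across two chunks, then add the last-chunk marker.
    ByteArrayOutputStream wire = new ByteArrayOutputStream();
    int half = gzipped.length / 2;
    writeChunk(wire, gzipped, 0, half);
    writeChunk(wire, gzipped, half, gzipped.length - half);
    byte[] end = "0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
    wire.write(end, 0, end.length);

    byte[] body = decodeChunked(new ByteArrayInputStream(wire.toByteArray()));
    System.out.println(new String(gunzip(body), StandardCharsets.US_ASCII)); // <html>hello</html>
  }
}
{code}

Passing the raw framed bytes straight to gunzip fails instead: they start with a hex chunk size rather than the gzip magic bytes, so GZIPInputStream throws a ZipException.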

> can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
>  Issue Type: Bug
>  Components: protocol
>Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
>Reporter: ysc
>Priority: Critical
> Attachments: nutch-2.2.1.patch, nutch1.7.patch
>
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> fetching: 
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException: 
> unzipBestEffort returned null



[jira] [Commented] (NUTCH-1736) can't fetch page if http response header contains Transfer-Encoding:chunked

2014-03-15 Thread ysc (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13936063#comment-13936063
 ] 

ysc commented on NUTCH-1736:


For Nutch 1.x, use the patch nutch1.7.patch.
For Nutch 2.x, use the patch nutch-2.2.1.patch.


> can't fetch page if http response header contains Transfer-Encoding:chunked
> ---
>
> Key: NUTCH-1736
> URL: https://issues.apache.org/jira/browse/NUTCH-1736
> Project: Nutch
>  Issue Type: Bug
>  Components: protocol
>Affects Versions: 1.6, 2.1, 1.7, 2.2, 2.3, 1.8, 2.4, 1.9, 2.2.1
>Reporter: ysc
>Priority: Critical
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> fetching: 
> http://szs.mof.gov.cn/zhengwuxinxi/zhengcefabu/201402/t20140224_1046354.html
> Fetch failed with protocol status: EXCEPTION: java.io.IOException: 
> unzipBestEffort returned null


