[ 
https://issues.apache.org/jira/browse/NUTCH-2557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16509871#comment-16509871
 ] 

Hudson commented on NUTCH-2557:
-------------------------------

SUCCESS: Integrated in Jenkins build Nutch-trunk #3534 (See 
[https://builds.apache.org/job/Nutch-trunk/3534/])
NUTCH-2557 protocol-http fails to follow redirections when HTTP response 
(snagel: 
[https://github.com/apache/nutch/commit/d163512d5d2e345dfe6c816a29dc93a108dfd254])
* (edit) 
src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java
* (edit) 
src/plugin/protocol-http/src/test/org/apache/nutch/protocol/http/TestBadServerResponses.java


> protocol-http fails to follow redirections when an HTTP response body is 
> invalid
> --------------------------------------------------------------------------------
>
>                 Key: NUTCH-2557
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2557
>             Project: Nutch
>          Issue Type: Sub-task
>    Affects Versions: 1.14
>            Reporter: Gerard Bouchar
>            Priority: Major
>             Fix For: 1.15
>
>
> If a server sends a redirection (3XX status code, with a Location header), 
> protocol-http tries to parse the HTTP response body anyway. Thus, if an error 
> occurs while decoding the body, the redirection is not followed and the 
> information is lost. Browsers follow the redirection and close the socket 
> as soon as they can.
>  * Example: this page redirects to its https version, with an HTTP 
> body containing invalid gzip-encoded content. Browsers follow the 
> redirection, but Nutch throws an error:
>  ** [http://www.webarcelona.net/es/blog?page=2]
>  
> The HttpResponse::getContent method can already return null. I think it should 
> at least return null when parsing the HTTP response body fails.
> Ideally, we would adopt the same behavior as browsers, and not even try 
> parsing the body when the headers indicate a redirection.
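
The two behaviors proposed in the description can be illustrated with a minimal, standalone Java sketch. It is not the code from the commit referenced above and does not use Nutch's classes: the class name RedirectBodySketch and the methods isRedirect, gunzipOrNull and contentFor are hypothetical names chosen for this illustration. It assumes the status code, Location header and raw body bytes have already been read from the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Hypothetical illustration only; not the actual Nutch HttpResponse code.
class RedirectBodySketch {

  /** True for a 3XX status code that comes with a non-empty Location header. */
  static boolean isRedirect(int statusCode, String location) {
    return statusCode >= 300 && statusCode < 400
        && location != null && !location.isEmpty();
  }

  /** Decompresses gzip data, returning null instead of throwing if it is invalid. */
  static byte[] gunzipOrNull(byte[] compressed) {
    try (GZIPInputStream in =
             new GZIPInputStream(new ByteArrayInputStream(compressed));
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
      }
      return out.toByteArray();
    } catch (IOException e) {
      // Covers ZipException (bad gzip header) and truncated streams.
      return null;
    }
  }

  /**
   * Returns the content to keep for a response: null for a redirection
   * (the body is never decoded), null if decoding fails, else the body.
   */
  static byte[] contentFor(int statusCode, String location,
                           byte[] rawBody, boolean gzipped) {
    if (isRedirect(statusCode, location)) {
      return null; // follow the redirection, ignore the body
    }
    return gzipped ? gunzipOrNull(rawBody) : rawBody;
  }
}
```

With this shape, a 301 response carrying a Location header and a corrupt gzip body yields null content without an exception, so the caller can still act on the redirection.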



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
