Gerard Bouchar created NUTCH-2564:
-------------------------------------
Summary: protocol-http throws an error when the content-length
header is not a number
Key: NUTCH-2564
URL: https://issues.apache.org/jira/browse/NUTCH-2564
Project: Nutch
Issue Type: Sub-task
Reporter: Gerard Bouchar
When a server sends an invalid Content-Length header (one whose value is not a
valid number) with a plain-text HTTP body, browsers simply ignore the header.
protocol-http handles it inconsistently: if the value consists only of
whitespace, the header is ignored, but if it contains any other characters,
an error is thrown, which prevents us from doing anything with the page.
If the HTTP body is chunked, protocol-http always ignores the Content-Length
header, whether or not it is valid.
protocol-http should simply ignore invalid Content-Length headers, as browsers
do.
Relevant code:
[https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L354-L359]
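To illustrate the requested behaviour, here is a minimal sketch of lenient Content-Length parsing. It is not the code from HttpResponse.java and not a proposed patch: the class name LenientContentLength and the method parse are hypothetical, and treating a negative value as invalid is an assumption that the issue text does not spell out. The idea is only that an absent, blank, or non-numeric value yields "no declared length" instead of an exception.

```java
import java.util.OptionalLong;

/**
 * Hypothetical helper (not part of Nutch): parses a Content-Length header
 * value leniently, so that an invalid value is ignored instead of raising
 * an error.
 */
public final class LenientContentLength {

  private LenientContentLength() {}

  /**
   * Returns the declared body length, or an empty OptionalLong when the
   * header is absent, blank, not parseable as a number, or negative
   * (rejecting negative values is an assumption of this sketch).
   */
  public static OptionalLong parse(String headerValue) {
    if (headerValue == null) {
      return OptionalLong.empty();
    }
    String trimmed = headerValue.trim();
    if (trimmed.isEmpty()) {
      // Whitespace-only header: ignore it.
      return OptionalLong.empty();
    }
    try {
      long length = Long.parseLong(trimmed);
      return length >= 0 ? OptionalLong.of(length) : OptionalLong.empty();
    } catch (NumberFormatException e) {
      // Non-numeric header: ignore it rather than failing the fetch.
      return OptionalLong.empty();
    }
  }

  public static void main(String[] args) {
    // Print whether each sample value is accepted as a usable length.
    String[] samples = {"1024", "   ", "abc", "12abc", "-1"};
    for (String sample : samples) {
      System.out.println("'" + sample + "' usable: " + parse(sample).isPresent());
    }
  }
}
```

With a helper like this, a caller would read exactly the declared number of bytes when a length is present, and otherwise fall back to reading until the connection closes, which is the same path already taken for a whitespace-only header.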