[ 
https://issues.apache.org/jira/browse/NUTCH-2560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16508028#comment-16508028
 ] 

Sebastian Nagel commented on NUTCH-2560:
----------------------------------------

See [RFC 7230, section 3.2.4|https://tools.ietf.org/html/rfc7230#section-3.2.4]:
{quote}Historically, HTTP header field values could be extended over
   multiple lines by preceding each extra line with at least one space
   or horizontal tab (obs-fold).  This specification deprecates such
   line folding{quote}

Actually, this seems to work if multi-line headers follow the spec (extra space 
at the beginning of each continuation line): the unit test in [commit 
a2771dc|https://github.com/apache/nutch/pull/347/commits/a2771dc0d1f551b8dd1e07609ce978251a05f34a]
 passes when ported to Nutch 1.14.
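
To illustrate what "following the spec" means here, below is a minimal sketch of obs-fold unfolding. It is not the protocol-http code; the class {{ObsFoldUnfolder}} and the method {{unfold}} are hypothetical names chosen for this example. It assumes the header block has already been split into lines, and it only joins lines that begin with a space or horizontal tab, replacing the fold with a single space as RFC 7230 allows. A continuation line without leading whitespace (the invalid case described in this issue) is not recognized and is kept as a separate line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Nutch: joins obs-fold continuation lines. */
final class ObsFoldUnfolder {

    /**
     * Merges each line that starts with a space or horizontal tab into the
     * preceding header line, replacing the fold with a single space.
     * Lines without leading whitespace are kept as separate entries.
     */
    static List<String> unfold(List<String> rawLines) {
        List<String> headers = new ArrayList<>();
        for (String line : rawLines) {
            boolean continuation = !line.isEmpty()
                && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
            if (continuation && !headers.isEmpty()) {
                // Append the folded part to the previous header line.
                int last = headers.size() - 1;
                headers.set(last, headers.get(last) + " " + line.trim());
            } else {
                headers.add(line);
            }
        }
        return headers;
    }
}
```

With this sketch, the lines {{"Set-Cookie: a=1;"}} and {{"\tpath=/"}} would be combined into the single header {{"Set-Cookie: a=1; path=/"}}.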

> protocol-http throws an error when an http header spans over multiple lines
> ---------------------------------------------------------------------------
>
>                 Key: NUTCH-2560
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2560
>             Project: Nutch
>          Issue Type: Sub-task
>    Affects Versions: 1.14
>            Reporter: Gerard Bouchar
>            Priority: Major
>             Fix For: 1.15
>
>
> Some servers invalidly send headers that span over multiple lines. In that 
> case, browsers simply ignore the subsequent lines, but protocol-http throws 
> an error, thus preventing us from fetching the contents of the page.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
