[ https://issues.apache.org/jira/browse/NUTCH-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16488900#comment-16488900 ]
Omkar Reddy commented on NUTCH-2575:
------------------------------------

I have taken up [NUTCH-2557|https://issues.apache.org/jira/browse/NUTCH-2557] and started working on it. Thanks.

> protocol-http does not respect the maximum content-size for chunked responses
> -----------------------------------------------------------------------------
>
>              Key: NUTCH-2575
>              URL: https://issues.apache.org/jira/browse/NUTCH-2575
>          Project: Nutch
>       Issue Type: Sub-task
>       Components: protocol
> Affects Versions: 1.14
>         Reporter: Gerard Bouchar
>         Priority: Critical
>          Fix For: 1.15
>
>
> There is a bug in HttpResponse::readChunkedContent that prevents it from stopping once the content it has read exceeds the maximum allowed size.
> There [is a variable contentBytesRead|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L404] that is used to check how much content has been read, but it is never updated, so it always keeps its initial value of 0, and [the size check|https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L440-L442] always returns false (unless a single chunk is larger than the maximum allowed content size).
> This allows any server to cause out-of-memory errors on our side.
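For illustration, the pattern the report describes is a chunk-reading loop whose byte counter must be advanced on every read so that the size check can actually trip. Below is a minimal, self-contained sketch of such a loop; it is not the actual HttpResponse code, and the names readChunkPayload, maxContent and bytesAlreadyRead are invented for the example (only the idea of the contentBytesRead counter comes from the report).

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Minimal sketch of a chunk-payload reading loop that enforces a content-size cap. */
public class ChunkedReadSketch {

  /**
   * Reads up to the remaining allowance of maxContent bytes for one chunk.
   * The key point is that the counter is incremented after every read, so the
   * size check is based on the real number of bytes buffered so far.
   */
  static byte[] readChunkPayload(InputStream in, int chunkLen, int maxContent,
      int bytesAlreadyRead) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buf = new byte[4096];
    int bytesRead = bytesAlreadyRead;   // total payload bytes buffered so far
    int remainingInChunk = chunkLen;

    while (remainingInChunk > 0) {
      int toRead = Math.min(buf.length, remainingInChunk);
      // Never buffer past the global content limit.
      if (bytesRead + toRead > maxContent) {
        toRead = maxContent - bytesRead;
        if (toRead <= 0) {
          break; // cap reached: stop buffering, caller should mark content as truncated
        }
      }
      int n = in.read(buf, 0, toRead);
      if (n == -1) {
        break; // premature end of stream
      }
      out.write(buf, 0, n);
      bytesRead += n;          // the update the report says is missing for contentBytesRead
      remainingInChunk -= n;
    }
    return out.toByteArray();
  }
}
{code}

In practice the caller would presumably also flag the fetched content as truncated once the cap is hit, rather than silently discarding the rest.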