[ https://issues.apache.org/jira/browse/HADOOP-15911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16679606#comment-16679606 ]
Steve Loughran commented on HADOOP-15911:
-----------------------------------------

Patches should be supplied as a .patch file; then hit "patch-submit". Jenkins likes tests for this.

w.r.t. the S3 download: whose library? S3A doesn't do byte buffers, AFAIK.

> Over-eager allocation in ByteBufferUtil.fallbackRead
> ----------------------------------------------------
>
>                 Key: HADOOP-15911
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15911
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: common
>            Reporter: Vanco Buca
>            Priority: Major
>
> The heap-memory path of ByteBufferUtil.fallbackRead ([see master branch code here|https://github.com/apache/hadoop/blob/a0da1ec01051108b77f86799dd5e97563b2a3962/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/ByteBufferUtil.java#L95])
> massively overallocates memory when the underlying input stream returns data
> in chunks smaller than requested. This happens regularly when the S3 input
> stream is the source.
> The behavior is roughly O(N^2). In a recent debug session, we were trying to
> read 6MB, but getting 16K at a time. The code would:
> * allocate 16M, use the first 16K
> * allocate 16M - 16K, use the first 16K of that
> * allocate 16M - 32K, use the first 16K of that
> * (etc)
> The patch is simple.
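The root cause is that `InputStream.read(byte[], int, int)` may legally return fewer bytes than requested, so a single call is not guaranteed to fill the buffer. The standalone sketch below is not Hadoop code: `ShortReadDemo`, `ChunkedInputStream` (a hypothetical stand-in for a stream that, like the S3 stream in the report, hands back at most 16K per call) and `readUpTo` are illustrative names. It compares one `read()` against a fill loop in the spirit of the direct-memory path.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Illustrative sketch only; none of these names exist in Hadoop. */
public class ShortReadDemo {

  /** Hypothetical stream that returns at most chunkSize bytes per read call. */
  static final class ChunkedInputStream extends InputStream {
    private final InputStream in;
    private final int chunkSize;

    ChunkedInputStream(InputStream in, int chunkSize) {
      this.in = in;
      this.chunkSize = chunkSize;
    }

    @Override
    public int read() throws IOException {
      return in.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      // Cap every request, as a network-backed stream may do.
      return in.read(b, off, Math.min(len, chunkSize));
    }
  }

  /**
   * Hypothetical helper: keeps reading until maxLength bytes are in buf or
   * the stream reports EOF. Returns -1 only if EOF arrives before any byte
   * has been read.
   */
  static int readUpTo(InputStream stream, byte[] buf, int off, int maxLength)
      throws IOException {
    int totalRead = 0;
    while (totalRead < maxLength) {
      int nRead = stream.read(buf, off + totalRead, maxLength - totalRead);
      if (nRead < 0) {
        // EOF: report it only if nothing was read at all.
        return totalRead > 0 ? totalRead : -1;
      }
      if (nRead == 0) {
        // A conforming stream never returns 0 for len > 0; stop rather than spin.
        break;
      }
      totalRead += nRead;
    }
    return totalRead;
  }

  public static void main(String[] args) throws IOException {
    int size = 6 * 1024 * 1024; // 6MB, as in the report
    int chunk = 16 * 1024;      // 16K per read call
    byte[] data = new byte[size];
    byte[] buf = new byte[size];

    InputStream single =
        new ChunkedInputStream(new ByteArrayInputStream(data), chunk);
    // prints "single read: 16384"
    System.out.println("single read: " + single.read(buf, 0, size));

    InputStream looped =
        new ChunkedInputStream(new ByteArrayInputStream(data), chunk);
    // prints "looped read: 6291456"
    System.out.println("looped read: " + readUpTo(looped, buf, 0, size));
    // prints "after EOF: -1"
    System.out.println("after EOF: " + readUpTo(looped, buf, 0, size));
  }
}
```

`readUpTo` returns -1 only when EOF arrives before any byte is read, which appears to match the direct-memory path (it only marks success if something was read). In the proposed patch below, `totalRead >= 0` is always true, so an immediate EOF would seem to yield an empty buffer rather than the `null` the current heap path returns. On Java 9+, `InputStream.readNBytes(byte[], int, int)` performs a similar loop, but returns 0 rather than -1 at EOF.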
> Here's the text version of the patch:
> {code}
> @@ -88,10 +88,17 @@ public final class ByteBufferUtil {
>          buffer.flip();
>        } else {
>          buffer.clear();
> -        int nRead = stream.read(buffer.array(),
> -            buffer.arrayOffset(), maxLength);
> -        if (nRead >= 0) {
> -          buffer.limit(nRead);
> +        int totalRead = 0;
> +        while (totalRead < maxLength) {
> +          final int nRead = stream.read(buffer.array(),
> +              buffer.arrayOffset() + totalRead, maxLength - totalRead);
> +          if (nRead <= 0) {
> +            break;
> +          }
> +          totalRead += nRead;
> +        }
> +        if (totalRead >= 0) {
> +          buffer.limit(totalRead);
>            success = true;
>          }
>        }
> {code}
> So, essentially, do the same thing that the direct-memory path already does.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)