[ https://issues.apache.org/jira/browse/HDFS-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775611#comment-13775611 ]
Chris Nauroth commented on HDFS-5191: ------------------------------------- It looks like there are still some import-only changes in {{DFSUtil}}. {code} if (buffer == null) { throw new UnsupportedOperationException("zero-copy reads " + "were not available, and the ByteBufferPool did not provide " + "us with a " + (useDirect ? " direct" : "indirect") + "buffer."); } {code} Minor nit: this exception message would have an extraneous space in the direct case (i.e. "did not provide us with a direct"), and no space between the last two words (i.e. "did not provide us with a indirectbuffer"). {code} while (true) { countingVisitor.reset(); mmapManager.visitEvictable(countingVisitor); if (0 == countingVisitor.count) { break; } } {code} This test would cause an infinite loop if a bug was introduced that left mmaps lingering in evictable state, which could be hard to diagnose. Should we use {{GenericTestUtils#waitFor}}, so that there is a timeout? Lastly, can you help clarify something about the data structure behind {{IdentityHashStore}} for me? In particular, I'm wondering about the math in {{realloc}}: {code} private void realloc(int newCapacity) { Preconditions.checkArgument(newCapacity > 0); Object prevBuffer[] = buffer; this.capacity = newCapacity; int numElements = 1 + (2 * newCapacity); this.buffer = new Object[2 * numElements]; this.numInserted = 0; if (prevBuffer != null) { for (int i = 0; i < prevBuffer.length; i += 2) { if (prevBuffer[i] != null) { putInternal(prevBuffer[i], prevBuffer[i + 1]); } } } } {code} {{put}} will call {{realloc}} to double capacity when needed. {{numElements}} is double of the new capacity plus one extra. Then, the size is doubled again for allocating the new array. Therefore the growth pattern looks like: capacity == 2, buffer.length == 10 capacity == 4, buffer.length == 18 capacity == 8, buffer.length == 34 It looks like this causes a fair amount of extra unused slack in the array, because only 2 * {{numInserted}} elements are used at a time. Is the slack required, or should {{realloc}} only double the size once instead of twice? Also, I wasn't sure why we needed 1 extra in the calculation of {{numElements}}. Is that a defensive thing so that the loop in {{putInternal}} always terminates? I would expect the check of capacity in {{put}} to be sufficient to prevent that without needing an extra sentinel element. On to the libhdfs changes... > revisit zero-copy API in FSDataInputStream to make it more intuitive > -------------------------------------------------------------------- > > Key: HDFS-5191 > URL: https://issues.apache.org/jira/browse/HDFS-5191 > Project: Hadoop HDFS > Issue Type: Sub-task > Components: hdfs-client, libhdfs > Affects Versions: HDFS-4949 > Reporter: Colin Patrick McCabe > Assignee: Colin Patrick McCabe > Attachments: HDFS-5191-caching.001.patch, > HDFS-5191-caching.003.patch, HDFS-5191-caching.006.patch, > HDFS-5191-caching.007.patch, HDFS-5191-caching.008.patch > > > As per the discussion on HDFS-4953, we should revisit the zero-copy API to > make it more intuitive for new users. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira