[ 
https://issues.apache.org/jira/browse/HDFS-5191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775611#comment-13775611
 ] 

Chris Nauroth commented on HDFS-5191:
-------------------------------------

It looks like there are still some import-only changes in {{DFSUtil}}.

{code}
    if (buffer == null) {
      throw new UnsupportedOperationException("zero-copy reads " +
          "were not available, and the ByteBufferPool did not provide " +
          "us with a " + (useDirect ? " direct" : "indirect") +
          "buffer.");
    }
{code}

Minor nit: this exception message would have an extraneous space in the direct 
case (i.e. "did not provide us with a  direct"), and no space between the last 
two words (i.e. "did not provide us with a indirectbuffer").

{code}
      while (true) {
        countingVisitor.reset();
        mmapManager.visitEvictable(countingVisitor);
        if (0 == countingVisitor.count) {
          break;
        }
      }
{code}

This test would cause an infinite loop if a bug was introduced that left mmaps 
lingering in evictable state, which could be hard to diagnose.  Should we use 
{{GenericTestUtils#waitFor}}, so that there is a timeout?

Lastly, can you help clarify something about the data structure behind 
{{IdentityHashStore}} for me?  In particular, I'm wondering about the math in 
{{realloc}}:

{code}
  private void realloc(int newCapacity) {
    Preconditions.checkArgument(newCapacity > 0);
    Object prevBuffer[] = buffer;
    this.capacity = newCapacity;
    int numElements = 1 + (2 * newCapacity);
    this.buffer = new Object[2 * numElements];
    this.numInserted = 0;
    if (prevBuffer != null) {
      for (int i = 0; i < prevBuffer.length; i += 2) {
        if (prevBuffer[i] != null) {
          putInternal(prevBuffer[i], prevBuffer[i + 1]);
        }
      }
    }
  }
{code}

{{put}} will call {{realloc}} to double capacity when needed.  {{numElements}} 
is double of the new capacity plus one extra.  Then, the size is doubled again 
for allocating the new array.  Therefore the growth pattern looks like:

capacity == 2, buffer.length == 10
capacity == 4, buffer.length == 18
capacity == 8, buffer.length == 34

It looks like this causes a fair amount of extra unused slack in the array, 
because only 2 * {{numInserted}} elements are used at a time.  Is the slack 
required, or should {{realloc}} only double the size once instead of twice?  
Also, I wasn't sure why we needed 1 extra in the calculation of 
{{numElements}}.  Is that a defensive thing so that the loop in {{putInternal}} 
always terminates?  I would expect the check of capacity in {{put}} to be 
sufficient to prevent that without needing an extra sentinel element.

On to the libhdfs changes...

                
> revisit zero-copy API in FSDataInputStream to make it more intuitive
> --------------------------------------------------------------------
>
>                 Key: HDFS-5191
>                 URL: https://issues.apache.org/jira/browse/HDFS-5191
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client, libhdfs
>    Affects Versions: HDFS-4949
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-5191-caching.001.patch, 
> HDFS-5191-caching.003.patch, HDFS-5191-caching.006.patch, 
> HDFS-5191-caching.007.patch, HDFS-5191-caching.008.patch
>
>
> As per the discussion on HDFS-4953, we should revisit the zero-copy API to 
> make it more intuitive for new users.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to