[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream

[email protected] (Commented) (JIRA) Mon, 05 Mar 2012 13:58:47 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222641#comment-13222641
 ]


[email protected] commented on HDFS-2834:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4184/#review5622
-----------------------------------------------------------

Ship it!


first pass looks good. A few questions:
- do you have before/after benchmarks for the old TestParallelRead with this 
patch? ie make sure this patch doesn't regress performance of the "normal" 
array-based read path?
- have you run the various randomized tests overnight or so to get good 
coverage of all the cases?

I didn't look carefully at the libhdfs code, since you said offline there's 
another rev of that coming


hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12234>

    can you add javadoc explaining this var (even though it's not new)?



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12233>

    I'd move this inside the if statement below. Then in the second "phase", 
re-declare it under that scope, since the "len" variable doesn't ever carry 
over between phases (and in fact becomes invalid)



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12232>

    Seems like this area could be optimized a bit to avoid the object creation 
here...
    
    In one case, slowReadBuff.remaining() <= len, in which case you can 
directly do buf.put(slowReadBuff).
    
    In the other case, you could probably save off the limit of slowReadBuff, 
lower the limit to len, do the read, then set the limit again.



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12235>

    this code's very similar to line 358-362. Can we extract a utility function 
which we could optimize as I mentioned above to avoid object allocation?



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12236>

    I think it might simplify things if you used this function only for the 
case where verifyChecksum == true. Then above, the non-checksummed read case 
would just call fillBuffer directly. What do you think?
    
    My reasoning is that this function is called 'readChunksIntoByteBuffer' 
implying that it always reads a multiple of checksum chunk size, which isn't 
really a requirement when not verifying checksums



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
<https://reviews.apache.org/r/4184/#comment12237>

    maybe TRACE is more appropriate here. Also LOG.trace() inside



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
<https://reviews.apache.org/r/4184/#comment12238>

    remove commented out code



hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
<https://reviews.apache.org/r/4184/#comment12239>

    I think it's worth making these actual static inner classes with names 
instead of anonymous, for prettier stack traces.
    
    I remember we discussed a few weeks back whether the creation of these 
lambdas would harm performance for short reads. Do you have any results for 
TestParallelRead performance for the lambda-based approach here vs the uglier 
approach



hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c
<https://reviews.apache.org/r/4184/#comment12240>

    goofy empty clause. Other goofy indentation here.


- Todd


On 2012-03-05 21:00:55, Todd Lipcon wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4184/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-05 21:00:55)
bq.  
bq.  
bq.  Review request for hadoop-hdfs and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Posting patch to RB on behalf of Henry.
bq.  
bq.  
bq.  This addresses bug HDFS-2834.
bq.      http://issues.apache.org/jira/browse/HDFS-2834
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
 dfab730 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
 cc61697 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
 2b817ff 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
 b7da8d4 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
 ea24777 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
 9d4f4a2 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java
 PRE-CREATION 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
 bbd0012 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java
 PRE-CREATION 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
 eb2a1d8 
bq.  
bq.  Diff: https://reviews.apache.org/r/4184/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Todd
bq.  
bq.


                
> ByteBuffer-based read API for DFSInputStream
> --------------------------------------------
>
>                 Key: HDFS-2834
>                 URL: https://issues.apache.org/jira/browse/HDFS-2834
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>         Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, 
> HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.patch, 
> HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png
>
>
> The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated 
> {{byte[]}}. Although for many clients this is desired behaviour, in certain 
> situations, such as native-reads through libhdfs, this imposes an extra copy 
> penalty since the {{byte[]}} needs to be copied out again into a natively 
> readable memory area. 
> For these cases, it would be preferable to allow the client to supply its own 
> buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream

Reply via email to