[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream

[email protected] (Commented) (JIRA) Tue, 06 Mar 2012 00:49:28 -0800

    [ 
https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223097#comment-13223097
 ]

[email protected] commented on HDFS-2834:
-----------------------------------------------------

bq.  On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq.  > first pass looks good. A few questions:
bq.  > - do you have before/after benchmarks for the old TestParallelRead with 
this patch? ie make sure this patch doesn't regress performance of the "normal" 
array-based read path?
bq.  > - have you run the various randomized tests overnight or so to get good 
coverage of all the cases?
bq.  > 
bq.  > I didn't look carefully at the libhdfs code, since you said offline 
there's another rev of that coming

Thanks for the review! I'll get a new patch on the ticket and here momentarily. 

bq.  On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 line 123
bq.  > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line123>
bq.  >
bq.  >     can you add javadoc explaining this var (even though it's not new)?

Done.

bq.  On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 line 341
bq.  > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line341>
bq.  >
bq.  >     I'd move this inside the if statement below. Then in the second 
"phase", re-declare it under that scope, since the "len" variable doesn't ever 
carry over between phases (and in fact becomes invalid)

Good catch, done. 

bq.  On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 lines 392-396
bq.  > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line392>
bq.  >
bq.  >     this code's very similar to line 358-362. Can we extract a utility 
function which we could optimize as I mentioned above to avoid object 
allocation?

Done.

bq.  On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 line 358
bq.  > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line358>
bq.  >
bq.  >     Seems like this area could be optimized a bit to avoid the object 
creation here...
bq.  >     
bq.  >     In one case, slowReadBuff.remaining() <= len, in which case you can 
directly do buf.put(slowReadBuff).
bq.  >     
bq.  >     In the other case, you could probably save off the limit of 
slowReadBuff, lower the limit to len, do the read, then set the limit again.

Done - I didn't bother making the special case, but have avoided the slice. I 
doubt the limit manipulation is costly.

bq.  On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java,
 line 496
bq.  > <https://reviews.apache.org/r/4184/diff/1/?file=88332#file88332line496>
bq.  >
bq.  >     remove commented out code

Done

bq.  On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 lines 412-413
bq.  > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line412>
bq.  >
bq.  >     I think it might simplify things if you used this function only for 
the case where verifyChecksum == true. Then above, the non-checksummed read 
case would just call fillBuffer directly. What do you think?
bq.  >     
bq.  >     My reasoning is that this function is called 
'readChunksIntoByteBuffer' implying that it always reads a multiple of checksum 
chunk size, which isn't really a requirement when not verifying checksums

I take your point. My only reservation is that there's some logic that we'd 
have to duplicate around the handling of offsetFromChunkBoundary and updating 
the buffer position on a successful read. Maybe renaming the method is the 
easiest way to avoid confusion? Maybe doByteBufferRead or something equally 
vague...

bq.  On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
 lines 539-540
bq.  > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line539>
bq.  >
bq.  >     maybe TRACE is more appropriate here. Also LOG.trace() inside

Done on both counts.

bq.  On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq.  > hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c, line 826
bq.  > <https://reviews.apache.org/r/4184/diff/1/?file=88336#file88336line826>
bq.  >
bq.  >     goofy empty clause. Other goofy indentation here.

Fixed. Whitespace is all over the shop in hdfs.c. I've made this method a 
uniform 2-space indent.

bq.  On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq.  > 
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java,
 line 591
bq.  > <https://reviews.apache.org/r/4184/diff/1/?file=88332#file88332line591>
bq.  >
bq.  >     I think it's worth making these actual static inner classes with 
names instead of anonymous, for prettier stack traces.
bq.  >     
bq.  >     I remember we discussed a few weeks back whether the creation of 
these lambdas would harm performance for short reads. Do you have any results 
for TestParallelRead performance for the lambda-based approach here vs the 
uglier approach

I've made them static. I'll measure the performance with and without, and 
report on the jira. 

- Henry

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4184/#review5622
-----------------------------------------------------------

On 2012-03-05 21:00:55, Todd Lipcon wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/4184/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2012-03-05 21:00:55)
bq.  
bq.  
bq.  Review request for hadoop-hdfs and Todd Lipcon.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Posting patch to RB on behalf of Henry.
bq.  
bq.  
bq.  This addresses bug HDFS-2834.
bq.      http://issues.apache.org/jira/browse/HDFS-2834
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
 dfab730 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
 cc61697 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
 2b817ff 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
 b7da8d4 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
 ea24777 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 
bq.    hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
 9d4f4a2 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java
 PRE-CREATION 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
 bbd0012 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java
 PRE-CREATION 
bq.    
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
 eb2a1d8 
bq.  
bq.  Diff: https://reviews.apache.org/r/4184/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Todd
bq.  
bq.

> ByteBuffer-based read API for DFSInputStream
> --------------------------------------------
>
>                 Key: HDFS-2834
>                 URL: https://issues.apache.org/jira/browse/HDFS-2834
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Henry Robinson
>            Assignee: Henry Robinson
>         Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, 
> HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.patch, 
> HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png
>
>
> The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated 
> {{byte[]}}. Although for many clients this is desired behaviour, in certain 
> situations, such as native-reads through libhdfs, this imposes an extra copy 
> penalty since the {{byte[]}} needs to be copied out again into a natively 
> readable memory area. 
> For these cases, it would be preferable to allow the client to supply its own 
> buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HDFS-2834) ByteBuffer-based read API for DFSInputStream

Reply via email to