[ https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223097#comment-13223097 ]
jirapos...@reviews.apache.org commented on HDFS-2834: ----------------------------------------------------- bq. On 2012-03-05 21:57:31, Todd Lipcon wrote: bq. > first pass looks good. A few questions: bq. > - do you have before/after benchmarks for the old TestParallelRead with this patch? ie make sure this patch doesn't regress performance of the "normal" array-based read path? bq. > - have you run the various randomized tests overnight or so to get good coverage of all the cases? bq. > bq. > I didn't look carefully at the libhdfs code, since you said offline there's another rev of that coming Thanks for the review! I'll get a new patch on the ticket and here momentarily. bq. On 2012-03-05 21:57:31, Todd Lipcon wrote: bq. > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 123 bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line123> bq. > bq. > can you add javadoc explaining this var (even though it's not new)? Done. bq. On 2012-03-05 21:57:31, Todd Lipcon wrote: bq. > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 341 bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line341> bq. > bq. > I'd move this inside the if statement below. Then in the second "phase", re-declare it under that scope, since the "len" variable doesn't ever carry over between phases (and in fact becomes invalid) Good catch, done. bq. On 2012-03-05 21:57:31, Todd Lipcon wrote: bq. > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 392-396 bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line392> bq. > bq. > this code's very similar to line 358-362. Can we extract a utility function which we could optimize as I mentioned above to avoid object allocation? Done. bq. On 2012-03-05 21:57:31, Todd Lipcon wrote: bq. > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, line 358 bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line358> bq. > bq. > Seems like this area could be optimized a bit to avoid the object creation here... bq. > bq. > In one case, slowReadBuff.remaining() <= len, in which case you can directly do buf.put(slowReadBuff). bq. > bq. > In the other case, you could probably save off the limit of slowReadBuff, lower the limit to len, do the read, then set the limit again. Done - I didn't bother making the special case, but have avoided the slice. I doubt the limit manipulation is costly. bq. On 2012-03-05 21:57:31, Todd Lipcon wrote: bq. > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java, line 496 bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88332#file88332line496> bq. > bq. > remove commented out code Done bq. On 2012-03-05 21:57:31, Todd Lipcon wrote: bq. > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 412-413 bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line412> bq. > bq. > I think it might simplify things if you used this function only for the case where verifyChecksum == true. Then above, the non-checksummed read case would just call fillBuffer directly. What do you think? bq. > bq. > My reasoning is that this function is called 'readChunksIntoByteBuffer' implying that it always reads a multiple of checksum chunk size, which isn't really a requirement when not verifying checksums I take your point. My only reservation is that there's some logic that we'd have to duplicate around the handling of offsetFromChunkBoundary and updating the buffer position on a successful read. Maybe renaming the method is the easiest way to avoid confusion? Maybe doByteBufferRead or something equally vague... bq. On 2012-03-05 21:57:31, Todd Lipcon wrote: bq. > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java, lines 539-540 bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line539> bq. > bq. > maybe TRACE is more appropriate here. Also LOG.trace() inside Done on both counts. bq. On 2012-03-05 21:57:31, Todd Lipcon wrote: bq. > hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c, line 826 bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88336#file88336line826> bq. > bq. > goofy empty clause. Other goofy indentation here. Fixed. Whitespace is all over the shop in hdfs.c. I've made this method a uniform 2-space indent. bq. On 2012-03-05 21:57:31, Todd Lipcon wrote: bq. > hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java, line 591 bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88332#file88332line591> bq. > bq. > I think it's worth making these actual static inner classes with names instead of anonymous, for prettier stack traces. bq. > bq. > I remember we discussed a few weeks back whether the creation of these lambdas would harm performance for short reads. Do you have any results for TestParallelRead performance for the lambda-based approach here vs the uglier approach I've made them static. I'll measure the performance with and without, and report on the jira. - Henry ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4184/#review5622 ----------------------------------------------------------- On 2012-03-05 21:00:55, Todd Lipcon wrote: bq. bq. ----------------------------------------------------------- bq. This is an automatically generated e-mail. To reply, visit: bq. https://reviews.apache.org/r/4184/ bq. ----------------------------------------------------------- bq. bq. (Updated 2012-03-05 21:00:55) bq. bq. bq. Review request for hadoop-hdfs and Todd Lipcon. bq. bq. bq. Summary bq. ------- bq. bq. Posting patch to RB on behalf of Henry. bq. bq. bq. This addresses bug HDFS-2834. bq. http://issues.apache.org/jira/browse/HDFS-2834 bq. bq. bq. Diffs bq. ----- bq. bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java dfab730 bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java cc61697 bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java 2b817ff bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java b7da8d4 bq. hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java ea24777 bq. hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5 bq. hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java 9d4f4a2 bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java PRE-CREATION bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java bbd0012 bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java PRE-CREATION bq. hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java eb2a1d8 bq. bq. Diff: https://reviews.apache.org/r/4184/diff bq. bq. bq. Testing bq. ------- bq. bq. bq. Thanks, bq. bq. Todd bq. bq. > ByteBuffer-based read API for DFSInputStream > -------------------------------------------- > > Key: HDFS-2834 > URL: https://issues.apache.org/jira/browse/HDFS-2834 > Project: Hadoop HDFS > Issue Type: Improvement > Reporter: Henry Robinson > Assignee: Henry Robinson > Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch, > HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.patch, > HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png > > > The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated > {{byte[]}}. Although for many clients this is desired behaviour, in certain > situations, such as native-reads through libhdfs, this imposes an extra copy > penalty since the {{byte[]}} needs to be copied out again into a natively > readable memory area. > For these cases, it would be preferable to allow the client to supply its own > buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira