[
https://issues.apache.org/jira/browse/HDFS-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13223097#comment-13223097
]
[email protected] commented on HDFS-2834:
-----------------------------------------------------
bq. On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq. > first pass looks good. A few questions:
bq. > - do you have before/after benchmarks for the old TestParallelRead with
this patch? ie make sure this patch doesn't regress performance of the "normal"
array-based read path?
bq. > - have you run the various randomized tests overnight or so to get good
coverage of all the cases?
bq. >
bq. > I didn't look carefully at the libhdfs code, since you said offline
there's another rev of that coming
Thanks for the review! I'll get a new patch on the ticket and here momentarily.
bq. On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq. >
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
line 123
bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line123>
bq. >
bq. > can you add javadoc explaining this var (even though it's not new)?
Done.
bq. On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq. >
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
line 341
bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line341>
bq. >
bq. > I'd move this inside the if statement below. Then in the second
"phase", re-declare it under that scope, since the "len" variable doesn't ever
carry over between phases (and in fact becomes invalid)
Good catch, done.
bq. On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq. >
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
lines 392-396
bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line392>
bq. >
bq. > this code's very similar to line 358-362. Can we extract a utility
function which we could optimize as I mentioned above to avoid object
allocation?
Done.
bq. On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq. >
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
line 358
bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line358>
bq. >
bq. > Seems like this area could be optimized a bit to avoid the object
creation here...
bq. >
bq. > In one case, slowReadBuff.remaining() <= len, in which case you can
directly do buf.put(slowReadBuff).
bq. >
bq. > In the other case, you could probably save off the limit of
slowReadBuff, lower the limit to len, do the read, then set the limit again.
Done - I didn't bother making the special case, but have avoided the slice. I
doubt the limit manipulation is costly.
bq. On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq. >
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java,
line 496
bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88332#file88332line496>
bq. >
bq. > remove commented out code
Done
bq. On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq. >
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
lines 412-413
bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line412>
bq. >
bq. > I think it might simplify things if you used this function only for
the case where verifyChecksum == true. Then above, the non-checksummed read
case would just call fillBuffer directly. What do you think?
bq. >
bq. > My reasoning is that this function is called
'readChunksIntoByteBuffer' implying that it always reads a multiple of checksum
chunk size, which isn't really a requirement when not verifying checksums
I take your point. My only reservation is that there's some logic that we'd
have to duplicate around the handling of offsetFromChunkBoundary and updating
the buffer position on a successful read. Maybe renaming the method is the
easiest way to avoid confusion? Maybe doByteBufferRead or something equally
vague...
bq. On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq. >
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java,
lines 539-540
bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88331#file88331line539>
bq. >
bq. > maybe TRACE is more appropriate here. Also LOG.trace() inside
Done on both counts.
bq. On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq. > hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c, line 826
bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88336#file88336line826>
bq. >
bq. > goofy empty clause. Other goofy indentation here.
Fixed. Whitespace is all over the shop in hdfs.c. I've made this method a
uniform 2-space indent.
bq. On 2012-03-05 21:57:31, Todd Lipcon wrote:
bq. >
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java,
line 591
bq. > <https://reviews.apache.org/r/4184/diff/1/?file=88332#file88332line591>
bq. >
bq. > I think it's worth making these actual static inner classes with
names instead of anonymous, for prettier stack traces.
bq. >
bq. > I remember we discussed a few weeks back whether the creation of
these lambdas would harm performance for short reads. Do you have any results
for TestParallelRead performance for the lambda-based approach here vs the
uglier approach
I've made them static. I'll measure the performance with and without, and
report on the jira.
- Henry
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4184/#review5622
-----------------------------------------------------------
On 2012-03-05 21:00:55, Todd Lipcon wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/4184/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-03-05 21:00:55)
bq.
bq.
bq. Review request for hadoop-hdfs and Todd Lipcon.
bq.
bq.
bq. Summary
bq. -------
bq.
bq. Posting patch to RB on behalf of Henry.
bq.
bq.
bq. This addresses bug HDFS-2834.
bq. http://issues.apache.org/jira/browse/HDFS-2834
bq.
bq.
bq. Diffs
bq. -----
bq.
bq.
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReader.java
dfab730
bq.
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/BlockReaderLocal.java
cc61697
bq.
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java
2b817ff
bq.
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader.java
b7da8d4
bq.
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/RemoteBlockReader2.java
ea24777
bq. hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.h 0ee29d5
bq. hadoop-hdfs-project/hadoop-hdfs/src/main/native/hdfs.c 7371caf
bq.
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/BlockReaderTestUtil.java
9d4f4a2
bq.
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelLocalRead.java
PRE-CREATION
bq.
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelRead.java
bbd0012
bq.
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestParallelReadUtil.java
PRE-CREATION
bq.
hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestShortCircuitLocalRead.java
eb2a1d8
bq.
bq. Diff: https://reviews.apache.org/r/4184/diff
bq.
bq.
bq. Testing
bq. -------
bq.
bq.
bq. Thanks,
bq.
bq. Todd
bq.
bq.
> ByteBuffer-based read API for DFSInputStream
> --------------------------------------------
>
> Key: HDFS-2834
> URL: https://issues.apache.org/jira/browse/HDFS-2834
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Henry Robinson
> Assignee: Henry Robinson
> Attachments: HDFS-2834-no-common.patch, HDFS-2834.3.patch,
> HDFS-2834.4.patch, HDFS-2834.5.patch, HDFS-2834.6.patch, HDFS-2834.patch,
> HDFS-2834.patch, hdfs-2834-libhdfs-benchmark.png
>
>
> The {{DFSInputStream}} read-path always copies bytes into a JVM-allocated
> {{byte[]}}. Although for many clients this is desired behaviour, in certain
> situations, such as native-reads through libhdfs, this imposes an extra copy
> penalty since the {{byte[]}} needs to be copied out again into a natively
> readable memory area.
> For these cases, it would be preferable to allow the client to supply its own
> buffer, wrapped in a {{ByteBuffer}}, to avoid that final copy overhead.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira