[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787862#action_12787862
 ] 

Todd Lipcon commented on HDFS-755:
----------------------------------

bq. What does this translate to for user cpu improvement (say with 32 byte 
buffer in DFSClient)? 

Here's a table of median times for a 1G cat. The "before" row is 
CHUNKS_PER_READ=1 (the pre-HADOOP-3205 behavior) with an internal buffer size 
of 65536 bytes (a typical value). The "after" row is CHUNKS_PER_READ=32 with an 
internal buffer of 64 bytes.

|| ||User||Sys||Wall||
||Before|5.310|1.260|6.010|
||After|4.935|1.235|5.350|
||Improvement|7.06%|1.98%|10.98%|

The sys difference isn't significant according to a t-test. The user and wall 
differences are definitely significant (p < 2.2e-16). Varying the internal 
buffer among 32, 50, and 64 bytes made no significant difference to any of the 
measurements.
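
For reference, the Improvement row above is the relative reduction from the "before" median to the "after" median, i.e. (before - after) / before. A minimal Java sketch of that arithmetic follows; the class and method names are made up for illustration, and the inputs are just the medians copied from the table.

```java
import java.util.Locale;

public class Improvement {
    /** Relative reduction from before to after, expressed as a percentage. */
    static double percentImprovement(double before, double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(String[] args) {
        // Median times copied from the Before and After rows of the table.
        String[] labels = {"User", "Sys", "Wall"};
        double[] before = {5.310, 1.260, 6.010};
        double[] after = {4.935, 1.235, 5.350};
        for (int i = 0; i < labels.length; i++) {
            double pct = percentImprovement(before[i], after[i]);
            System.out.println(String.format(Locale.ROOT, "%s: %.2f%%", labels[i], pct));
        }
    }
}
```

This only reproduces the percentages; the significance test needs the individual run times, which are not listed in this comment.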

bq. The bufferSize passed to FSInputChecker is essentially a hint

My question is whether people are actually treating it like that in practice. 
For example, SequenceFile.Reader doesn't create its own BufferedInputStream to 
wrap fs.open. It just passes the user-specified buffer size through. If our own 
code isn't wrapping these things with a buffer, should we expect that user code 
is?
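
To make "wrapping with a buffer" concrete, here is a rough sketch in plain java.io of what a caller would have to do if bufferSize really were only a hint. CountingStream is a hypothetical stand-in for an unbuffered stream like the one fs.open would hand back in that case; it is not a Hadoop class, and the numbers it produces say nothing about real HDFS performance.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BufferedWrapDemo {
    /** Hypothetical unbuffered stream: counts every read call that reaches it. */
    static class CountingStream extends FilterInputStream {
        int calls = 0;
        CountingStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            calls++;
            return super.read();
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads size bytes one at a time and returns how many calls hit the raw stream. */
    static int underlyingCalls(boolean wrap, int size, int bufferSize) {
        CountingStream raw = new CountingStream(new ByteArrayInputStream(new byte[size]));
        // The wrapping step that the caller is responsible for if the stream is unbuffered.
        InputStream in = wrap ? new BufferedInputStream(raw, bufferSize) : raw;
        try {
            while (in.read() != -1) {
                // Drain the stream with small reads, as a record parser might.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return raw.calls;
    }

    public static void main(String[] args) {
        System.out.println("unwrapped: " + underlyingCalls(false, 4096, 1024));
        System.out.println("wrapped:   " + underlyingCalls(true, 4096, 1024));
    }
}
```

Without the wrapper every single-byte read goes straight to the underlying stream; with it, the underlying stream only sees a handful of bulk reads. Code that passes a buffer size to fs.open and skips this step is relying on the stream to do the buffering itself.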

> Read multiple checksum chunks at once in DFSInputStream
> -------------------------------------------------------
>
>                 Key: HDFS-755
>                 URL: https://issues.apache.org/jira/browse/HDFS-755
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: benchmark-8-256.png, benchmark.png, hdfs-755.txt, 
> hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
> checksum chunks in a single call to readChunk. This is the HDFS-side use of 
> that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
