[ https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787862#action_12787862 ]
Todd Lipcon commented on HDFS-755:
----------------------------------

bq. What does this translate to for user cpu improvement (say with 32 byte buffer in DFSClient)?

Here's a table of median times for 1G cat. The "before" row is CHUNKS_PER_READ=1 (the pre-HADOOP-3205 behavior) with an internal buffer size of 65536 (a typical value). The "after" row is CHUNKS_PER_READ=32 with an internal buffer of 64 bytes.

|| ||User||Sys||Wall||
||Before|5.310|1.260|6.010|
||After|4.935|1.235|5.350|
||Improvement|7.06%|1.98%|10.98%|

The sys difference isn't significant according to a t-test. The user and wall differences are definitely significant (p < 2.2e-16). Varying the internal buffer between 32, 50, and 64 bytes didn't make a significant difference to any of the measurements.

bq. The bufferSize passed to FSInputChecker is essentially a hint

My question is whether people are actually treating it like that in practice. For example, SequenceFile.Reader doesn't create its own BufferedInputStream to wrap fs.open. It just passes the user-specified buffer size through. If our own code isn't wrapping these things with a buffer, should we expect that user code is?

> Read multiple checksum chunks at once in DFSInputStream
> -------------------------------------------------------
>
>                 Key: HDFS-755
>                 URL: https://issues.apache.org/jira/browse/HDFS-755
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: benchmark-8-256.png, benchmark.png, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple
> checksum chunks in a single call to readChunk. This is the HDFS-side use of
> that new feature.
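
To make the "bufferSize is a hint" question in the comment above concrete, here is a rough sketch of what a caller would have to do if the stream returned by fs.open did not buffer for it. It uses only java.io, not Hadoop classes: a ByteArrayInputStream stands in for the opened file, and the names BufferHintSketch, CountingInputStream and underlyingCalls are hypothetical, chosen for illustration. It only counts how many read calls reach the underlying stream when the caller reads in small pieces, with and without a BufferedInputStream wrapper. It says nothing about the timings in the table.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class BufferHintSketch {

    /** Counts the read calls that actually reach the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /**
     * Reads totalBytes in pieces of callerReadSize and returns how many read
     * calls hit the underlying stream. A wrapperBufferSize of 0 means the
     * caller reads the stream directly, without a BufferedInputStream.
     */
    static int underlyingCalls(int totalBytes, int callerReadSize, int wrapperBufferSize)
            throws IOException {
        // Stand-in for the stream a real client would get from fs.open (assumption).
        CountingInputStream raw =
                new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        InputStream in = wrapperBufferSize > 0
                ? new BufferedInputStream(raw, wrapperBufferSize)
                : raw;

        byte[] piece = new byte[callerReadSize];
        while (in.read(piece, 0, piece.length) != -1) {
            // Discard the data; only the call count matters here.
        }
        return raw.calls;
    }

    public static void main(String[] args) throws IOException {
        int total = 1 << 20; // 1 MiB of dummy data
        System.out.println("32-byte reads, no wrapper:   " + underlyingCalls(total, 32, 0));
        System.out.println("32-byte reads, 64 KB wrapper: " + underlyingCalls(total, 32, 65536));
    }
}
```

Code like SequenceFile.Reader, which passes the buffer size straight through, is effectively on the unwrapped path here unless the stream buffers internally.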