[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794943#action_12794943
 ] 

Raghu Angadi commented on HDFS-755:
-----------------------------------

There is always a buffer of 512 bytes (checksum chunk size). So the worst case 
is 512 byte reads. If 512 is not large enough, we can decide on some size like 
4k. This way large readers benefit from reduced copy and small readers pay a 
small penalty (1 syscall per 4k).
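
As a rough illustration of the trade-off above (a sketch only, not the DFSInputStream code): the class below bypasses its internal buffer when the caller asks for at least a buffer's worth of data, and otherwise refills the buffer once per bufSize bytes. ThresholdReader, countReads, and underlyingReads are hypothetical names, and each call to the wrapped stream stands in for one syscall.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

/** Hypothetical sketch; names and structure are assumptions, not HDFS code. */
class ThresholdReader {
    private final InputStream in;   // wrapped stream; one in.read() stands in for one syscall
    private final byte[] buf;       // internal buffer used only for small requests
    private int pos = 0;            // next unread byte in buf
    private int limit = 0;          // end of valid data in buf
    int underlyingReads = 0;        // number of calls made to in.read()

    ThresholdReader(InputStream in, int bufSize) {
        this.in = in;
        this.buf = new byte[bufSize];
    }

    int read(byte[] dst, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (pos == limit) {
            if (len >= buf.length) {
                // Large request and nothing buffered: read straight into
                // the caller's array, avoiding the extra copy.
                underlyingReads++;
                return in.read(dst, off, len);
            }
            // Small request: refill the internal buffer with one read.
            underlyingReads++;
            int n = in.read(buf, 0, buf.length);
            if (n <= 0) {
                return -1;
            }
            pos = 0;
            limit = n;
        }
        // Serve the request (possibly partially) from the internal buffer.
        int n = Math.min(len, limit - pos);
        System.arraycopy(buf, pos, dst, off, n);
        pos += n;
        return n;
    }

    /** Reads totalBytes of zeros in requestSize pieces; returns the underlying read count. */
    static int countReads(int totalBytes, int requestSize, int bufSize) {
        try {
            ThresholdReader r = new ThresholdReader(
                new ByteArrayInputStream(new byte[totalBytes]), bufSize);
            byte[] dst = new byte[requestSize];
            int total = 0;
            while (total < totalBytes) {
                int n = r.read(dst, 0, Math.min(requestSize, totalBytes - total));
                if (n < 0) {
                    break;
                }
                total += n;
            }
            return r.underlyingReads;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With 64 KB of input, countReads(65536, 100, 4096) performs 16 underlying reads and countReads(65536, 100, 512) performs 128, while a single 64 KB request goes directly to the wrapped stream in one read.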

The misalignment can occur even after the first packet. Another option is to 
have two buffers that are read alternately for crc and data (each time 
checking whether the other buffer has available data).

>  So, I don't think we should do optimizations that would destroy performance 
> of this scenario.

True. At the same time, this is an optimization jira.

I didn't get around to reproducing the CPU improvement. I ran the commands you 
gave (in email). Will try again today.

I have already given a +1 for the patch. We should just note that it needs more 
work to actually make use of HADOOP-3205.

> Read multiple checksum chunks at once in DFSInputStream
> -------------------------------------------------------
>
>                 Key: HDFS-755
>                 URL: https://issues.apache.org/jira/browse/HDFS-755
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
> hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
> checksum chunks in a single call to readChunk. This is the HDFS-side use of 
> that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
