[ 
https://issues.apache.org/jira/browse/HDFS-755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12793113#action_12793113
 ] 

Raghu Angadi commented on HDFS-755:
-----------------------------------

User code should use buffering for application-specific reasons. Maybe the 
'bufferSize' argument for FSInputStream is flawed to start with.

My impression is that the main purpose of this patch is to avoid a copy. 
Keeping the large buffer prevents that.

Even when a SequenceFile has very small records (avg < 1k?), I don't think this 
would have a net negative effect on fairly small reads, since system calls are 
fairly cheap.
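
As a rough illustration of the application-level buffering mentioned above, 
here is a minimal sketch using only java.io. It is not HDFS code: 
CountingInputStream and underlyingReads are hypothetical names, and a 
ByteArrayInputStream stands in for the DFS stream. It only counts how many 
read calls reach the wrapped stream when small records are read with and 
without a BufferedInputStream chosen by the application; it does not measure 
system call cost.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class BufferedSmallReads {

    /** Hypothetical stand-in for a DFS stream: counts read calls that reach it. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /**
     * Reads totalBytes in pieces of recordSize and returns how many read calls
     * reached the underlying stream. bufferSize <= 0 means the application
     * adds no buffer of its own.
     */
    static int underlyingReads(int totalBytes, int recordSize, int bufferSize) {
        CountingInputStream source =
            new CountingInputStream(new ByteArrayInputStream(new byte[totalBytes]));
        InputStream in =
            bufferSize > 0 ? new BufferedInputStream(source, bufferSize) : source;
        byte[] record = new byte[recordSize];
        try {
            // Read record by record until end of stream.
            while (in.read(record, 0, recordSize) >= 0) {
                // A real application would parse the record here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return source.calls;
    }

    public static void main(String[] args) {
        int total = 1 << 20;  // 1 MB of data
        int record = 512;     // small records, well under 1k
        System.out.println("unbuffered reads: " + underlyingReads(total, record, 0));
        System.out.println("buffered reads:   " + underlyingReads(total, record, 64 * 1024));
    }
}
```

With the application-side buffer, the underlying stream sees roughly one read 
per buffer refill instead of one per record. Whether that difference matters 
in practice depends on how cheap the underlying reads are, which is the open 
question here.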

Do you see FSInputChecker or DFSClient evolving to decide dynamically whether a 
buffer should be used in the near future?

+1 for the patch itself.

BTW, I ran 'time bin/hadoop fs -cat 1gbfile > /dev/null' with the NN, DN, and 
client on the same machine, but have not been able to see an improvement. I 
will verify whether I am really running the patch.


> Read multiple checksum chunks at once in DFSInputStream
> -------------------------------------------------------
>
>                 Key: HDFS-755
>                 URL: https://issues.apache.org/jira/browse/HDFS-755
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: alldata-hdfs.tsv, benchmark-8-256.png, benchmark.png, 
> hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt, hdfs-755.txt
>
>
> HADOOP-3205 adds the ability for FSInputChecker subclasses to read multiple 
> checksum chunks in a single call to readChunk. This is the HDFS-side use of 
> that new feature.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
