I think the link below can give you more info about it; Owen gives a nice explanation there.
http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
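To make the buffering behavior concrete, here is a minimal sketch (not Hadoop's actual API; the class and constant names are illustrative) of the idea discussed below: the reader fills a fixed-size byte buffer of io.file.buffer.size bytes (4096 by default) per read call and then carves many lines out of that one buffer, rather than issuing one read per line.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical, simplified sketch of buffered line reading in the style of
// Hadoop's LineReader: one read() pulls up to BUFFER_SIZE bytes, and many
// lines are typically consumed from a single fill of the buffer.
public class BufferedLineSketch {
    // Stands in for io.file.buffer.size (default 4096 bytes in Hadoop).
    private static final int BUFFER_SIZE = 4096;

    public static int countLines(InputStream in) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        int lines = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            // Scan the chunk we just buffered for newline delimiters.
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') {
                    lines++;
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            sb.append("record-").append(i).append('\n');
        }
        InputStream in =
            new ByteArrayInputStream(sb.toString().getBytes("UTF-8"));
        System.out.println("lines=" + countLines(in));
    }
}
```

The point of the sketch is that the number of lines obtained per I/O call depends on io.file.buffer.size and the line length, not on dfs.block.size.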

Regards,
Uma

----- Original Message -----
From: Yang Xiaoliang <yangxiaoliang2...@gmail.com>
Date: Wednesday, October 5, 2011 4:27 pm
Subject: Re: hadoop input buffer size
To: common-user@hadoop.apache.org

> Hi,
> 
> Hadoop neither reads one line at a time, nor fetches 
> dfs.block.size worth of lines
> into a buffer.
> Actually, for TextInputFormat, it reads io.file.buffer.size 
> bytes of text
> into a buffer on each read;
> this can be seen in the Hadoop source file LineReader.java
> 
> 
> 
> 2011/10/5 Mark question <markq2...@gmail.com>
> 
> > Hello,
> >
> >  Correct me if I'm wrong, but when a program opens n files at 
> the same time
> > to read from, and starts reading from each file one line 
> at a time,
> > isn't Hadoop actually fetching dfs.block.size worth of lines into a 
> buffer, and
> > not actually one line?
> >
> >  If this is correct: I set my dfs.block.size = 3MB and each 
> line takes
> > only about 650 bytes, so I would expect the performance for 
> reading> 1-4000
> > lines to be the same, but it isn't!  Do you know a way to 
> find the number n of
> > lines read at once?
> >
> > Thank you,
> > Mark
> >
> 
