I think the link below can give you more info about it — a nice explanation by Owen:
http://developer.yahoo.com/blogs/hadoop/posts/2009/08/the_anatomy_of_hadoop_io_pipel/
Regards,
Uma

----- Original Message -----
From: Yang Xiaoliang <yangxiaoliang2...@gmail.com>
Date: Wednesday, October 5, 2011 4:27 pm
Subject: Re: hadoop input buffer size
To: common-user@hadoop.apache.org

> Hi,
>
> Hadoop neither reads one line at a time, nor fetches dfs.block.size
> worth of lines into a buffer.
> Actually, for the TextInputFormat, it reads io.file.buffer.size bytes
> of text into a buffer each time; this can be seen in the Hadoop source
> file LineReader.java.
>
>
> 2011/10/5 Mark question <markq2...@gmail.com>
>
> > Hello,
> >
> > Correct me if I'm wrong, but when a program opens n files at the
> > same time to read from, and starts reading from each file one line
> > at a time — isn't Hadoop actually fetching dfs.block.size worth of
> > lines into a buffer, and not just one line?
> >
> > If this is correct: I set dfs.block.size = 3MB, and each line takes
> > only about 650 bytes, so I would assume the performance for reading
> > 1-4000 lines would be the same — but it isn't! Do you know a way to
> > find the number of lines that are read at once?
> >
> > Thank you,
> > Mark
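To make Yang's point concrete, here is a minimal sketch (in Java, Hadoop's language) of the buffering idea behind LineReader: the caller asks for one line at a time, but the underlying stream is only ever read in fixed-size chunks, analogous to io.file.buffer.size. This is NOT Hadoop's actual LineReader — the class name, method, and the tiny 8-byte buffer are illustrative, and the real LineReader also handles \r\n and tracks bytes consumed. The sketch assumes ASCII text for the byte-to-char cast.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of LineReader-style buffering: lines come out one at a time,
// but the stream is only read in bufferSize-byte chunks (the analogue
// of io.file.buffer.size), regardless of how long the lines are.
public class BufferedLineSketch {
    public static List<String> readLines(InputStream in, int bufferSize)
            throws IOException {
        List<String> lines = new ArrayList<>();
        byte[] buffer = new byte[bufferSize];
        StringBuilder current = new StringBuilder();
        int n;
        // Each read() pulls up to bufferSize bytes -- this is the only
        // granularity at which the stream is touched.
        while ((n = in.read(buffer)) != -1) {
            for (int i = 0; i < n; i++) {
                if (buffer[i] == '\n') {
                    lines.add(current.toString());
                    current.setLength(0);
                } else {
                    current.append((char) buffer[i]); // ASCII assumed
                }
            }
        }
        if (current.length() > 0) {
            lines.add(current.toString()); // last line without trailing \n
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "line one\nline two\nline three"
                .getBytes(StandardCharsets.UTF_8);
        // An 8-byte buffer: each read() crosses line boundaries, yet the
        // caller still gets clean lines back.
        List<String> lines = readLines(new ByteArrayInputStream(data), 8);
        System.out.println(lines.size());
        System.out.println(lines.get(1));
    }
}
```

So the number of lines "read at once" is not a fixed n: it is however many lines happen to fit in each io.file.buffer.size-byte chunk, which is why tuning dfs.block.size alone did not change Mark's observed read behavior.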