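Hi,

Hadoop neither reads one line at a time nor fetches dfs.block.size worth of lines into a buffer. dfs.block.size only sets how large each HDFS block is; it does not control how many bytes are pulled into memory per read. For TextInputFormat, Hadoop actually reads io.file.buffer.size bytes of text into a buffer on each read. You can see this in the Hadoop source file LineReader.java.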
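To make the buffering concrete, here is a minimal Java sketch of the same pattern, assuming a plain java.io.InputStream in place of an HDFS stream. It is not Hadoop's LineReader: the class BufferedLineSketch, its bufferSize parameter (standing in for io.file.buffer.size), and the fill counter are hypothetical names for illustration. Bytes are pulled from the stream one buffer at a time, and lines are then cut out of that buffer in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical, simplified sketch (NOT Hadoop's LineReader): bytes are read
 * from the stream in chunks of bufferSize, which stands in for
 * io.file.buffer.size, and lines are cut out of that buffer in memory.
 */
public class BufferedLineSketch {
    private final InputStream in;
    private final byte[] buffer;
    private int bufferLength = 0; // number of valid bytes currently in buffer
    private int bufferPosn = 0;   // index of the next unread byte in buffer
    private int fills = 0;        // stream reads that returned data

    public BufferedLineSketch(InputStream in, int bufferSize) {
        this.in = in;
        this.buffer = new byte[bufferSize];
    }

    /** Returns the next line without its trailing '\n', or null at end of input. */
    public String readLine() throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        boolean sawBytes = false;
        while (true) {
            if (bufferPosn >= bufferLength) {
                // Buffer exhausted: fetch up to bufferSize more bytes in one call.
                bufferLength = in.read(buffer, 0, buffer.length);
                bufferPosn = 0;
                if (bufferLength <= 0) {
                    bufferLength = 0;
                    return sawBytes ? new String(line.toByteArray(), StandardCharsets.UTF_8) : null;
                }
                fills++;
            }
            sawBytes = true;
            // Scan the in-memory buffer for the end of the current line.
            int start = bufferPosn;
            while (bufferPosn < bufferLength && buffer[bufferPosn] != '\n') {
                bufferPosn++;
            }
            line.write(buffer, start, bufferPosn - start);
            if (bufferPosn < bufferLength) { // found '\n'
                bufferPosn++;                // consume the newline
                return new String(line.toByteArray(), StandardCharsets.UTF_8);
            }
            // The line continues past the end of the buffer: loop and refill.
        }
    }

    public int fills() {
        return fills;
    }

    /** Builds count records of recordBytes bytes each, every record ending in '\n'. */
    public static byte[] makeRecords(int count, int recordBytes) {
        byte[] data = new byte[count * recordBytes];
        Arrays.fill(data, (byte) 'x');
        for (int i = 1; i <= count; i++) {
            data[i * recordBytes - 1] = '\n';
        }
        return data;
    }

    /** Reads up to maxLines lines and reports how many buffer fills that took. */
    public static int fillsForLines(byte[] data, int bufferSize, int maxLines) {
        BufferedLineSketch reader =
            new BufferedLineSketch(new ByteArrayInputStream(data), bufferSize);
        try {
            for (int i = 0; i < maxLines && reader.readLine() != null; i++) {
                // Discard the line; only the fill count matters here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return reader.fills();
    }

    public static void main(String[] args) {
        // 4000 hypothetical records of 650 bytes each (649 bytes + '\n').
        byte[] data = makeRecords(4000, 650);
        int[] bufferSizes = {4096, 65536};
        int[] lineCounts = {1, 100, 4000};
        for (int bufferSize : bufferSizes) {
            for (int lines : lineCounts) {
                System.out.printf("buffer=%d bytes, lines=%d -> %d buffer fills%n",
                    bufferSize, lines, fillsForLines(data, bufferSize, lines));
            }
        }
    }
}
```

With 650-byte lines, a 65536-byte buffer holds about 100 lines, so in this sketch reading 1 line takes one fill while reading 4000 lines takes 40; with a 4096-byte buffer, the 4000 lines take 635 fills. The number of lines fetched per read depends on the buffer size and the line length, not on dfs.block.size, so reading 1 line and reading 4000 lines should not be expected to cost the same.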
2011/10/5 Mark question <markq2...@gmail.com>
> Hello,
>
> Correct me if I'm wrong, but when a program opens n files at the same time
> to read from, and starts reading from each file one line at a time,
> isn't Hadoop actually fetching dfs.block.size of lines into a buffer, and
> not actually one line?
>
> If this is correct, and I set dfs.block.size = 3MB and each line takes
> only about 650 bytes, then I would assume the performance for reading
> 1-4000 lines would be the same, but it isn't! Do you know a way to find
> the number of lines read at once?
>
> Thank you,
> Mark