Hi,

Hadoop neither reads one line at a time nor fetches dfs.block.size worth of
lines into a buffer.
For TextInputFormat, it actually reads io.file.buffer.size bytes of text
into a buffer each time and then splits lines out of that buffer.
You can see this in the Hadoop source file LineReader.java.
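
To illustrate the idea, here is a minimal sketch of buffered line reading. It is not the actual LineReader code: the class names BufferedLineSketch, CountingStream and SimpleLineReader are made up for this example, and the 4096-byte buffer is only an assumed value for io.file.buffer.size. It counts how many reads reach the underlying stream, so you can see that the cost depends on the buffer size rather than on the number of lines.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class BufferedLineSketch {

    /** Hypothetical wrapper that counts how many reads reach the underlying stream. */
    static class CountingStream extends InputStream {
        private final InputStream in;
        int reads = 0;

        CountingStream(InputStream in) { this.in = in; }

        @Override public int read() throws IOException {
            reads++;
            return in.read();
        }

        @Override public int read(byte[] b, int off, int len) throws IOException {
            reads++;
            return in.read(b, off, len);
        }
    }

    /** Hypothetical reader: hands out lines one by one, refilling a fixed-size buffer. */
    static class SimpleLineReader {
        private final InputStream in;
        private final byte[] buffer;
        private int pos = 0;    // next unread byte in the buffer
        private int limit = 0;  // number of valid bytes in the buffer

        SimpleLineReader(InputStream in, int bufferSize) {
            this.in = in;
            this.buffer = new byte[bufferSize];
        }

        /** Returns the next line without its '\n', or null at end of stream. */
        String readLine() throws IOException {
            ByteArrayOutputStream line = new ByteArrayOutputStream();
            boolean sawAny = false;
            while (true) {
                if (pos >= limit) {
                    // Buffer exhausted: fetch up to buffer.length bytes in one read.
                    limit = in.read(buffer, 0, buffer.length);
                    pos = 0;
                    if (limit <= 0) {
                        // End of stream: return a trailing unterminated line, if any.
                        return sawAny ? new String(line.toByteArray(), StandardCharsets.UTF_8) : null;
                    }
                }
                while (pos < limit) {
                    byte b = buffer[pos++];
                    if (b == '\n') {
                        return new String(line.toByteArray(), StandardCharsets.UTF_8);
                    }
                    line.write(b);
                    sawAny = true;
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build 4000 lines of 650 bytes each (649 characters plus '\n'),
        // matching the sizes mentioned in the question.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 4000; i++) {
            for (int j = 0; j < 649; j++) sb.append('x');
            sb.append('\n');
        }
        byte[] data = sb.toString().getBytes(StandardCharsets.UTF_8);

        CountingStream counted = new CountingStream(new ByteArrayInputStream(data));
        SimpleLineReader reader = new SimpleLineReader(counted, 4096); // assumed buffer size

        reader.readLine();
        System.out.println("underlying reads after 1 line: " + counted.reads);

        while (reader.readLine() != null) { /* drain the remaining lines */ }
        System.out.println("underlying reads after all lines: " + counted.reads);
    }
}
```

With lines of about 650 bytes and a 4096-byte buffer, one refill covers only a handful of lines, so reading 4000 lines needs many more refills than reading 1 line. That would be consistent with the performance difference you observed.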



2011/10/5 Mark question <markq2...@gmail.com>

> Hello,
>
>  Correct me if I'm wrong, but when a program opens n-files at the same time
> to read from, and start reading from each file at a time 1 line at a time.
> Isn't hadoop actually fetching dfs.block.size of lines into a buffer? and
> not actually one line.
>
>  If this is correct, I set up my dfs.block.size = 3MB and each line takes
> about 650 bytes only, then I would assume the performance for reading
> 1-4000 lines would be the same, but it isn't!  Do you know a way to find
> #n of lines to be read at once?
>
> Thank you,
> Mark
>
