On Tue, May 27, 2008 at 10:49:38AM -0700, Ted Dunning wrote:
> 
> There is a good tutorial on the wiki about this.
> 
> Your problem here is that you have conflated two concepts.  The first is the
> splitting of files into blocks for storage purposes.  This has nothing to do
> with what data a program can read, any more than splitting a file into blocks
> on a disk in a conventional file system limits what you can read.  The
> second is the splitting that the input format does in order to allow
> parallelism.  Basically, the file block splits have nothing to do with what
> data the mapper can read; they only determine which data will be local.
> 
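
To make the distinction above concrete, here is a rough sketch in plain Java of how a line-oriented record reader treats a split boundary. This is illustrative only, not the actual Hadoop LineRecordReader code, and the method names are made up: a mapper whose split starts mid-line skips the partial first line (the previous split owns it) and reads past its split's end to finish its last line, even when those bytes live in the next HDFS block.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitReaderSketch {
    // Return the logical lines belonging to the byte range [start, end).
    // Illustrative sketch, not the real Hadoop API: a split that begins
    // mid-line skips forward to the next line start, and the final line
    // is read to completion even if that crosses the `end` boundary
    // (i.e., reaches into the next HDFS block over the network).
    static List<String> linesForSplit(byte[] data, int start, int end) {
        List<String> lines = new ArrayList<>();
        int pos = start;
        if (start > 0) {
            // Skip the tail of a line owned by the previous split.
            while (pos < data.length && data[pos - 1] != '\n') pos++;
        }
        while (pos < end && pos < data.length) {
            int lineStart = pos;
            while (pos < data.length && data[pos] != '\n') pos++;
            if (pos < data.length) pos++; // consume '\n', possibly past `end`
            lines.add(new String(data, lineStart, pos - lineStart).trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] file = "alpha\nbravo\ncharlie\ndelta\n".getBytes();
        // Split at byte 8, which falls in the middle of "bravo":
        // the first split keeps the whole of "bravo", the second
        // starts cleanly at "charlie".  Every line lands in exactly
        // one split, regardless of where the byte boundary fell.
        System.out.println(linesForSplit(file, 0, 8));           // [alpha, bravo]
        System.out.println(linesForSplit(file, 8, file.length)); // [charlie, delta]
    }
}
```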

When reading from HDFS, how big are the network read requests, and what
controls their size? More concretely: if I store files in HDFS using 64 MB
blocks and run the simple word count example, getting the default of one
FileSplit/map task per 64 MB block, how many bytes into the second 64 MB
block will a mapper read before it first passes a buffer up to the record
reader to check whether it has found an end-of-line?

Thanks,

-Erik
