Thanks for the explanation HJ - I always meant to look into that bit of code to work out how it did it.
Tim

On Wed, Sep 19, 2012 at 6:24 PM, Harsh J <ha...@cloudera.com> wrote:
> Hi Tim,
>
> Splits don't look at newlines in the TextInputFormat at least. So
> since the computed splits > default map numbers, I think a perfect
> file of 10 blocks will spawn only 10 mappers. The mapper's record
> reader is the one that reads until a newline (even after the end of
> its block length bytes).
>
> On Wed, Sep 19, 2012 at 9:16 PM, Tim Robertson
> <timrobertson...@gmail.com> wrote:
> > I think the splitting recognises the end of line, so you might get 11 but
> > otherwise that looks correct.
> >
> > On Wed, Sep 19, 2012 at 5:42 PM, Pedro Sá da Costa <psdc1...@gmail.com>
> > wrote:
> >>
> >> If I've an input file of 640MB in size, and a split size of 64Mb, this
> >> file will be partitioned in 10 splits, and each split will be processed
> >> by a map task, right?
> >>
> >> --
> >> Best regards,
> >>
> >
>
> --
> Harsh J
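
For anyone reading the archive later, here is a rough Python sketch of the behaviour Harsh describes. It is not Hadoop's code: `compute_splits` and `read_records` are made-up names, and the rule that a reader starting mid-file discards its first line is my assumption about how the readers avoid emitting a record twice. It also ignores the finer details of the real split calculation.

```python
def compute_splits(file_size, split_size):
    """Cut the file into byte ranges without looking at its contents."""
    return [(start, min(start + split_size, file_size))
            for start in range(0, file_size, split_size)]


def read_records(data, start, end):
    """Return the lines owned by the split [start, end) of `data` (bytes)."""
    pos = start
    if start != 0:
        # Assumed convention: discard everything up to the first newline,
        # because the previous split's reader is responsible for that line.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    records = []
    # A line that begins at or before `end` belongs to this split, so the
    # last read may run past the split boundary to reach its newline.
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        stop = len(data) if newline == -1 else newline
        records.append(data[pos:stop])
        pos = stop + 1
    return records


if __name__ == "__main__":
    data = b"alpha\nbravo\ncharlie\ndelta\n"
    for start, end in compute_splits(len(data), 10):
        print((start, end), read_records(data, start, end))
```

With a 10-byte split size the second split begins in the middle of "bravo", yet every line still comes out exactly once, and the number of splits depends only on the byte sizes. That matches the 640MB / 64MB case giving 10 splits rather than 11.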