Mark, NLineInputFormat was not introduced in 0.21 - I just happened to send you the 0.21 URL as a reference, FYI. It's also in the 0.20.205, 1.0.0 and 0.23 releases.
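For reference, wiring NLineInputFormat into a new-API job might look like the sketch below. This is a minimal, hedged example (the driver class name is made up, and mapper/output setup is elided); `NLineInputFormat.setNumLinesPerSplit` is the documented way to get one line per map task, equivalent to setting `mapreduce.input.lineinputformat.linespermap` to 1.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OneLinePerMapperDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "one-line-per-mapper");
        job.setJarByClass(OneLinePerMapperDriver.class);

        // Each input split (and therefore each map task) gets exactly one line.
        job.setInputFormatClass(NLineInputFormat.class);
        NLineInputFormat.setNumLinesPerSplit(job, 1);

        NLineInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // job.setMapperClass(...) etc. elided; the mapper still receives
        // <LongWritable offset, Text line> pairs as with TextInputFormat.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Compiling and running this of course needs the Hadoop jars on the classpath.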
Praveen

On Fri, Feb 3, 2012 at 1:25 AM, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:
> Praveen,
>
> this seems just like the right thing, but it's API 0.21 (I googled about
> the problems with it), so I have to use either the next Cloudera release,
> or Hortonworks, or something, am I right?
>
> Mark
>
> On Thu, Feb 2, 2012 at 7:39 AM, Praveen Sripati <praveensrip...@gmail.com> wrote:
>
>> > I have a simple MR job, and I want each Mapper to get one line from my
>> > input file (which contains further instructions for lengthy processing).
>>
>> Use the NLineInputFormat class.
>>
>> http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/org/apache/hadoop/mapreduce/lib/input/NLineInputFormat.html
>>
>> Praveen
>>
>> On Thu, Feb 2, 2012 at 9:43 AM, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:
>>
>>> Thanks!
>>> Mark
>>>
>>> On Wed, Feb 1, 2012 at 7:44 PM, Anil Gupta <anilgupt...@gmail.com> wrote:
>>>
>>>> Yes, if ur block size is 64mb. Btw, block size is configurable in Hadoop.
>>>>
>>>> Best Regards,
>>>> Anil
>>>>
>>>> On Feb 1, 2012, at 5:06 PM, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:
>>>>
>>>>> Anil,
>>>>>
>>>>> do you mean one block of HDFS, like 64MB?
>>>>>
>>>>> Mark
>>>>>
>>>>> On Wed, Feb 1, 2012 at 7:03 PM, Anil Gupta <anilgupt...@gmail.com> wrote:
>>>>>
>>>>>> Do u have enough data to start more than one mapper?
>>>>>> If entire data is less than a block size then only 1 mapper will run.
>>>>>>
>>>>>> Best Regards,
>>>>>> Anil
>>>>>>
>>>>>> On Feb 1, 2012, at 4:21 PM, Mark Kerzner <mark.kerz...@shmsoft.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have a simple MR job, and I want each Mapper to get one line from my
>>>>>>> input file (which contains further instructions for lengthy processing).
>>>>>>> Each line is 100 characters long, and I tell Hadoop to read only 100 bytes,
>>>>>>>
>>>>>>> job.getConfiguration().setInt("mapreduce.input.linerecordreader.line.maxlength", 100);
>>>>>>>
>>>>>>> I see that this part works - it reads only one line at a time, and if I
>>>>>>> change this parameter, it listens.
>>>>>>>
>>>>>>> However, on a cluster only one node receives all the map tasks. Only one
>>>>>>> map task is started. The others never get anything, they just wait. I've
>>>>>>> added 100 seconds wait to the mapper - no change!
>>>>>>>
>>>>>>> Any advice?
>>>>>>>
>>>>>>> Thank you. Sincerely,
>>>>>>> Mark
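The split arithmetic behind Anil's point can be sketched in plain Java (this is a simplification for illustration, not Hadoop's actual InputFormat code, which also honors min/max split-size settings): with the default byte-oriented splitting, a file smaller than one block yields one split and therefore one map task, whereas NLineInputFormat produces one split per N lines regardless of byte size.

```java
public class SplitMath {
    // Default FileInputFormat-style behavior (simplified): splits are
    // byte ranges, so a file smaller than one block gives a single split.
    static long blockSplits(long fileBytes, long blockBytes) {
        if (fileBytes == 0) return 0;
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    // NLineInputFormat behavior: one split per N lines of input,
    // independent of how many bytes the file occupies.
    static long nLineSplits(long totalLines, long linesPerSplit) {
        if (totalLines == 0) return 0;
        return (totalLines + linesPerSplit - 1) / linesPerSplit;
    }

    public static void main(String[] args) {
        long fileBytes = 1000L * 100;        // 1000 lines, 100 bytes each
        long blockBytes = 64L * 1024 * 1024; // 64 MB block
        System.out.println(blockSplits(fileBytes, blockBytes)); // 1 split  -> 1 mapper
        System.out.println(nLineSplits(1000, 1));               // 1000 splits -> 1000 mappers
    }
}
```

This is why the maxlength setting alone doesn't help: `mapreduce.input.linerecordreader.line.maxlength` only caps how many bytes the record reader consumes per line, it never changes how the input is split.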