Thanks!

Mark

On Mon, Feb 7, 2011 at 8:28 PM, Ted Dunning <tdunn...@maprtech.com> wrote:
> That is quite doable. One way to do it is to make the max split size quite
> small.
>
> On Mon, Feb 7, 2011 at 6:14 PM, Mark Kerzner <markkerz...@gmail.com> wrote:
>
> > Ted,
> >
> > I am also interested in this answer.
> >
> > I put the name of a zip file on a line in an input file, and I want one
> > mapper to read this line, and start working on it (since it now knows the
> > path in HDFS). Are you saying it's not doable?
> >
> > Thank you,
> > Mark
> >
> > On Mon, Feb 7, 2011 at 8:10 PM, Ted Dunning <tdunn...@maprtech.com> wrote:
> >
> > > Option (1) isn't the way that things normally work. Besides, mappers are
> > > called many times for each construction of a mapper.
> > >
> > > On Mon, Feb 7, 2011 at 3:38 PM, maha <m...@umail.ucsb.edu> wrote:
> > >
> > > > Hi,
> > > >
> > > > I would appreciate it if you could give me your thoughts on whether
> > > > there is an effect on efficiency if:
> > > >
> > > > 1) Mappers were per line in a document
> > > >
> > > > or
> > > >
> > > > 2) Mappers were per block of lines in a document.
> > > >
> > > > I know the obvious difference I can see is that (1) has more mappers.
> > > > Does that mean (1) will be slower because of scheduling time?
> > > >
> > > > Thank you,
> > > > Maha
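
For anyone reading the thread later, here is a rough sketch of why a small max split size gives roughly one mapper per line. It is a simplified model in plain Python, not Hadoop code. The split-size rule follows what I understand FileInputFormat to do (max of the min size and the smaller of max size and block size). The line-to-split assignment is an approximation that ignores edge cases, such as a line that starts exactly on a split boundary and the slack Hadoop allows on the last split. The function names and the zip file names are made up for illustration.

```python
def compute_split_size(block_size, min_size, max_size):
    # Assumed rule: max(min_size, min(max_size, block_size)).
    return max(min_size, min(max_size, block_size))


def lines_per_split(data, split_size):
    """Group lines by the split that contains each line's first byte.

    Simplified model: each split is a fixed-size byte range and goes to
    one mapper; a line belongs to the split in which it starts.
    """
    num_splits = (len(data) + split_size - 1) // split_size
    splits = [[] for _ in range(num_splits)]
    offset = 0
    for line in data.splitlines(keepends=True):
        splits[offset // split_size].append(line.rstrip(b"\n").decode())
        offset += len(line)
    return splits


# Hypothetical input file: one zip file name per line, 6 bytes per line.
data = b"a.zip\nb.zip\nc.zip\n"
block = 64 * 1024 * 1024

# Max split size equal to one line's length: one line per split.
small = compute_split_size(block, min_size=1, max_size=6)
print(lines_per_split(data, small))

# Max split size left very large: the whole file lands in one split.
large = compute_split_size(block, min_size=1, max_size=2**63 - 1)
print(lines_per_split(data, large))
```

In this model the first call gives three splits with one file name each, and the second gives a single split holding all three names. If lines vary in length, a fixed byte size will not line up with line boundaries, so some splits would get several lines and some none.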