MultiFileSplit currently does not support automatic computation of the number of map tasks. You can set the number of maps manually via JobConf#setNumMapTasks() or via the command-line argument -D mapred.map.tasks=<number>.
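
As a rough illustration of how an application might pick that number itself, here is a minimal sketch in plain Java. The class name MapCountEstimator, the method estimateNumMaps, and the 64 MB target per map are assumptions made for this example, not part of Hadoop. It sums the input file sizes and divides by the amount of data each map should handle.

```java
import java.util.Arrays;

// Hypothetical helper, not a Hadoop class: estimates how many map tasks
// to request when many small files are packed into multi-file splits.
class MapCountEstimator {

    // Returns ceil(totalBytes / targetBytesPerMap), but never less than 1.
    static int estimateNumMaps(long[] fileSizes, long targetBytesPerMap) {
        if (targetBytesPerMap <= 0) {
            throw new IllegalArgumentException("targetBytesPerMap must be positive");
        }
        long total = 0;
        for (long size : fileSizes) {
            total += size;
        }
        long maps = (total + targetBytesPerMap - 1) / targetBytesPerMap;
        return (int) Math.max(1L, Math.min(maps, (long) Integer.MAX_VALUE));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;

        // Example input: 1000 files of 4 MB each (assumed figures).
        long[] sizes = new long[1000];
        Arrays.fill(sizes, 4 * mb);

        // Assumed target of 64 MB of input per map task.
        int numMaps = estimateNumMaps(sizes, 64 * mb);
        System.out.println(numMaps); // prints 63

        // With Hadoop on the classpath and a JobConf named jobConf (assumed),
        // the value would then be passed on as:
        //   jobConf.setNumMapTasks(numMaps);
    }
}
```

In a real job the file sizes would come from listing the input directory. Equivalently, the computed value can be supplied on the command line with -D mapred.map.tasks=<number>.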

Goel, Ankur wrote:
Hi Folks,
I am using Hadoop to process some temporal data which is split into a
lot of small files (~3-4 MB each).
Using TextInputFormat resulted in too many mappers (1 per file), creating
a lot of overhead, so I switched to
MultiFileInputFormat (MultiFileWordCount.MyInputFormat), which resulted
in just 1 mapper.
I was hoping to set the number of mappers to 1 so that Hadoop
automatically takes care of generating the right
number of map tasks.
It looks like when using MultiFileInputFormat one has to rely on the
application to specify the right number of mappers,
or am I missing something? Please advise.
Thanks
-Ankur
