Yes, please open a JIRA for this. We should ensure that
avgLengthPerSplit in MultiFileInputFormat does not exceed the default
file block size. Note, however, that unlike FileInputFormat, the files
in a split may each come from a different block.
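A minimal sketch of the clamp being proposed here (the class and method names are hypothetical, not the actual Hadoop source): raise the requested split count until the average length per split no longer exceeds the block size.

```java
// Hypothetical sketch of the proposed default behaviour; not actual Hadoop code.
public class SplitClamp {
    /**
     * Return a split count such that totalSize / splits (the average
     * length per split) does not exceed blockSize.
     */
    public static int clampNumSplits(long totalSize, long blockSize,
                                     int requestedSplits) {
        // Minimum splits needed so that avgLengthPerSplit <= blockSize
        // (ceiling division).
        long minSplits = (totalSize + blockSize - 1) / blockSize;
        return (int) Math.max(requestedSplits, minSplits);
    }
}
```

With a 64 MB block size, 640 MB of input asked to fit in 2 splits would be raised to 10 splits, while a tiny input keeps whatever split count was requested.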
Goel, Ankur wrote:
In this case I have to compute the number of map tasks in the
application as (totalSize / blockSize), which is what I am doing as a
workaround.
I think this should be the default behaviour in MultiFileInputFormat.
Should a JIRA be opened for this?
-Ankur
-----Original Message-----
From: Enis Soztutar [mailto:[EMAIL PROTECTED]
Sent: Friday, July 11, 2008 7:21 PM
To: core-user@hadoop.apache.org
Subject: Re: MultiFileInputFormat - Not enough mappers
MultiFileSplit currently does not support automatic map task count
computation. You can manually set the number of maps via
JobConf#setNumMapTasks() or via the command-line argument -D
mapred.map.tasks=<number>
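A sketch of the programmatic option against the old "mapred" API of that era (the driver class, input path, and task count here are hypothetical; MultiFileWordCount.MyInputFormat is from the Hadoop examples package):

```java
// Sketch only: requires a Hadoop cluster/classpath to actually run.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.examples.MultiFileWordCount;

JobConf conf = new JobConf(MyDriver.class);            // hypothetical driver class
conf.setInputFormat(MultiFileWordCount.MyInputFormat.class);
FileInputFormat.addInputPath(conf, new Path("/input")); // hypothetical path
conf.setNumMapTasks(16);                                // explicit map task count
```

The -D mapred.map.tasks=<number> form has the same effect when the driver uses ToolRunner to pick up generic options.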
Goel, Ankur wrote:
Hi Folks,
I am using Hadoop to process some temporal data which is
split into a lot of small files (~3-4 MB). Using TextInputFormat
resulted in too many mappers (1 per file), creating a lot of overhead,
so I switched to MultiFileInputFormat
(MultiFileWordCount.MyInputFormat), which resulted in just 1 mapper.
I was hoping to set the number of mappers to 1 so that Hadoop
automatically takes care of generating the right number of map tasks.
Looks like when using MultiFileInputFormat one has to rely on the
application to specify the right number of mappers, or am I missing
something? Please advise.
Thanks
-Ankur