Yes, please open a JIRA for this. We should ensure that avgLengthPerSplit in MultiFileInputFormat does not exceed the default file block size. Note, however, that unlike FileInputFormat, the files making up a split will each come from different blocks.
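For the record, a rough sketch of the kind of guard this would add, assuming a helper that raises the requested split count until the average bytes per split fits in one block. The class and method names here are illustrative, not the actual patch:

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class SplitCountGuard {
    /**
     * Illustrative only: return a split count large enough that
     * avgLengthPerSplit (totLength / numSplits) does not exceed the
     * default block size.
     */
    public static int guardNumSplits(JobConf conf, Path[] paths,
                                     int requestedSplits) throws IOException {
        FileSystem fs = FileSystem.get(conf);
        long totLength = 0;
        for (Path p : paths) {
            totLength += fs.getFileStatus(p).getLen();
        }
        long blockSize = fs.getDefaultBlockSize();
        // Ceiling division: the minimum number of splits such that each
        // split averages at most one block's worth of data.
        int minSplits = (int) ((totLength + blockSize - 1) / blockSize);
        return Math.max(requestedSplits, Math.max(1, minSplits));
    }
}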

Goel, Ankur wrote:
In this case I have to compute the number of map tasks in the
application myself as (totalSize / blockSize), which is what I am doing
as a work-around (see the sketch below).
I think this should be the default behaviour in MultiFileInputFormat.
Should a JIRA be opened for the same?
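For reference, a minimal sketch of this work-around using the old mapred API; the input directory argument and the rest of the job setup are placeholders:

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class MapCountWorkaround {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapCountWorkaround.class);
        FileSystem fs = FileSystem.get(conf);
        Path inputDir = new Path(args[0]);  // directory holding the small files

        // Sum the lengths of all files under the input directory.
        long totalSize = 0;
        for (FileStatus status : fs.listStatus(inputDir)) {
            totalSize += status.getLen();
        }

        // One map task per block's worth of data (totalSize / blockSize).
        long blockSize = fs.getDefaultBlockSize();
        int numMaps = (int) Math.max(1, totalSize / blockSize);
        conf.setNumMapTasks(numMaps);
        // ... configure input/output formats and paths, then submit the job.
    }
}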

-Ankur


-----Original Message-----
From: Enis Soztutar [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 11, 2008 7:21 PM
To: core-user@hadoop.apache.org
Subject: Re: MultiFileInputFormat - Not enough mappers

MultiFileSplit currently does not support automatic map task count
computation. You can manually set the number of maps via
jobConf#setNumMapTasks() or via the command-line argument
-D mapred.map.tasks=<number>.
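For example (MyJob below is a placeholder driver class, and the -D form is only picked up when the driver runs through ToolRunner/GenericOptionsParser):

import org.apache.hadoop.mapred.JobConf;

public class SetMapCount {
    public static void main(String[] args) {
        // Set the map task count explicitly in the driver.
        // Equivalently, from the command line with a ToolRunner-based driver:
        //   bin/hadoop jar my.jar MyJob -D mapred.map.tasks=100 <in> <out>
        JobConf conf = new JobConf();
        conf.setNumMapTasks(100);
        // ... rest of job setup.
    }
}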


Goel, Ankur wrote:
Hi Folks,
I am using Hadoop to process some temporal data that is split across a
lot of small files (~3-4 MB each). Using TextInputFormat resulted in too
many mappers (one per file), creating a lot of overhead, so I switched to
MultiFileInputFormat (MultiFileWordCount.MyInputFormat; a sketch of such a
subclass appears below), which resulted in just one mapper. I had hoped
that by setting the number of mappers to 1, Hadoop would automatically
take care of generating the right number of map tasks. It looks like when
using MultiFileInputFormat one has to rely on the application to specify
the right number of mappers, or am I missing something? Please advise.
Thanks
-Ankur
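For context, a minimal sketch of such an input format, loosely modeled on the MultiFileWordCount example in the old mapred API; MyRecordReader is a hypothetical reader that would iterate over the files in the split (the real example wires in its own MultiFileLineRecordReader):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MultiFileInputFormat;
import org.apache.hadoop.mapred.MultiFileSplit;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Sketch: group many small files into each split; the number of splits
// comes from the configured map task count, not from the data size.
public class MyInputFormat extends MultiFileInputFormat<LongWritable, Text> {
    @Override
    public RecordReader<LongWritable, Text> getRecordReader(
            InputSplit split, JobConf conf, Reporter reporter) throws IOException {
        // MyRecordReader is hypothetical: it would open each file in the
        // MultiFileSplit in turn and emit its records.
        return new MyRecordReader((MultiFileSplit) split, conf);
    }
}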

