In this case I have to compute the number of map tasks in the application as (totalSize / blockSize), which is what I am doing as a work-around. I think this should be the default behaviour in MultiFileInputFormat. Should a JIRA be opened for this?
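Roughly, the work-around looks like the sketch below (the helper class and names are illustrative rather than my actual job code; it assumes the old mapred JobConf API and input paths that contain plain files, not nested directories):

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class MapTaskEstimator {
        // Set the job's map task count to roughly totalInputSize / blockSize.
        public static void setNumMapsFromInputSize(JobConf jobConf) throws java.io.IOException {
            FileSystem fs = FileSystem.get(jobConf);
            long totalSize = 0;
            // Sum the sizes of all files under the job's input paths.
            for (Path input : FileInputFormat.getInputPaths(jobConf)) {
                for (FileStatus status : fs.listStatus(input)) {
                    totalSize += status.getLen();
                }
            }
            long blockSize = fs.getDefaultBlockSize();      // e.g. 64 MB with the default dfs.block.size
            int numMaps = (int) Math.max(1, totalSize / blockSize);
            jobConf.setNumMapTasks(numMaps);
        }
    }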
-Ankur

-----Original Message-----
From: Enis Soztutar [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 11, 2008 7:21 PM
To: core-user@hadoop.apache.org
Subject: Re: MultiFileInputFormat - Not enough mappers

MultiFileSplit currently does not support automatic map task count computation. You can manually set the number of maps via jobConf#setNumMapTasks() or via the command line arg -D mapred.map.tasks=<number>

Goel, Ankur wrote:
> Hi Folks,
>       I am using hadoop to process some temporal data which is
> split into a lot of small files (~3 - 4 MB). Using TextInputFormat
> resulted in too many mappers (1 per file), creating a lot of overhead,
> so I switched to MultiFileInputFormat
> (MultiFileWordCount.MyInputFormat), which resulted in just 1 mapper.
>
> I was hoping to set the number of mappers to 1 so that hadoop
> automatically takes care of generating the right number of map tasks.
>
> Looks like when using MultiFileInputFormat one has to rely on the
> application to specify the right number of mappers, or am I missing
> something? Please advise.
>
> Thanks
> -Ankur
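For reference, setting the map count manually as suggested above looks roughly like this minimal sketch (the driver class name and the count of 20 are placeholders, not from the thread; MultiFileWordCount.MyInputFormat is the examples class mentioned in the original mail):

    import org.apache.hadoop.examples.MultiFileWordCount;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MyDriver {                                  // placeholder driver class
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MyDriver.class);
            conf.setInputFormat(MultiFileWordCount.MyInputFormat.class);
            conf.setNumMapTasks(20);                         // placeholder count; pick one based on input size
            // ... set mapper/reducer classes and input/output paths here ...
            JobClient.runJob(conf);
        }
    }

The command-line form would look something like the following, but it only takes effect when the driver parses generic options via ToolRunner/GenericOptionsParser (the sketch above does not):

    hadoop jar myjob.jar MyDriver -D mapred.map.tasks=20 <input-dir> <output-dir>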