In this case I have to compute the number of map tasks in the application as (totalSize / blockSize), which is what I am doing as a work-around. I think this should be the default behaviour in MultiFileInputFormat. Should a JIRA be opened for this?
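Roughly, the work-around looks like the sketch below (the helper class and names are illustrative rather than my actual job code; it assumes the old mapred JobConf API and input paths that contain plain files, not nested directories):

    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class MapTaskEstimator {
        // Set the job's map task count to roughly totalInputSize / blockSize.
        public static void setNumMapsFromInputSize(JobConf jobConf) throws java.io.IOException {
            FileSystem fs = FileSystem.get(jobConf);
            long totalSize = 0;
            // Sum the sizes of all files under the job's input paths.
            for (Path input : FileInputFormat.getInputPaths(jobConf)) {
                for (FileStatus status : fs.listStatus(input)) {
                    totalSize += status.getLen();
                }
            }
            long blockSize = fs.getDefaultBlockSize();      // e.g. 64 MB with the default dfs.block.size
            int numMaps = (int) Math.max(1, totalSize / blockSize);
            jobConf.setNumMapTasks(numMaps);
        }
    }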
-Ankur

-----Original Message-----
From: Enis Soztutar [mailto:[EMAIL PROTECTED]]
Sent: Friday, July 11, 2008 7:21 PM
To: core-user@hadoop.apache.org
Subject: Re: MultiFileInputFormat - Not enough mappers

MultiFileSplit currently does not support automatic map task count computation. You can manually set the number of maps via jobConf#setNumMapTasks() or via the command line arg -D mapred.map.tasks=<number>

Goel, Ankur wrote:
> Hi Folks,
>       I am using hadoop to process some temporal data which is
> split into a lot of small files (~3 - 4 MB). Using TextInputFormat
> resulted in too many mappers (1 per file), creating a lot of overhead,
> so I switched to MultiFileInputFormat
> (MultiFileWordCount.MyInputFormat), which resulted in just 1 mapper.
>
> I was hoping to set the number of mappers to 1 so that hadoop
> automatically takes care of generating the right number of map tasks.
>
> Looks like when using MultiFileInputFormat one has to rely on the
> application to specify the right number of mappers, or am I missing
> something? Please advise.
>
> Thanks
> -Ankur
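For reference, setting the map count manually as suggested above looks roughly like this minimal sketch (the driver class name and the count of 20 are placeholders, not from the thread; MultiFileWordCount.MyInputFormat is the examples class mentioned in the original mail):

    import org.apache.hadoop.examples.MultiFileWordCount;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MyDriver {                                  // placeholder driver class
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(MyDriver.class);
            conf.setInputFormat(MultiFileWordCount.MyInputFormat.class);
            conf.setNumMapTasks(20);                         // placeholder count; pick one based on input size
            // ... set mapper/reducer classes and input/output paths here ...
            JobClient.runJob(conf);
        }
    }

The command-line form would look something like the following, but it only takes effect when the driver parses generic options via ToolRunner/GenericOptionsParser (the sketch above does not):

    hadoop jar myjob.jar MyDriver -D mapred.map.tasks=20 <input-dir> <output-dir>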