Thanks for the replies. NLineInputFormat uses JobConf, which has been
deprecated, so I would rather not use that class. But I looked at
FileInputFormat, which has the following method:
FileInputFormat.setMinInputSplitSize(job, 100);
I thought that if I set the minimum input split size to 100, a Mapper would
be triggered for every 100 lines in the input file. My input file has 500
lines, so I was expecting to see 5 Mappers, but only one Mapper is triggered.
Please help. Thanks.
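A note on the numbers above: as far as I can tell, the split sizes taken by FileInputFormat are measured in bytes, not lines, and the new-API FileInputFormat picks the split size as max(minSize, min(maxSize, blockSize)). The sketch below only reproduces that arithmetic in plain Java (no Hadoop calls). The class name SplitSizeSketch, the 64 MB block size, and the 50,000-byte file size are assumptions for illustration, and the split count ignores the small slop factor Hadoop applies to the last split.

```java
public class SplitSizeSketch {

    // Same formula as computeSplitSize in the new-API FileInputFormat
    // (all sizes are in bytes).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Simplified split count: ceiling division of file size by split size.
    // Real Hadoop also lets the last split run slightly over; ignored here.
    static long countSplits(long fileBytes, long splitSize) {
        return (fileBytes + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // assumed DFS block size (64 MB)
        long fileBytes = 50_000L;           // hypothetical size of the 500-line file

        // Minimum split size of 100 bytes, no maximum set:
        // the block size still wins, so the whole file is one split.
        long withMinOnly = computeSplitSize(blockSize, 100L, Long.MAX_VALUE);
        System.out.println(countSplits(fileBytes, withMinOnly)); // prints 1

        // Capping the maximum split size at 10,000 bytes instead
        // yields several splits, hence several map tasks.
        long withMaxCap = computeSplitSize(blockSize, 1L, 10_000L);
        System.out.println(countSplits(fileBytes, withMaxCap)); // prints 5
    }
}
```

If this reading is right, a minimum of 100 bytes cannot shrink the split below the block size, which would explain the single Mapper. A byte-based maximum (FileInputFormat.setMaxInputSplitSize) may be the closer fit, though splits cut by bytes will not contain exactly 100 lines each.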
On Sun, Jan 17, 2010 at 11:45 PM, Amareshwari Sri Ramadasu <
[email protected]> wrote:
>
> Changing the audience to mapreduce-user.
>
> Setting the number of map tasks (mapred.map.tasks or
> JobConf.setNumMapTasks()) does not guarantee that the job will run that
> many maps; the value is only used as a hint. The number of maps is
> decided by your InputFormat. You should implement InputFormat.getSplits() to
> define how the input should be split, because the number of maps is equal
> to the number of splits.
> If you are using the default InputFormat (i.e. TextInputFormat), the number
> of maps is decided by the DFS block size. If you use NLineInputFormat with
> mapred.line.input.format.linespermap=1, the number of maps will equal the
> number of lines in the file.
> More details @
>
> http://hadoop.apache.org/common/docs/r0.20.0/api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks%28int%29
>
> Thanks
> Amareshwari
> On 1/18/10 12:51 PM, "Something Something" <[email protected]>
> wrote:
>
> Hello,
>
> I read the documentation about running multiple Mapper tasks, but I can't
> get multiple Mappers to work. I am running under EC2 with 10 nodes.
>
> Here's what I know:
>
> 1) I gather that, by default, the number of Mapper tasks is decided by the
> DFS block size, but I would like to override that. My file is small, but
> each line triggers fairly long-running, complicated calculations that should
> be run in parallel.
>
> 2) I tried setting the following property in the mapred-site.xml (only on
> Master), but that doesn't seem to help:
>
> <property>
> <name>mapred.map.tasks</name>
> <value>10</value>
> </property>
>
> I still see the following message:
>
> 10/01/18 01:56:34 INFO mapred.JobClient: Launched map tasks=1
> 10/01/18 01:56:34 INFO mapred.JobClient: Data-local map tasks=1
>
> (Also, I know for a fact that multiple mappers are not running!)
>
>
> 3) I read somewhere that JobConf has a method called setNumMapTasks, but
> this class has been deprecated, so I am not using it. Besides, I heard that
> this method only provides a hint to Hadoop.
>
> So how do I trigger multiple Mapper tasks? Please let me know. Thanks.
>
>