This can be done easily: set the number of mappers you want with
jobConf.setNumMapTasks() and use MultiFileWordCount.MyInputFormat.class
as the input format. It is a concrete implementation of
MultiFileInputFormat, which packs the input files into the requested
number of splits rather than one split per file.
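
A minimal sketch with the old mapred API (the job name, paths, and the
count of 10 are placeholders; setNumMapTasks() is only a hint to the
framework, but MultiFileInputFormat uses it when building splits):

    import org.apache.hadoop.examples.MultiFileWordCount;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MultiFileWordCount.class);
    conf.setJobName("capped-mappers");

    // Ask for ~10 map tasks; MultiFileInputFormat packs the input
    // files into that many splits instead of one split per file.
    conf.setNumMapTasks(10);
    conf.setInputFormat(MultiFileWordCount.MyInputFormat.class);

    // ... set your mapper, reducer, and output types as usual ...

    FileInputFormat.setInputPaths(conf, new Path("/user/me/input"));
    FileOutputFormat.setOutputPath(conf, new Path("/user/me/output"));
    JobClient.runJob(conf);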

-----Original Message-----
From: Jason Venner [mailto:[EMAIL PROTECTED]]
Sent: Saturday, August 02, 2008 5:41 AM
To: core-user@hadoop.apache.org
Subject: Re: How can I control Number of Mappers of a job?

We control the number of map tasks by carefully managing the input
split size when we need to.
This may require using the MultiFileInput classes or aggregating your
input files beforehand.
You need some aggregation, either by concatenation or via
MultiFileInputFormat, if you have more input files than you want map
tasks.
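
For example, many small files can be concatenated up front with
FileUtil.copyMerge; a sketch (the paths are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Concatenate every file under small-files/ into one file,
    // inserting a newline between pieces; false = keep the sources.
    FileUtil.copyMerge(fs, new Path("/user/me/small-files"),
                       fs, new Path("/user/me/merged/all.txt"),
                       false, conf, "\n");

Roughly the same thing from the shell (merging to a local file) is
hadoop fs -getmerge <src dir> <local dst>.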



The case of one mapper per input file requires setting the input split
size to Long.MAX_VALUE (see the datajoin classes for examples).
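
Concretely, something like this (a sketch; mapred.min.split.size is the
property name this era of Hadoop reads, and MyJob is a placeholder):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);

    // FileInputFormat computes the split size as
    // max(minSize, min(goalSize, blockSize)), so a huge minimum
    // split size yields one split -- one mapper -- per file.
    conf.setLong("mapred.min.split.size", Long.MAX_VALUE);

Overriding isSplitable() to return false in a FileInputFormat subclass
has the same effect.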



paul wrote:
> I've talked to a few people that claim to have done this as a way to
> limit resources for different groups, like developers versus
> production jobs.
> Haven't tried it myself yet, but it's getting close to the top of my
> to-do list.
>
>
> -paul
>
>
> On Fri, Aug 1, 2008 at 1:36 PM, James Moore <[EMAIL PROTECTED]> wrote:
>
>
>> On Thu, Jul 31, 2008 at 12:30 PM, Gopal Gandhi
>> <[EMAIL PROTECTED]> wrote:
>>> Thank you, finally someone has interest in my questions =)
>>> My cluster contains more than one machine. Please don't get me wrong
>>> :-). I don't want to limit the total mappers in one node (by
>>> mapred.map.tasks). What I want is to limit the total mappers for one
>>> job. The motivation is that I have 2 jobs to run at the same time;
>>> they have "the same input data in Hadoop". I found that one job has
>>> to wait until the other finishes its mapping. Because the 2 jobs are
>>> submitted by 2 different people, I don't want one job to be starving,
>>> so I want to limit the first job's total mappers so that the 2 jobs
>>> will be launched simultaneously.
>>
>> What about running two different jobtrackers on the same machines,
>> looking at the same DFS files?  Never tried it myself, but it might
>> be an approach.
>>
>> --
>> James Moore | [EMAIL PROTECTED]
>> Ruby and Ruby on Rails consulting
>> blog.restphone.com
>>
>>     
>
>   

-- 
Jason Venner
Attributor - Program the Web <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers and coding wizards, contact if 
interested
