We control the number of map tasks by managing the input split size when we need to. This may require using the MultiFileInputFormat classes or aggregating your input files beforehand. If you have more input files than you want map tasks, you need some form of aggregation, either by concatenating the files or by using MultiFileInputFormat.
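
As a minimal sketch of the split-size approach (using the pre-0.20 org.apache.hadoop.mapred API; the SplitSizeExample class name and the 256 MB figure are just illustrative):

// Sketch: fewer, larger map tasks via a bigger minimum split size.
// Assumes the old org.apache.hadoop.mapred API.
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class SplitSizeExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SplitSizeExample.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));

        // Each map task processes one split, so larger splits mean
        // fewer map tasks. 256 MB here is an illustrative figure.
        conf.setLong("mapred.min.split.size", 256L * 1024 * 1024);

        // setNumMapTasks() is only a hint; the actual map count comes
        // from how the input is split.
        conf.setNumMapTasks(10);

        // ... set mapper/reducer classes and the output path, then
        // submit with JobClient.runJob(conf).
    }
}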

The case of 1 mapper per input file requires setting the input split size to Long.MAX_VALUE (see the datajoin contrib classes for examples).
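
A minimal sketch of that one-mapper-per-file setup (not the actual datajoin code; the WholeFileTextInputFormat name is made up): either push the minimum split size to Long.MAX_VALUE as above, or mark the input format as non-splittable:

// Sketch only. Either set
//   conf.setLong("mapred.min.split.size", Long.MAX_VALUE);
// or override isSplitable() as below. Either way each input file
// becomes exactly one split, and hence one map task.
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

public class WholeFileTextInputFormat extends TextInputFormat {
    @Override
    protected boolean isSplitable(FileSystem fs, Path file) {
        return false; // never split: one mapper per file
    }
}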



paul wrote:
I've talked to a few people who claim to have done this as a way to limit
resources for different groups, like developers versus production jobs.
Haven't tried it myself yet, but it's getting close to the top of my to-do
list.


-paul


On Fri, Aug 1, 2008 at 1:36 PM, James Moore <[EMAIL PROTECTED]> wrote:

On Thu, Jul 31, 2008 at 12:30 PM, Gopal Gandhi
<[EMAIL PROTECTED]> wrote:
Thank you, finally someone is interested in my questions =)
My cluster contains more than one machine. Please don't get me wrong :-).
I don't want to limit the total mappers on one node (by
mapred.tasktracker.map.tasks.maximum). What I want is to limit the total
mappers for one job. The motivation is that I have 2 jobs to run at the
same time; they have the same input data in Hadoop. I found that one job
has to wait until the other finishes its mapping. Because the 2 jobs are
submitted by 2 different people, I don't want one job to starve. So I want
to limit the first job's total mappers so that the 2 jobs can run
simultaneously.

What about running two different JobTrackers on the same machines,
looking at the same DFS files?  I've never tried it myself, but it might be
an approach.

--
James Moore | [EMAIL PROTECTED]
Ruby and Ruby on Rails consulting
blog.restphone.com



--
Jason Venner
Attributor - Program the Web <http://www.attributor.com/>
Attributor is hiring Hadoop Wranglers and coding wizards, contact if interested
