You said you have a large amount of data.
Approximately how large is it?
Did you compress the intermediate data, and if so, with which codec?
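For reference, compressing the intermediate (map output) data is usually
enabled in mapred-site.xml roughly like this - property names as in the
0.20.x releases, and the gzip codec below is only an example:

    <property>
        <name>mapred.compress.map.output</name>
        <value>true</value>
    </property>
    <property>
        <name>mapred.map.output.compression.codec</name>
        <!-- illustrative: any CompressionCodec installed on the cluster works here -->
        <value>org.apache.hadoop.io.compress.GzipCodec</value>
    </property>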

Niels

2011/1/7 Tali K <ncherr...@hotmail.com>:
>
> According to the documentation, that parameter is for the number of
>    tasks *per TaskTracker*.  I am asking about the number of tasks
>    for the entire job and entire cluster.  That parameter is already
>    set to 3, which is one less than the number of cores on each node's
>    CPU, as recommended. In my question I stated that
>    82 tasks were run for the first job, yet only 4 for the second -
>    both numbers being cluster-wide.
>
>
>
>> Date: Fri, 7 Jan 2011 13:19:42 -0800
>> Subject: Re: Help: How to increase amount of map tasks per job?
>> From: yuzhih...@gmail.com
>> To: common-user@hadoop.apache.org
>>
>> Set higher values for mapred.tasktracker.map.tasks.maximum (and
>> mapred.tasktracker.reduce.tasks.maximum) in mapred-site.xml
>>
>> On Fri, Jan 7, 2011 at 12:58 PM, Tali K <ncherr...@hotmail.com> wrote:
>>
>> >
>> >
>> >
>> >
>> > We have a job which runs in several map/reduce stages.  In the first job,
>> > a large number of map tasks - 82 - are initiated, as expected,
>> > and that causes all nodes to be used.
>> > In a later job, where we are still dealing with large amounts of
>> > data, only 4 map tasks are initiated, so only 4 nodes are used.
>> > This stage is actually the
>> > workhorse of the job, and requires much more processing power than the
>> > initial stage.
>> > We are trying to understand why only a few map tasks are
>> > being used, as we are not getting the full advantage of our cluster.
>> >
>> >
>> >
>> >
>
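For completeness, the per-TaskTracker slot settings mentioned above go into
mapred-site.xml, for example (the values are only illustrative and must be
tuned to the hardware):

    <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <!-- illustrative value: commonly sized to the number of cores per node -->
        <value>3</value>
    </property>
    <property>
        <name>mapred.tasktracker.reduce.tasks.maximum</name>
        <value>3</value>
    </property>

Note that, as stated above, these only cap how many tasks may run
concurrently on each node; they do not change how many map tasks the job
itself is split into.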



-- 
Kind regards,

Niels Basjes
