Re: Why is there a separate map and reduce task capacity?
Taeho Kang wrote:
> Set "mapred.tasktracker.tasks.maximum" and each node will be able to
> process N tasks - map and/or reduce. Please note that once you set
> "mapred.tasktracker.tasks.maximum", the
> "mapred.tasktracker.map.tasks.maximum" and
> "mapred.tasktracker.reduce.tasks.maximum" settings will not take effect.

This is valid only up to 0.16.*, because the property
"mapred.tasktracker.tasks.maximum" was removed in 0.17. So, from 0.17 on,
"mapred.tasktracker.map.tasks.maximum" and
"mapred.tasktracker.reduce.tasks.maximum" should be used.

> On Tue, Jun 17, 2008 at 1:46 PM, Amar Kamat <[EMAIL PROTECTED]> wrote:
>> Daniel Leffel wrote:
>>> Why not just combine them? How do I do that?
>>
>> Consider a case where the cluster (of n nodes) is configured to process
>> just one task per node. Let there be (n-1) reducers. Let's assume that
>> the map phase is complete and the reducers are shuffling. There will be
>> (n-1) nodes with reducers. Now consider a case where the only node
>> without a reducer is lost. The cluster needs slots to re-run the maps
>> that were lost, since the reducers are waiting for those maps to finish.
>> In such a case the job will get stuck. To avoid such cases, there are
>> separate map and reduce task slots.
>> Amar
>>
>>> Rationale is that our tasks are very balanced in load, but unbalanced
>>> in timing. I've found that limiting the total number of threads is the
>>> safest approach to avoid overloading the dfs daemon. To date, I've done
>>> that just through intelligent scheduling of jobs to stagger maps and
>>> reduces, but have I missed a setting that exists to simply limit the
>>> number of tasks in total?
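For 0.17 and later, the per-type limits mentioned above are set per TaskTracker node. A minimal sketch of what that configuration might look like (the file name follows the conventions of that era's Hadoop site configuration, and the slot counts 4 and 2 are purely illustrative, not recommendations):

```xml
<!-- hadoop-site.xml on each TaskTracker node (Hadoop 0.17+).
     Slot counts below are illustrative examples only. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>
    <description>Maximum number of map tasks run
    simultaneously by this TaskTracker.</description>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
    <description>Maximum number of reduce tasks run
    simultaneously by this TaskTracker.</description>
  </property>
</configuration>
```

Note that these are per-node settings read by the TaskTracker at startup, so changing them requires restarting the TaskTracker daemons.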
Re: Why is there a separate map and reduce task capacity?
Set "mapred.tasktracker.tasks.maximum" and each node will be able to process
N tasks - map and/or reduce. Please note that once you set
"mapred.tasktracker.tasks.maximum", the
"mapred.tasktracker.map.tasks.maximum" and
"mapred.tasktracker.reduce.tasks.maximum" settings will not take effect.

On Tue, Jun 17, 2008 at 1:46 PM, Amar Kamat <[EMAIL PROTECTED]> wrote:
> Daniel Leffel wrote:
>> Why not just combine them? How do I do that?
>
> Consider a case where the cluster (of n nodes) is configured to process
> just one task per node. Let there be (n-1) reducers. Let's assume that the
> map phase is complete and the reducers are shuffling. There will be (n-1)
> nodes with reducers. Now consider a case where the only node without a
> reducer is lost. The cluster needs slots to re-run the maps that were
> lost, since the reducers are waiting for those maps to finish. In such a
> case the job will get stuck. To avoid such cases, there are separate map
> and reduce task slots.
> Amar
>
>> Rationale is that our tasks are very balanced in load, but unbalanced
>> in timing. I've found that limiting the total number of threads is the
>> safest approach to avoid overloading the dfs daemon. To date, I've done
>> that just through intelligent scheduling of jobs to stagger maps and
>> reduces, but have I missed a setting that exists to simply limit the
>> number of tasks in total?
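On releases up to 0.16.* (where the combined limit described above applies), the single property would be set the same way. A sketch, again with a purely illustrative value:

```xml
<!-- hadoop-site.xml on each TaskTracker node (Hadoop up to 0.16.*).
     The value 4 is an illustrative example only. -->
<configuration>
  <property>
    <name>mapred.tasktracker.tasks.maximum</name>
    <value>4</value>
    <description>Total number of tasks (map or reduce, combined) run
    simultaneously by this TaskTracker. When set, it takes precedence
    over the separate per-type map/reduce maximums.</description>
  </property>
</configuration>
```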
Re: Why is there a separate map and reduce task capacity?
Daniel Leffel wrote:
> Why not just combine them? How do I do that?

Consider a case where the cluster (of n nodes) is configured to process just
one task per node. Let there be (n-1) reducers. Let's assume that the map
phase is complete and the reducers are shuffling. There will be (n-1) nodes
with reducers. Now consider a case where the only node without a reducer is
lost. The cluster needs slots to re-run the maps that were lost, since the
reducers are waiting for those maps to finish. In such a case the job will
get stuck. To avoid such cases, there are separate map and reduce task
slots.
Amar

> Rationale is that our tasks are very balanced in load, but unbalanced
> in timing. I've found that limiting the total number of threads is the
> safest approach to avoid overloading the dfs daemon. To date, I've done
> that just through intelligent scheduling of jobs to stagger maps and
> reduces, but have I missed a setting that exists to simply limit the
> number of tasks in total?
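The deadlock scenario above can be put in back-of-the-envelope terms. A toy sketch (the numbers are illustrative, not from any real cluster):

```python
# Toy illustration of the slot-deadlock scenario: n single-slot nodes,
# n-1 of them occupied by shuffling reducers, and the one node that held
# no reducer (only completed map output) is lost.
n = 5                                       # nodes in the cluster
slots_per_node = 1                          # one combined task slot per node
total_slots = n * slots_per_node

reducers_running = n - 1                    # reducers hold n-1 slots
lost_nodes = 1                              # the only reducer-free node dies
surviving_slots = total_slots - lost_nodes * slots_per_node

# The lost node's map output is gone and must be re-run, but every
# surviving slot is held by a reducer that is itself waiting on that map:
free_slots = surviving_slots - reducers_running
maps_to_rerun = 1

print(free_slots)                           # 0
print(free_slots >= maps_to_rerun)          # False -> job is stuck
```

With separate map and reduce slot pools, the re-run map can always be scheduled in a map slot regardless of how many reduce slots are occupied, which is exactly the point Amar makes.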
Why is there a separate map and reduce task capacity?
Why not just combine them? How do I do that? Rationale is that our tasks are very balanced in load, but unbalanced in timing. I've found that limiting the total number of threads is the safest approach to avoid overloading the dfs daemon. To date, I've done that just through intelligent scheduling of jobs to stagger maps and reduces, but have I missed a setting that exists to simply limit the number of tasks in total?