Re: Resource underutilization / final reduce tasks only uses half of cluster ( tasktracker map/reduce slots )

2012-05-14 Thread Abhishek Pratap Singh
Hi JD,

How many reduce tasks actually receive data depends on the keys the
mappers emit. If every record has the same key, all the data goes to a
single reducer on one node. Likewise, how evenly the cluster is utilized
depends on the number of distinct keys going into the reduce phase.
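
As a rough illustration of the point above, here is a minimal Python sketch of how keys spread across reducers. It assumes the job uses Hadoop's default HashPartitioner formula, (hash & Integer.MAX_VALUE) % numReduceTasks, and it mimics Java's String.hashCode() for plain ASCII keys only; real jobs usually use Text keys, which hash differently. The function names are hypothetical and exist only for this example.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Mimic Java's String.hashCode() as a signed 32-bit int (ASCII keys assumed)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Convert the unsigned 32-bit value to Java's signed int range.
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key: str, num_reduce_tasks: int) -> int:
    """Pick a reducer index with the default HashPartitioner formula."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reduce_tasks


def busy_reducers(keys, num_reduce_tasks: int) -> Counter:
    """Count how many records land on each reducer index."""
    return Counter(partition(k, num_reduce_tasks) for k in keys)


if __name__ == "__main__":
    # One repeated key: every record lands on the same reducer.
    print(busy_reducers(["same"] * 1000, 8))
    # Many distinct keys: records spread over the available reducers.
    print(busy_reducers([f"key{i}" for i in range(1000)], 8))
```

With a single repeated key only one of the eight reducers gets any work, no matter how many reduce slots the cluster has.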


Regards,
Abhishek

On Fri, May 11, 2012 at 4:57 PM, Jeremy Davis
jda...@upstreamsoftware.com wrote:


 I see mapred.tasktracker.reduce.tasks.maximum and
 mapred.tasktracker.map.tasks.maximum, but I'm wondering if there isn't
 another tuning parameter I need to look at.

 I can tune the task tracker so that when I have many jobs running, with
 many simultaneous maps and reduces, I utilize 95% of CPU and memory.

 Inevitably, though, I end up with a huge final reduce task that only uses
 half of my cluster because I have reserved the other half for map tasks.

 Is there a way around this problem?

 Seems like there should also be a maximum number of reducers conditional
 on no Map tasks running.

 -JD


Resource underutilization / final reduce tasks only uses half of cluster ( tasktracker map/reduce slots )

2012-05-11 Thread Jeremy Davis

I see mapred.tasktracker.reduce.tasks.maximum and 
mapred.tasktracker.map.tasks.maximum, but I'm wondering if there isn't another 
tuning parameter I need to look at.

I can tune the task tracker so that when I have many jobs running, with many
simultaneous maps and reduces, I utilize 95% of CPU and memory.

Inevitably, though, I end up with a huge final reduce task that only uses half
of my cluster because I have reserved the other half for map tasks.
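
To put a number on that, here is a small Python sketch of the slot arithmetic. It assumes each task tracker has a fixed number of map slots and reduce slots (the two mapred.tasktracker.*.tasks.maximum settings) and that map slots cannot run reduce tasks. The slot counts are made-up values for illustration, not my actual configuration, and the function name is hypothetical.

```python
def reduce_phase_utilization(map_slots: int, reduce_slots: int) -> float:
    """Fraction of a node's task slots that can be busy when only reducers run."""
    total = map_slots + reduce_slots
    if map_slots < 0 or reduce_slots < 0 or total == 0:
        raise ValueError("slot counts must be non-negative and not both zero")
    # Map slots sit idle once the map phase is over, so only the
    # reduce slots contribute to utilization.
    return reduce_slots / total


if __name__ == "__main__":
    # Hypothetical node with 8 map slots and 8 reduce slots.
    print(reduce_phase_utilization(8, 8))
    # Hypothetical node tuned for map-heavy jobs: 12 map slots, 4 reduce slots.
    print(reduce_phase_utilization(12, 4))
```

An even split between map and reduce slots caps the final reduce phase at half of the slots, which matches what I am seeing.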

Is there a way around this problem?  

Seems like there should also be a maximum number of reducers conditional on no 
Map tasks running. 

-JD