Thanks all for your comments. However, I still have some doubts.
Basically, I can control the number of map/reduce slots with mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, but is it possible to set a different number of map/reduce slots for different slaves? For example, if I am running in a heterogeneous environment where each slave has a different configuration, is it possible to set a different number of slots based on each machine's configuration? So far I have observed that I can modify these parameters only on the master, so all the nodes run with the same number of map/reduce slots regardless of the resources (CPU, memory) each one offers.

Thanks for any clue.
Robert

--- On Mon, 11/22/10, Harsh J <[email protected]> wrote:

From: Harsh J <[email protected]>
Subject: Re: Hadoop - how exactly is a slot defined
To: [email protected]
Date: Monday, November 22, 2010, 6:52 PM

Hi,

On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <[email protected]> wrote:
> Hi all,
>
> I have trouble understanding what exactly a slot is. We always talk about
> tasks being assigned to slots, but I have not found anywhere what exactly
> a slot is. I assume it represents some allocation of RAM as well as some
> computing power.
>
> However, can somebody explain to me what exactly a slot means (in terms of
> the resources allocated to a slot) and how this mapping (between a slot and
> physical resources) is done in Hadoop? Or give me some hints about which
> files in Hadoop I should look at?

A slot is of two types: a Map slot or a Reduce slot. A slot represents the ability to run one of these "Tasks" (map/reduce tasks) individually at a point in time. Therefore, multiple slots on a TaskTracker mean that multiple "Tasks" may execute in parallel.

Right now, the total number of slots on a TaskTracker equals mapred.tasktracker.map.tasks.maximum for Maps and mapred.tasktracker.reduce.tasks.maximum for Reduces.
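
To make the "slot = permission to run one task at a time" idea concrete, here is a minimal Python sketch. It is a hypothetical model, not Hadoop code: the class TaskTracker, its fields map_max/reduce_max, and the methods try_start/finish are invented for illustration. It assumes, as the phrase "on each TaskTracker" suggests, that each node applies its own values of the two maximums, so a heterogeneous cluster is modeled by giving each node different numbers. Note that a slot here reserves no CPU or memory; it is only a counter.

```python
from dataclasses import dataclass, field


@dataclass
class TaskTracker:
    """Hypothetical model of one TaskTracker's fixed slot capacity.

    map_max / reduce_max stand in for mapred.tasktracker.map.tasks.maximum
    and mapred.tasktracker.reduce.tasks.maximum as applied on this node.
    """
    host: str
    map_max: int
    reduce_max: int
    running_maps: list = field(default_factory=list)
    running_reduces: list = field(default_factory=list)

    def free_map_slots(self) -> int:
        return self.map_max - len(self.running_maps)

    def free_reduce_slots(self) -> int:
        return self.reduce_max - len(self.running_reduces)

    def try_start(self, task_id: str, kind: str) -> bool:
        """Occupy one slot of the given kind; refuse if none is free.

        No CPU or memory is reserved: the slot only bounds how many
        tasks of each kind run in parallel on this node.
        """
        if kind == "map" and self.free_map_slots() > 0:
            self.running_maps.append(task_id)
            return True
        if kind == "reduce" and self.free_reduce_slots() > 0:
            self.running_reduces.append(task_id)
            return True
        return False

    def finish(self, task_id: str) -> None:
        """Release the slot held by a completed task."""
        for running in (self.running_maps, self.running_reduces):
            if task_id in running:
                running.remove(task_id)
                return


# Hypothetical heterogeneous cluster: each node gets its own maximums.
big = TaskTracker("big-node", map_max=8, reduce_max=4)
small = TaskTracker("small-node", map_max=2, reduce_max=1)

started = [small.try_start(f"m{i}", "map") for i in range(3)]
print(started)                         # [True, True, False]
small.finish("m0")
print(small.try_start("m2", "map"))    # True
```

The third map task on the small node is refused even if the machine happens to be idle, which is exactly the "fixed maximum" behavior described above.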
Hadoop is indeed moving towards a dynamic slot concept, which could rely on the resources currently available on a system, but work on this is still in the conceptual phase. TaskTrackers already emit system status (such as CPU load, utilization, available/used memory, and load averages) in their heartbeats, and this is used by certain schedulers (I think the Capacity Scheduler uses it to make some scheduling decisions). However, slots are still a fixed maximum set by the two configuration values above on each TaskTracker.

For code showing how slots are checked and used, see any scheduler plugin's code, for example LimitTasksPerJobTaskScheduler or CapacityTaskScheduler.

>
> Thanks a lot,
> Robert
>

-- 
Harsh J
www.harshj.com
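
The split between "resource figures reported in the heartbeat" and "capacity fixed by slot counts" can be sketched as a toy scheduler. This is a hypothetical Python model, not the code of any real Hadoop scheduler: Heartbeat, Job, assign_maps and max_running_per_job are invented names. The per-job cap only imitates the kind of limit a scheduler such as LimitTasksPerJobTaskScheduler applies; its real configuration and logic may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Heartbeat:
    """Hypothetical subset of what a TaskTracker reports each heartbeat."""
    host: str
    map_max: int            # fixed slot maximum for maps on this node
    running_maps: int
    free_memory_mb: int     # reported, but NOT used to size slots below
    load_average: float     # likewise informational only


@dataclass
class Job:
    job_id: str
    pending_maps: int
    running_maps: int = 0


def assign_maps(hb: Heartbeat, jobs: list[Job], max_running_per_job: int) -> list[str]:
    """Hand out map tasks to one TaskTracker, visiting jobs in FIFO order.

    Capacity comes only from free slots (map_max - running_maps); the
    resource figures in the heartbeat do not change it. Each job is also
    capped at max_running_per_job concurrently running maps.
    """
    free = hb.map_max - hb.running_maps
    assigned = []
    for job in jobs:
        while free > 0 and job.pending_maps > 0 and job.running_maps < max_running_per_job:
            job.pending_maps -= 1
            job.running_maps += 1
            free -= 1
            assigned.append(job.job_id)
    return assigned


hb = Heartbeat("node-1", map_max=4, running_maps=1, free_memory_mb=6144, load_average=0.7)
jobs = [Job("job_1", pending_maps=5), Job("job_2", pending_maps=2)]
print(assign_maps(hb, jobs, max_running_per_job=2))
# ['job_1', 'job_1', 'job_2']
```

With three free slots, job_1 stops at its cap of two running maps and the last slot goes to job_2; changing free_memory_mb or load_average would not change the result, which mirrors the point that slots are still a fixed maximum.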
