Hi,

I am sorry to bother you again about this subject, but I am still not convinced about what Hadoop assumes a slot is. I understood that it represents something in terms of CPU/memory, so you have to allocate a corresponding number of map/reduce slots based on your configuration. BUT, I still cannot understand whether Hadoop makes any mapping between the concept of a slot and physical resources itself, or whether slots are just numbers and you can only work with those numbers. I looked at the code, but I was not able to figure out whether Hadoop really does any checking between the number of slots and the physical resources, or whether it is simply limited by the two numbers (the maximum number of map slots and of reduce slots) and only works with those. That would mean the user has to supply the interpretation of what a slot really is (one slot per core, one slot per 512 MB, etc.) when configuring the number of map/reduce slots on his machines.

Thanks in advance for any clue.

Cheers,
Robert
--- On Mon, 11/22/10, Harsh J <[email protected]> wrote:

From: Harsh J <[email protected]>
Subject: Re: Hadoop - how exactly is a slot defined
To: [email protected]
Date: Monday, November 22, 2010, 6:52 PM

Hi,

On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <[email protected]> wrote:
> Hi all,
>
> I have trouble understanding what exactly a slot is. We always talk about
> tasks being assigned to slots, but I have not found anywhere what exactly
> a slot is. I assume it represents some allocation of RAM as well as some
> computation power.
>
> However, can somebody explain to me what exactly a slot means (in terms of
> the resources allocated to a slot) and how this mapping (between slot and
> physical resources) is done in Hadoop? Or give me some hints about the
> files in Hadoop where it might be?

A slot is of two types -- a Map slot and a Reduce slot. A slot represents the ability to run one of these "Tasks" (map/reduce tasks) individually at a point in time. Therefore, multiple slots on a TaskTracker mean that multiple "Tasks" may execute in parallel.

Right now, the total number of slots on a TaskTracker is simply mapred.tasktracker.map.tasks.maximum for Maps and mapred.tasktracker.reduce.tasks.maximum for Reduces.

Hadoop is indeed trying to move towards a dynamic slot concept, which could rely on the resources currently available on a system, but work on this is still in the conceptual phase. TaskTrackers already emit system status (CPU load, utilization, memory available/used, load averages) in their heartbeats, and it is utilized by certain schedulers (I think the Capacity Scheduler uses it to determine some things), but the number of slots is still fixed at the maxima set by the above two configuration properties on each TaskTracker.

For code on how slots are checked/utilized, see any scheduler plugin's code -- LimitTasksPerJobTaskScheduler or CapacityTaskScheduler, for example.

> Thanks a lot,
> Robert

--
Harsh J
www.harshj.com
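To make the reply concrete: the two per-TaskTracker limits Harsh names are set in mapred-site.xml. A minimal sketch follows -- the values 4 and 2 are purely illustrative, not recommendations. As discussed above, Hadoop does not verify these numbers against physical resources, so the operator has to size them against the machine's cores and memory (e.g. one map slot per core):

```xml
<!-- mapred-site.xml on each TaskTracker node.
     Values are illustrative only: Hadoop performs no check that the
     machine can actually sustain this many concurrent tasks, so the
     operator supplies the slot-to-resource interpretation. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>  <!-- e.g. one map slot per core on a 4-core box -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>  <!-- fewer reduce slots, as reduces are typically heavier -->
  </property>
</configuration>
```

Changing these values requires restarting the TaskTracker, since the slot counts are read once at daemon startup rather than adjusted dynamically.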
