Robert, Hadoop is not currently doing any dynamic detection of resources to determine the number of slots. If I told Hadoop it could run 3,587 map tasks, it might well try to do it.
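The point above can be illustrated with a toy Python model. It treats a TaskTracker's capacity as two configured integers, the values of mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum, and nothing else. The class and method names (FixedSlotTracker, try_assign, finish) are hypothetical. This is a simplified sketch, not Hadoop's scheduler code, which lives in Java scheduler plugins such as CapacityTaskScheduler.

```python
class FixedSlotTracker:
    """Toy model: a node's capacity is two fixed integers, nothing more.

    Hypothetical illustration only. No RAM, CPU or disk is inspected,
    mirroring the point that slot counts are not derived from hardware.
    """

    def __init__(self, map_max: int, reduce_max: int) -> None:
        # Stand-ins for mapred.tasktracker.{map,reduce}.tasks.maximum.
        self.limits = {"map": map_max, "reduce": reduce_max}
        self.running = {"map": 0, "reduce": 0}

    def try_assign(self, kind: str) -> bool:
        # Accept a task whenever a slot of its kind is free; no resource check.
        if self.running[kind] < self.limits[kind]:
            self.running[kind] += 1
            return True
        return False

    def finish(self, kind: str) -> None:
        # A completed task frees its slot for the next one.
        if self.running[kind] > 0:
            self.running[kind] -= 1


# Configure an absurd map limit: the model happily fills every slot.
tracker = FixedSlotTracker(map_max=3587, reduce_max=2)
accepted = sum(tracker.try_assign("map") for _ in range(4000))
print(accepted)  # 3587
```

Whether the machine could actually sustain that many concurrent tasks never enters the decision; that judgment is left entirely to whoever sets the two numbers.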
We use standards to determine how many map and reduce tasks a node is allowed. Each map/reduce task is given:

  2 GB of RAM
  1 core
  50 GB of tmp disk space

The formula for map/reduce slots looks something like this in our environment:

  G = GB of RAM
  D = disk space in /tmp (GB)
  C = count of CPU cores

Slots = the minimum of:

  (G-2)/2
  D/50
  C-1

These numbers aren't published anywhere and may completely fly in the face of conventional wisdom, but it's what we are using and, so far, it seems to work for us.

-Jonathan

On Nov 24, 2010, at 10:53 AM, Grandl Robert wrote:

> Hi,
> I am sorry to bother you again about this subject, but I am still not
> convinced about what Hadoop assumes a slot is. I understood it represents
> something in terms of CPU/memory, so you have to allocate corresponding
> numbers of map/reduce slots based on your configurations.
> BUT, I cannot understand yet whether Hadoop itself makes any mapping between
> the concept of a slot and physical resources, or whether slots are just
> numbers and it works only with those numbers.
> I looked at the code, but I am not able to figure out whether Hadoop really
> does any checking between the number of slots and physical resources, or is
> just limited by the 2 numbers (the maximum number of map slots and reduce
> slots) and works with these numbers only. That would mean the user has to
> decide what a slot really represents (only one slot per core, one slot per
> 512 MB, etc.) when configuring the number of map/reduce slots on his
> machines.
> Thanks in advance for any clue.
> Cheers,
> Robert
>
> --- On Mon, 11/22/10, Harsh J wrote:
>
> From: Harsh J
> Subject: Re: Hadoop - how exactly is a slot defined
> Date: Monday, November 22, 2010, 6:52 PM
>
> Hi,
>
> On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert wrote:
>> Hi all,
>>
>> I have trouble understanding what exactly a slot is.
>> We always talk about tasks being assigned to slots, but I have not found
>> anywhere what exactly a slot is. I assume it represents some allocation
>> of RAM as well as some computing power.
>>
>> However, can somebody explain to me what exactly a slot means (in terms of
>> resources allocated for a slot) and how this mapping (between slots and
>> physical resources) is done in Hadoop? Or give me some hints about the
>> files in Hadoop where it might be?
>
> A slot is of two types -- Map slot and Reduce slot. A slot represents
> the ability to run one of these "Tasks" (map/reduce tasks) individually
> at a point in time. Therefore, multiple slots on a TaskTracker means
> multiple "Tasks" may execute in parallel.
>
> Right now the total number of slots in a TaskTracker is
> mapred.tasktracker.map.tasks.maximum for maps and
> mapred.tasktracker.reduce.tasks.maximum for reduces.
>
> Hadoop is indeed trying to move towards a dynamic slot concept, which
> could rely on the resources currently available on a system, but work
> on this is still in the conceptual phase. TaskTrackers report system
> status (like CPU load, utilization, memory available/used, load
> averages) in their heartbeats today (and this is used by certain
> schedulers; I think the Capacity Scheduler uses it to determine stuff),
> but the number of slots on each TaskTracker is still fixed at the
> maximums set by the two configurations above.
>
> For code on how slots are checked/utilized, see any Scheduler plugin's
> code -- LimitTasksPerJobTaskScheduler, CapacityTaskScheduler for
> example.
>
>>
>> Thanks a lot,
>> Robert
>>
>
> --
> Harsh J
> www.harshj.com
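Jonathan's rule of thumb can be written as a small Python function. This is a sketch under stated assumptions: fractional results are rounded down (a node cannot have half a slot), negative results are clamped to zero, and reading the "-2" and "-1" terms as RAM and a core held back for the OS and Hadoop daemons is my interpretation, not something the thread states. The function name slots_per_node is hypothetical, and the thread does not say how the result is split between mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum.

```python
def slots_per_node(ram_gb: int, tmp_disk_gb: int, cpu_cores: int) -> int:
    """Slot budget for one node: min((G-2)/2, D/50, C-1), rounded down.

    Assumes each task gets 2 GB RAM, 1 core and 50 GB of /tmp, as in the
    thread. Hypothetical helper, not part of Hadoop.
    """
    by_ram = (ram_gb - 2) // 2      # (G-2)/2
    by_disk = tmp_disk_gb // 50     # D/50
    by_cpu = cpu_cores - 1          # C-1
    # Guard (my addition): a tiny node gets zero slots, never a negative count.
    return max(0, min(by_ram, by_disk, by_cpu))


print(slots_per_node(24, 500, 8))   # 7: limited by cores (C-1)
print(slots_per_node(48, 300, 24))  # 6: limited by /tmp space (D/50)
```

Taking the minimum means the scarcest resource on each node decides its slot count, so adding RAM to a core-bound machine does not raise its number of slots.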
