Thanks to you all for the explanations.

So, as far as I understand, if I configure 4 map slots per node (say 512 MB of RAM per slot, since my node has 2 GB in total), Hadoop will always try to fill all 4 slots? Does the node report in its heartbeat that it has 4 free slots?

That leads to my next question: what if another workload contends with the Hadoop workload at some moment, so that fewer resources are available to Hadoop? Does Hadoop still report 4 free slots and implicitly try to assign tasks to those 4 slots?

Thank you again for your prompt answers.

Cheers,
Robert

--- On Wed, 11/24/10, Jonathan Creasy <[email protected]> wrote:
From: Jonathan Creasy <[email protected]>
Subject: Re: Hadoop - how exactly is a slot defined
To: "[email protected]" <[email protected]>
Date: Wednesday, November 24, 2010, 7:04 PM

Robert,

Hadoop does not currently do any dynamic detection of resources to determine the number of slots. If I told Hadoop it could run 3,587 map tasks, it might well try to do it.

We use standards to determine how many map and reduce tasks a node is allowed.

Each Map/Reduce task is given:

  2 GB of RAM
  1 core
  50 GB of tmp disk space

The formula for map/reduce slots looks something like this in our environment:

  G = GB of RAM
  D = disk space in /tmp (in GB)
  C = count of CPU cores

The minimum of:

  (G-2)/2
  D/50
  C-1

These numbers aren't published anywhere and may completely fly in the face of conventional wisdom, but it's what we are using and, so far, it seems to work for us.

-Jonathan

On Nov 24, 2010, at 10:53 AM, Grandl Robert wrote:

> Hi,
>
> I am sorry to bother you again about this subject, but I am still not
> convinced what Hadoop assumes a slot is. I understood that it represents
> something in terms of CPU/memory, so you have to allocate corresponding
> numbers of map/reduce slots based on your configuration.
>
> BUT, I cannot understand yet whether Hadoop makes any mapping between the
> concept of a slot and physical resources itself, or whether these are just
> some numbers and it works only with these numbers.
>
> I looked at the code, but I am not able to figure out whether Hadoop really
> does some checking between the number of slots and physical resources, or is
> just limited by the 2 numbers (maximum number of map slots and of reduce
> slots) and works with these numbers only. That would mean the user has to
> decide what a slot really is (only one slot per core, one slot per 512 MB,
> etc.) when configuring the number of map/reduce slots on his machines.
>
> Thanks in advance for any clue.
>
> Cheers,
> Robert
>
> --- On Mon, 11/22/10, Harsh J <[email protected]> wrote:
>
> From: Harsh J <[email protected]>
> Subject: Re: Hadoop - how exactly is a slot defined
> To: [email protected]
> Date: Monday, November 22, 2010, 6:52 PM
>
> Hi,
>
> On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <[email protected]> wrote:
>> Hi all,
>>
>> I have trouble understanding what exactly a slot is. We always talk
>> about tasks assigned to slots, but I have not found anywhere what
>> exactly a slot is. I assume it represents some allocation of RAM as
>> well as some computation power.
>>
>> However, can somebody explain to me what exactly a slot means (in terms of
>> resources allocated for a slot) and how this mapping (between slot and
>> physical resources) is done in Hadoop? Or give me some hints about the
>> files in Hadoop where it might be?
>
> A slot is of two types -- Map slot and Reduce slot. A slot represents
> an ability to run one of these "Tasks" (map/reduce tasks) individually
> at a point of time. Therefore, multiple slots on a TaskTracker means
> multiple "Tasks" may execute in parallel.
>
> Right now the total number of slots in a TaskTracker is ==
> mapred.tasktracker.map.tasks.maximum for Maps and
> mapred.tasktracker.reduce.tasks.maximum for Reduces.
>
> Hadoop is indeed trying to go towards the dynamic slot concept, which
> could rely on the current resources available on a system, but work
> for this is still in conceptual phases. TaskTrackers emit system
> status (like CPU load, utilization, memory available/used, load
> averages) in their heartbeats today (and this is utilized by certain
> schedulers; I think Capacity Scheduler uses it to determine stuff),
> but the concept of slots is still fixed as a maximum to the above two
> configurations on each TaskTracker.
>
> For code on how slots are checked/utilized, see any Scheduler plugin's
> code -- LimitTasksPerJobTaskScheduler, CapacityTaskScheduler for
> example.
>
>>
>> Thanks a lot,
>> Robert
>>
>
> --
> Harsh J
> www.harshj.com
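
For illustration, the sizing rule from Jonathan's message (the minimum of (G-2)/2, D/50 and C-1) could be sketched in Python as below. This is only a sketch of one site's rule of thumb, not anything Hadoop computes itself. The function name `slots_per_node`, rounding down to whole slots, and clamping at zero are assumptions that the thread does not state. The thread also does not say how the result is split between map and reduce slots.

```python
import math


def slots_per_node(ram_gb: float, tmp_disk_gb: float, cpu_cores: int) -> int:
    """Estimate task slots for one node using the rule quoted in the thread.

    Hypothetical helper: Hadoop does not compute this. The result would be
    entered by hand into mapred.tasktracker.map.tasks.maximum and
    mapred.tasktracker.reduce.tasks.maximum.
    """
    by_ram = (ram_gb - 2) / 2     # 2 GB per task, 2 GB held back
    by_disk = tmp_disk_gb / 50    # 50 GB of /tmp per task
    by_cpu = cpu_cores - 1        # 1 core per task, 1 core held back
    # Assumption: round down to whole slots and never go below zero.
    return max(0, math.floor(min(by_ram, by_disk, by_cpu)))


if __name__ == "__main__":
    # 16 GB RAM, 500 GB /tmp, 8 cores: min(7, 10, 7)
    print(slots_per_node(16, 500, 8))
```

Under this rule a 2 GB node like the one Robert describes gets (2-2)/2 = 0 slots, which suggests the constants are tuned for larger machines and would need adjusting for small nodes.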
