Thanks to you all for the explanations.
So, as far as I understand, if I configure 4 map slots per node (say, 512 
MB of RAM per slot, as my node has 2 GB in total), Hadoop will always try 
to allocate 4 slots? Does the node report on the heartbeat that it has 4 
free slots?
But then my question is: what if another workload contends with the Hadoop 
workload at some moment, so that few resources are now available for 
Hadoop? Does Hadoop still report that it has 4 free slots and implicitly 
try to allocate tasks for these 4 slots?
Thank you again for your prompt answers.
Cheers,
Robert
--- On Wed, 11/24/10, Jonathan Creasy <[email protected]> wrote:

From: Jonathan Creasy <[email protected]>
Subject: Re: Hadoop - how exactly is a slot defined
To: "[email protected]" <[email protected]>
Date: Wednesday, November 24, 2010, 7:04 PM

Robert, 

Hadoop is not currently doing any dynamic detection of resources to determine 
the number of slots. If I told Hadoop it could run 3,587 map tasks, it might 
well try to do it. 

We use standard allocations to determine how many map and reduce tasks a node is allowed:

Each Map/Reduce Task is given:
2 GB of RAM
1 CPU core
50 GB of tmp disk space

The formula for map/reduce slots looks something like this in our environment:

G = GB of RAM
D = Disk space in /tmp
C = count of CPU cores

The minimum of: 
(G-2)/2
D/50
C-1

These numbers aren't published anywhere and may completely fly in the face 
of conventional wisdom, but it's what we are using, and so far it seems to 
work for us.
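The formula above works out to something like this as a Python sketch (the function name and example inputs are illustrative, not part of any Hadoop tooling):

```python
def slots_per_node(ram_gb, tmp_disk_gb, cpu_cores):
    """Rule-of-thumb slot count from the formula above:
    reserve 2 GB of RAM and 1 core for the OS/daemons,
    then give each slot 2 GB of RAM and 50 GB of /tmp."""
    by_ram = (ram_gb - 2) // 2    # (G-2)/2
    by_disk = tmp_disk_gb // 50   # D/50
    by_cpu = cpu_cores - 1        # C-1
    return min(by_ram, by_disk, by_cpu)

# e.g. a node with 16 GB RAM, 500 GB in /tmp, and 8 cores:
print(slots_per_node(16, 500, 8))  # -> 7 (RAM and CPU both cap it at 7)
```

Whichever resource runs out first becomes the limit, which is why the minimum is taken.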

-Jonathan


On Nov 24, 2010, at 10:53 AM, Grandl Robert wrote:

> Hi,
> I am sorry to bother you again about this subject, but I am still not 
> convinced about what Hadoop assumes a slot is. I understood it represents 
> something in terms of CPU/memory, so you have to allocate a corresponding 
> number of map/reduce slots based on your configuration.
> BUT, I cannot yet understand whether Hadoop makes any mapping between the 
> concept of a slot and physical resources itself, or whether slots are 
> just numbers and Hadoop works only with those numbers.
> I looked at the code, but I was not able to figure out whether Hadoop 
> really does any checking between the number of slots and physical 
> resources, or whether it is just limited by the two numbers (the maximum 
> number of map slots and of reduce slots) and plays with those numbers 
> only. That would mean the user has to supply the interpretation of what a 
> slot really is (one slot per core, one slot per 512 MB, etc.) when 
> configuring the number of map/reduce slots on his machines.
> Thanks in advance for any clue.
> Cheers,
> Robert
> 
> --- On Mon, 11/22/10, Harsh J <[email protected]> wrote:
> 
> From: Harsh J <[email protected]>
> Subject: Re: Hadoop - how exactly is a slot defined
> To: [email protected]
> Date: Monday, November 22, 2010, 6:52 PM
> 
> Hi,
> 
> On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <[email protected]> wrote:
>> Hi all,
>> 
>> I have trouble understanding what exactly a slot is. We always talk 
>> about tasks assigned to slots, but I have not found anywhere what 
>> exactly a slot is. I assume it represents some allocation of RAM as 
>> well as some computation power.
>> 
>> However, can somebody explain to me what exactly a slot means (in terms 
>> of resources allocated to a slot) and how this mapping (between slots 
>> and physical resources) is done in Hadoop? Or give me some hints about 
>> the files in Hadoop where it might be?
> 
> A slot is of two types -- Map slot and Reduce slot. A slot represents
> an ability to run one of these "Tasks" (map/reduce tasks) individually
> at a point of time. Therefore, multiple slots on a TaskTracker means
> multiple "Tasks" may execute in parallel.
> 
> Right now the total slots on a TaskTracker are
> mapred.tasktracker.map.tasks.maximum for Maps and
> mapred.tasktracker.reduce.tasks.maximum for Reduces.
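> For reference, these maxima are set per TaskTracker in mapred-site.xml; 
> a minimal sketch (the values here are illustrative, not defaults):
> 
> ```xml
> <!-- mapred-site.xml: per-TaskTracker slot maxima (example values) -->
> <property>
>   <name>mapred.tasktracker.map.tasks.maximum</name>
>   <value>4</value>
> </property>
> <property>
>   <name>mapred.tasktracker.reduce.tasks.maximum</name>
>   <value>2</value>
> </property>
> ```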
> 
> Hadoop is indeed trying to go towards the dynamic slot concept, which
> could rely on the current resources available on a system, but work
> for this is still in conceptual phases. TaskTrackers emit system
> status (like CPU load, utilization, memory available/used, load
> averages) in their heartbeats today (and this is utilized by certain
> schedulers; I think the Capacity Scheduler uses it to determine stuff),
> but the concept of slots is still fixed as a maximum to the above two
> configurations on each TaskTracker.
> 
> For code on how slots are checked/utilized, see any Scheduler plugin's
> code -- LimitTasksPerJobTaskScheduler, CapacityTaskScheduler for
> example.
> 
>> 
>> Thanks a lot,
>> Robert
>> 
>> 
>> 
> 
> 
> 
> -- 
> Harsh J
> www.harshj.com
> 
> 
> 




      
