Hi,

I am sorry to bother you again about this subject, but I am still not convinced about what Hadoop assumes a slot is. I understood that it represents something in terms of CPU/memory, so you have to allocate a corresponding number of map/reduce slots based on your configuration. BUT, I still cannot understand whether Hadoop makes any mapping between the concept of a slot and physical resources itself, or whether slots are just numbers and you can only work with those numbers. I looked at the code, but I was not able to figure out whether Hadoop really does any checking between the number of slots and the physical resources, or whether it is simply limited by the two numbers (the maximum number of map slots and of reduce slots) and only works with those. That would mean the user has to supply the interpretation of what a slot really is (one slot per core, one slot per 512 MB, etc.) when configuring the number of map/reduce slots on his machines.

Thanks in advance for any clue.

Cheers,
Robert
--- On Mon, 11/22/10, Harsh J <[email protected]> wrote:

From: Harsh J <[email protected]>
Subject: Re: Hadoop - how exactly is a slot defined
To: [email protected]
Date: Monday, November 22, 2010, 6:52 PM

Hi,

On Mon, Nov 22, 2010 at 10:02 PM, Grandl Robert <[email protected]> wrote:
> Hi all,
>
> I have trouble understanding what exactly a slot is. We always talk about
> tasks being assigned to slots, but I have not found anywhere what exactly
> a slot is. I assume it represents some allocation of RAM as well as some
> computation power.
>
> However, can somebody explain to me what exactly a slot means (in terms of
> the resources allocated to a slot) and how this mapping (between slot and
> physical resources) is done in Hadoop? Or give me some hints about the
> files in Hadoop where it might be?

A slot is of two types -- a Map slot and a Reduce slot. A slot represents the ability to run one of these "Tasks" (map/reduce tasks) individually at a point in time. Therefore, multiple slots on a TaskTracker mean that multiple "Tasks" may execute in parallel.

Right now, the total number of slots on a TaskTracker is simply mapred.tasktracker.map.tasks.maximum for Maps and mapred.tasktracker.reduce.tasks.maximum for Reduces.

Hadoop is indeed trying to move towards a dynamic slot concept, which could rely on the resources currently available on a system, but work on this is still in the conceptual phase. TaskTrackers already emit system status (CPU load, utilization, memory available/used, load averages) in their heartbeats, and it is utilized by certain schedulers (I think the Capacity Scheduler uses it to determine some things), but the number of slots is still fixed at the maxima set by the above two configuration properties on each TaskTracker.

For code on how slots are checked/utilized, see any scheduler plugin's code -- LimitTasksPerJobTaskScheduler or CapacityTaskScheduler, for example.

> Thanks a lot,
> Robert

--
Harsh J
www.harshj.com
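To make the reply concrete: the two per-TaskTracker limits Harsh names are set in mapred-site.xml. A minimal sketch follows -- the values 4 and 2 are purely illustrative, not recommendations. As discussed above, Hadoop does not verify these numbers against physical resources, so the operator has to size them against the machine's cores and memory (e.g. one map slot per core):

```xml
<!-- mapred-site.xml on each TaskTracker node.
     Values are illustrative only: Hadoop performs no check that the
     machine can actually sustain this many concurrent tasks, so the
     operator supplies the slot-to-resource interpretation. -->
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>4</value>  <!-- e.g. one map slot per core on a 4-core box -->
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>  <!-- fewer reduce slots, as reduces are typically heavier -->
  </property>
</configuration>
```

Changing these values requires restarting the TaskTracker, since the slot counts are read once at daemon startup rather than adjusted dynamically.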
