Thanks for your reply. Some additional questions:
[1] How does the application master determine the size (memory requirement) of 
a container? Can a container be viewed as a JVM with its own CPU and memory?
[2] The document mentions a concept of fungibility of resources across 
servers. Could an allocated container of 2 GB of RAM for a reducer span two 
servers of 1 GB each? If so, is a task split across two servers? I'm not sure 
how that would work.
[3] The application master corresponds to the Job Tracker for a given job, and 
the Node Manager corresponds to the Task Tracker in pre-0.23 Hadoop. Is this 
assumption correct?
[4] For data to be transferred from a map node to a reduce node, is it the 
reduce node's NodeManager that periodically polls the application master and 
subsequently pulls map output from the completed map nodes?

Thanks!

From: Robert Evans <ev...@yahoo-inc.com>
To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>; Ann 
Pal <ann_r_...@yahoo.com> 
Sent: Wednesday, January 4, 2012 9:54 AM
Subject: Re: Yarn related questions:
 

Ann,

A container more or less corresponds to a task in MRv1. There is one exception 
to this: the ApplicationMaster also runs in a container. The ApplicationMaster 
will request a new container for each mapper or reducer task that it wants to 
launch. Separate code, run as part of the NodeManager (similar to the 
TaskTracker from before), serves up the intermediate mapper output. When the 
ApplicationMaster requests a container, it also includes a hint as to where it 
would like the container placed. In fact, it makes three requests: one for the 
exact node, one for the rack that node is on, and one that is generic and 
could be satisfied anywhere. The scheduler tries to honor those requests in 
that order, so data locality is still considered and generally honored. Yes, 
there is the possibility of back and forth to get a container, but the 
ApplicationMaster will generally try to use all of the containers it is given, 
even if they are not optimal.
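
To make the three-level request concrete, here is a small Python sketch of the 
fallback from node-local to rack-local to anywhere. This is a toy model, not 
the real YARN API; the names `locality_requests` and `allocate` and the dict 
shapes are invented for illustration only.

```python
def locality_requests(node, rack):
    """The three requests an AM makes for one container, most specific first."""
    return [
        {"scope": "node", "location": node},  # the exact node holding the split
        {"scope": "rack", "location": rack},  # any node on the same rack
        {"scope": "any",  "location": "*"},   # anywhere in the cluster
    ]

def allocate(requests, free_nodes, rack_of):
    """Scheduler sketch: honor the most specific request it can satisfy."""
    for req in requests:
        if req["scope"] == "node" and req["location"] in free_nodes:
            return req["location"]
        if req["scope"] == "rack":
            for n in sorted(free_nodes):
                if rack_of[n] == req["location"]:
                    return n
        if req["scope"] == "any" and free_nodes:
            return sorted(free_nodes)[0]
    return None  # nothing free; the AM will ask again later

# Example cluster: n1 and n2 share rack r1, n3 is on rack r2.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
reqs = locality_requests("n1", "r1")
print(allocate(reqs, {"n1", "n2", "n3"}, rack_of))  # node-local: n1
print(allocate(reqs, {"n2", "n3"}, rack_of))        # rack-local fallback: n2
print(allocate(reqs, {"n3"}, rack_of))              # off-rack fallback: n3
```

The point of the model is the ordering: the AM states its preference once, and 
the scheduler degrades gracefully rather than bouncing rejections back and 
forth, which matches the behavior described above.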

--Bobby Evans

On 1/4/12 10:23 AM, "Ann Pal" <ann_r_...@yahoo.com> wrote:


Hi,
>I am trying to understand more about Hadoop Next Gen Map Reduce and had the 
>following questions based on the following post:
>
>http://developer.yahoo.com/blogs/hadoop/posts/2011/03/mapreduce-nextgen-scheduler/
>
>[1] How does the application decide how many containers it needs? Are the 
>containers used to store the intermediate results at the map nodes?
>
>[2] During resource allocation, if the resource manager has no mapping from 
>map tasks to allocated resources, how can it allocate the right resources? It 
>might end up allocating resources on a node that does not have the data for 
>the map task, which is not optimal. In that case the Application Master would 
>have to reject the allocation and request again. There could be considerable 
>back-and-forth between the application master and resource manager before 
>they converge. Is this right?
>
>Thanks!
>
>
