On Jan 5, 2012, at 8:29 AM, Praveen Sripati wrote: > Hi, > > I had been going through the MRv2 documentation and have the following queries > > 1) Let's say that an InputSplit is on Node1 and Node2. > > Can the ApplicationMaster ask the ResourceManager for a container either on > Node1 or Node2 with an OR condition? >
No, the OR condition is implied by the hierarchy of requests (node, rack, *). In this case, assuming topology is node1/rack1 node2/rack1, requests would be: node1 -> 1 node2 -> 1 rack1 -> 1 * -> 1 OTOH, if the topology is node1/rack1, node2/rack2, requests would be: node1 -> 1 node2 -> 1 rack1 -> 1 rack2 -> 1 * -> 1 In both cases, * would limit the #allocated-containers to '1'. In the first case rack1 itself (independent of *) would limit #allocated-containers to 1. More details here: http://developer.yahoo.com/blogs/hadoop/posts/2011/03/mapreduce-nextgen-scheduler/. I'll work on incorporating this into our docs on hadoop.apache.org. > 2) > The Scheduler receives periodic information about the resource usages on > allocated resources from the NodeManagers. The Scheduler also makes available > status of completed Containers to the appropriate ApplicationMaster. > > What's the use of NM sending the resource usages to the scheduler? > > Why can't the NM directly talk to the AM about the completed containers? Does > any information pass from NM to AM? > The NM sends resource usages to the scheduler to allow it to track resource utilization on each node and, in future, make smarter decisions about allocating extra containers on under-utilized nodes etc. NM doesn't make any 'out' calls to anyone by RM, else it would be severe scalability bottleneck. > 3) >The Map-Reduce ApplicationMaster has the following components: > > TaskUmbilical – The component responsible for receiving heartbeats and > > status updates form the map and reduce tasks. > > Does the communication happen directly between the container and the AM? If > yes, the task completion status could also be sent from the container to the > AM. > Yes, it already is sent to AM. > 4) > The Hadoop Map-Reduce JobClient polls the ASM to obtain information > about the MR AM and then directly talks to the AM for status, counters etc. > > Once the Job is completed the AM goes down, what happens to the Counters? > What is the flow of the Counter (Container -> NM -> AM)? > Once jobs completes the Counters etc. are stored in JobHistory file (one per job) which is served up, if necessary, by the JobHistoryServer. > 5) If a new YARN application is created. How can the NM trust the request > from AM? > All interactions (RPCs) are authenticated. Also, there is a container token provided by the RM (during allocation) which is verified by the NM during container launch. > 6) > MapReduce NextGen uses wire-compatible protocols to allow different > versions of servers and clients to communicate with each other. > > What is meant by the `wire-compatible protocols` and how is it implemented? > We use PB everywhere. > 7) > The computation framework (ResourceManager and NodeManager) is > completely generic and is free of MapReduce specificities. > > Is this the reason for adding auxiliary services for shuffling to the NM? > Yes. hth, Arun