Hi Mahadev,

Thanks, both are very helpful for understanding the architecture of YARN.

What we are looking for is more about the differences at the task level.
Suppose a map task takes 10 minutes in Hadoop, then we have a model to
analyse what makes up the 10 minutes, e.g. reading from HDFS, invoking the
map function, writing to the buffer, partitioning, sorting and merging.
This model can be used to identify the bottleneck of the task execution and
suggest better configurations.
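
As a rough illustration of the kind of breakdown I mean, here is a minimal
sketch. The phase names follow the list above; the timings, the dictionary
and the helper functions are hypothetical placeholders made up for this
example, not measurements or part of our actual model.

```python
# Hypothetical per-phase timings (in seconds) for a single 10-minute map task.
# The values are illustrative placeholders, not measurements.
phase_seconds = {
    "read_hdfs": 120.0,
    "map_function": 240.0,
    "write_buffer": 60.0,
    "partition": 30.0,
    "sort": 90.0,
    "merge": 60.0,
}


def breakdown(phases):
    """Return (phase, seconds, share_of_total) tuples, slowest phase first."""
    total = sum(phases.values())
    ordered = sorted(phases.items(), key=lambda kv: kv[1], reverse=True)
    return [(name, secs, secs / total) for name, secs in ordered]


def bottleneck(phases):
    """Return the name of the phase that takes the most time."""
    return max(phases, key=phases.get)


if __name__ == "__main__":
    for name, secs, share in breakdown(phase_seconds):
        print(f"{name:13s} {secs:6.1f}s {share:6.1%}")
    print("bottleneck:", bottleneck(phase_seconds))
```

With a breakdown like this, the phase with the largest share is the first
candidate for tuning.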

If we run MR jobs in YARN, can we use the same model to analyse the running
time of a task? One possible difference I've noticed so far is that
shuffling has become a service of the node manager. Are there any other
changes related to the map phase or the reduce phase?

Thanks,
Jie

On Mon, Jan 16, 2012 at 4:32 PM, Mahadev Konar <[email protected]> wrote:

> Hi Jie,
>  You might want to read through:
>
> http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/YARN.html
> and
> http://developer.yahoo.com/blogs/hadoop/posts/2011/02/mapreduce-nextgen/
>
> for more information on the architecture. It'll help you understand the
> major differences between the two.
>
> mahadev
>
> On Mon, Jan 16, 2012 at 11:41 AM, Jie Li <[email protected]> wrote:
> > Hi all,
> >
> > As we know, MRv2 (the MapReduce library in YARN) has changed
> > significantly.
> > We have a cost model built for the MapReduce in Hadoop and are going to
> > migrate to MRv2. Can anyone give us a pointer to the fundamental
> > differences between them? Also, below is my current understanding;
> > feel free to correct me.
> >
> > 1. JT has been replaced by a central RM and a per-application AM.
> > 2. TT has been replaced by the NM, and the task slots have been replaced
> > by containers. Containers can be allocated dynamically, so both the
> > number and the memory size of the containers can vary on demand.
> > 3. The shuffle service has become independent from the Map.
> >
> > Thanks,
> > Jie
>
>
