Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop

Thomas Graves Thu, 26 Jul 2012 13:08:30 -0700

+1 for the idea.  I think separating the framework from the MR application
makes sense.


Tom 


On 7/25/12 8:40 PM, "Arun C Murthy" <a...@hortonworks.com> wrote:

> Folks,
> 
> It's been nearly a year since we merged Hadoop YARN into trunk and we have
> made several releases since.
> 
> It's exciting to see various open-source communities (both in the ASF and
> externally), such as Apache Hama, Apache Giraph, Apache S4 and Spark, start to
> explore integration with YARN. This promises to help us realize our hopes of
> making Apache Hadoop a much more general data processing platform (& storage,
> of course) and not tied to MapReduce alone for processing data. Furthermore,
> we already have people contributing interesting prototypes such as
> DistributedShell and PaaS on YARN.
> 
> Given this, I think it would be useful to make YARN a sub-project of Apache
> Hadoop along with Common, HDFS & MapReduce. I believe this would help other
> communities realize that they could consider using YARN as a general-purpose
> resource management layer and help us enhance YARN beyond its humble
> beginnings. 
> 
> Clearly, YARN and MapReduce are different enough that they can and will
> attract a diverse community.
> 
> I'd like to clarify that this proposal *does not* mean we move the code base
> out of the hadoop/common/ tree. It just elevates hadoop-yarn alongside
> hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there
> would be *no changes* to release cycles - YARN would be co-released with
> Common, HDFS & MapReduce.
> 
> Thoughts?
> 
> ----
> 
> What does it mean to the Hadoop developer community?
> 
> # Project dependencies
> 
> The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN &
> MapReduce. The dependencies *do not change* from what they are today:
> - Common is the base
> - HDFS depends only on Common
> - YARN depends only on Common & HDFS
> - MapReduce depends on Common, HDFS & YARN.
> 
> # Jira & Mailing lists
> 
> We would have a separate YARN jira project and a yarn-dev@ mailing list.
> 
> We already use separate MAPREDUCE jira issues for making changes to YARN
> (ResourceManager, NodeManager) and to the MapReduce framework (MapReduce
> ApplicationMaster, MapReduce runtime etc.). Hence, this isn't much of a
> change.
> 
> # Subversion
> 
> Not much at all! YARN has, since the beginning, been developed with the
> understanding that it is very independent of MapReduce and the code-bases are
> already independent i.e. hadoop-mapreduce-project/hadoop-yarn and
> hadoop-mapreduce-project/hadoop-mapreduce-client.
> 
> Essentially the change would be:
> $ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
> ... and the necessary, albeit small, changes to our maven build
> infrastructure.
> 
> # Release Cycles
> 
> No changes.
> 
> YARN would be co-released with Common, HDFS & MapReduce, as is the case today.
> 
> thanks,
> Arun
