Re: [DISCUSS] - YARN as a sub-project of Apache Hadoop

Hitesh Shah Thu, 26 Jul 2012 13:58:37 -0700

+1.

-- Hitesh


On Jul 25, 2012, at 6:40 PM, Arun C Murthy wrote:

> Folks,
> 
> It's been nearly a year since we merged Hadoop YARN into trunk and we have 
> made several releases since.
> 
> It's exciting to see various open-source communities (both in the ASF and 
> externally) start to explore integration with YARN such as Apache Hama, 
> Apache Giraph, Apache S4, Spark etc. This promises to help us realize our 
> hopes of making Apache Hadoop a much more general data processing platform (& 
> storage, of course) and not tied to MapReduce alone for processing data. 
> Furthermore, we already have people contributing interesting prototypes such 
> as DistributedShell and PaaS on YARN.
> 
> Given this, I think it would be useful to make YARN a sub-project of Apache 
> Hadoop along with Common, HDFS & MapReduce. I believe this would help other 
> communities realize that they could consider using YARN as a general-purpose 
> resource management layer and help us enhance YARN beyond its humble 
> beginnings. 
> 
> Clearly, YARN and MapReduce are different enough that they can and will 
> attract a diverse community.
> 
> I'd like to clarify that this proposal *does not* mean we move the code base 
> out of the hadoop/common/ tree. It just elevates hadoop-yarn alongside 
> hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there 
> would be *no changes* to release cycles - YARN would be co-released with 
> Common, HDFS & MapReduce.
> 
> Thoughts?
> 
> ----
> 
> What does it mean to the Hadoop developer community?
> 
> # Project dependencies
> 
> The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN & 
> MapReduce. The dependencies *do not change* from what they are today: 
> - Common is the base
> - HDFS depends only on Common
> - YARN depends only on Common & HDFS 
> - MapReduce depends on Common, HDFS & YARN.
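
The dependency list above forms a simple chain, so it implies a single build order. A minimal sketch in Python, assuming only the four sub-project names from the list; the `DEPENDS_ON` table and `build_order` helper are hypothetical names for illustration and are not part of the Hadoop build:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical table: each sub-project mapped to the sub-projects it
# depends on, copied from the list in the proposal.
DEPENDS_ON = {
    "Common": set(),
    "HDFS": {"Common"},
    "YARN": {"Common", "HDFS"},
    "MapReduce": {"Common", "HDFS", "YARN"},
}


def build_order(deps):
    """Return the sub-projects ordered so each comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(" -> ".join(build_order(DEPENDS_ON)))
```

Because each sub-project depends on all the ones before it, the order is unique, and YARN never needs anything from MapReduce.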
> 
> # Jira & Mailing lists
> 
> We would have a separate YARN jira project and a yarn-dev@ mailing list.
> 
> We already use separate MAPREDUCE jira issues for making changes to YARN 
> (ResourceManager, NodeManager) and to the MapReduce framework (MapReduce 
> ApplicationMaster, MapReduce runtime etc.). Hence, this isn't much of a 
> change.
> 
> # Subversion
> 
> Not much at all! YARN has, since the beginning, been developed with the 
> understanding that it is very independent of MapReduce and the code-bases are 
> already independent i.e. hadoop-mapreduce-project/hadoop-yarn and 
> hadoop-mapreduce-project/hadoop-mapreduce-client. 
> 
> Essentially the change would be:
> $ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
> ... and the necessary, albeit small, changes to our maven build 
> infrastructure.
> 
> # Release Cycles
> 
> No changes.
> 
> YARN would be co-released with Common, HDFS & MapReduce, as is the case today.
> 
> thanks,
> Arun
