Hey Arun,

On Jul 25, 2012, at 7:11 PM, Arun C Murthy wrote:

> Hi Chris,
> 
> On Jul 25, 2012, at 7:03 PM, Mattmann, Chris A (388J) wrote:
> 
>> Hi Arun,
>> 
>> IMHO, it sounds like you guys might be better off proposing a new project 
>> for the Apache Incubator. Looking at the things you list below the ---, it 
>> looks like an Incubator proposal minus the initial committer list, and 
>> affiliations and mentors/champions ;)
>> 
> 
> Fair point, thanks for chiming in Chris. However, I think we should revisit 
> that when everything in Apache Hadoop (Common, HDFS, YARN & MapReduce) can 
> fly out of the nest as separate projects.

Yep the way I've seen them managed, IMHO, they should be separate projects.

> That, I think, is too early; also, keeping Common, HDFS, YARN & MapReduce 
> together has value in ensuring that Hadoop continues to move along at a 
> fair clip.

I realize I'm asking a hard question here: why *aren't* they separate projects? 
What's the barrier? They seem to be operating that way (and have been for a 
while). And I don't see why Hadoop couldn't still move along at a fair clip 
with them as official TLPs themselves.

> 
>> If you don't want to go to that level, I don't think you guys need anyone's 
>> permission, and/or etc., right? If YARN is a product of the Apache Hadoop 
>> PMC, you guys, as the PMC, can develop it and evolve it (it = the software 
>> and the community) how you guys see fit.
>> 
> 
> Agreed. Which is why I'm trying to gather consensus among the Hadoop 
> community.

Yeah, I know, you are doing great -- my point is, technically, what consensus 
is required? You develop code at Apache as individuals -- code is committed, 
as are patches, etc. The PMC is there to regulate that, but it sounds like, 
code-wise, you are proposing an svn mv command -- do you need an email thread 
to discuss that? Why not just do it, and if someone has a problem, *then* 
discuss? Dunno, that's just my opinion.

The things you are proposing that are new (e.g., mailing lists) will serve to 
splinter the community (or at least its discussion), IMHO -- this is spoken 
from experience in 2 situations (Nutch, Lucene) where we had umbrella projects 
with tons of virtual "sub projects" that in the end have thrived as their own 
individual projects. If you are going to go that far, why not create a new 
Incubator project and just do it clean from the start?

Cheers,
Chris

> 
> 
>> Cheers,
>> Chris
>> 
>> 
>> On Jul 25, 2012, at 6:40 PM, Arun C Murthy wrote:
>> 
>>> Folks,
>>> 
>>> It's been nearly a year since we merged Hadoop YARN into trunk and we have 
>>> made several releases since.
>>> 
>>> It's exciting to see various open-source communities (both in the ASF and 
>>> externally) start to explore integration with YARN such as Apache Hama, 
>>> Apache Giraph, Apache S4, Spark etc. This promises to help us realize our 
>>> hopes of making Apache Hadoop a much more general data processing platform 
>>> (& storage, of course) and not tied to MapReduce alone for processing data. 
>>> Furthermore, we already have people contributing interesting prototypes 
>>> such as DistributedShell and PaaS on YARN.
>>> 
>>> Given this, I think it would be useful to make YARN a sub-project of Apache 
>>> Hadoop along with Common, HDFS & MapReduce. I believe this would help other 
>>> communities realize that they could consider using YARN as a 
>>> general-purpose resource management layer and help us enhance YARN beyond 
>>> its humble beginnings. 
>>> 
>>> Clearly, YARN and MapReduce are different enough that they can and will 
>>> attract a diverse community.
>>> 
>>> I'd like to clarify that this proposal *does not* mean we move the code 
>>> base out of the hadoop/common/ tree. It just elevates hadoop-yarn alongside 
>>> hadoop-common, hadoop-hdfs & hadoop-mapreduce in hadoop/trunk. Also, there 
>>> would be *no changes* to release cycles - YARN would be co-released with 
>>> Common, HDFS & MapReduce.
>>> 
>>> Thoughts?
>>> 
>>> ----
>>> 
>>> What does it mean to the Hadoop developer community?
>>> 
>>> # Project dependencies
>>> 
>>> The change is that Hadoop would now have 4 sub-projects: Common, HDFS, YARN 
>>> & MapReduce. As is the case today, the dependencies *do not change*: 
>>> - Common is the base
>>> - HDFS depends only on Common
>>> - YARN depends only on Common & HDFS 
>>> - MapReduce depends on Common, HDFS & YARN.
>>> 
>>> # Jira & Mailing lists
>>> 
>>> We would have a separate YARN jira project and a yarn-dev@ mailing list.
>>> 
>>> We already use separate MAPREDUCE jira issues for making changes to YARN 
>>> (ResourceManager, NodeManager) and to the MapReduce framework (MapReduce 
>>> ApplicationMaster, MapReduce runtime etc.). Hence, this isn't much of a 
>>> change.
>>> 
>>> # Subversion
>>> 
>>> Not much at all! YARN has, since the beginning, been developed with the 
>>> understanding that it is very independent of MapReduce and the code-bases 
>>> are already independent i.e. hadoop-mapreduce-project/hadoop-yarn and 
>>> hadoop-mapreduce-project/hadoop-mapreduce-client. 
>>> 
>>> Essentially the change would be:
>>> $ svn mv hadoop-mapreduce-project/hadoop-yarn hadoop-yarn-project/hadoop-yarn
>>> ... and the necessary, albeit small, changes to our maven build 
>>> infrastructure.
>>> 
>>> # Release Cycles
>>> 
>>> No changes.
>>> 
>>> YARN would be co-released with Common, HDFS & MapReduce, as is the case 
>>> today.
>>> 
>>> thanks,
>>> Arun
>> 
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 
> 
> --
> Arun C. Murthy
> Hortonworks Inc.
> http://hortonworks.com/
> 
> 


