Hi Alejandro,

On Aug 28, 2012, at 8:50 PM, Alejandro Abdelnur wrote:

> Chris, thanks for initiating the discussion.

No probs!

> 
> IMO a pre-requisite to this is to figure out how we'll handle the following:
> 

To be honest, I don't think any of the below are prereqs. They are technical
issues that can be dealt with after the fact: just SVN copy Hadoop as it stands
today, per my SVN commands, into each of the new TLPs, and then use that as a
starting point for working through the items below as part of the natural
evolution of the project code.
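
For illustration only (the repository paths here are hypothetical, not
necessarily the actual ASF layout), the copy for one new TLP would look
something like:

    svn copy https://svn.apache.org/repos/asf/hadoop/common/trunk \
             https://svn.apache.org/repos/asf/NEW-TLP/trunk \
             -m "Bootstrap NEW-TLP from the current Hadoop trunk"

and similarly for branches/ and tags/ if the new project wants to carry them
over. Since svn copy is cheap, each TLP starts from the full code and history
without duplicating storage.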

That being said, here is my guess at how the TLPs would address these points
once they are created:

> * Where does common stuff lives?

This usually gets worked out over time, depending on how often things release
and on the other factors cited in other threads and discussions over the past
years in Hadoop. You guys clearly have a good handle on things like this.

I would just encourage the subsequent TLPs not to worry about doing everything
perfectly, and to realize that if you start out with the same code base, you
can selectively and iteratively clean up and refactor things, and the answers
to questions like this will emerge naturally during that evolution.

> * What are the public interfaces of each project (towards the other projects)?

This is something that each distinct community can answer once it is
bootstrapped as a TLP. You can decide what portion of the code is really under
your charter and then work as a community to figure this out. Sorry I can't be
more specific than that.

> * How do we do development/releases? In tandem? Separate?

Development in tandem across communities never really works. Releases should
occur separately, per community and TLP, on each project's own schedule. Code
that depends on other projects either has to wait for those
communities/TLPs/projects to fix things or add new features, or it has to
insulate itself: keep the fixes locally in your project's SVN until they can
be pushed upstream and included in the other communities' releases.

Ask yourself this: if you have a dependency on, say, Tomcat, and there is some
critical bug or new feature you want in Tomcat, how would you deal with that?
I would posit that you could deal with this situation the same way: keep the
fix to Tomcat locally in your project, and work to get that fix upstream and
included in some subsequent Tomcat release.
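
For example (the file name and paths here are invented, just to make the
pattern concrete), you might carry the fix as a patch in your own tree and
drop it once it lands upstream:

    svn add src/patches/tomcat-local-fix.patch
    svn commit -m "Carry local Tomcat fix until it ships in an upstream release"

Once an upstream Tomcat release includes the fix, you delete the local patch
and bump the dependency version.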

> How this
> will work in practice, currently we are constantly tweaking things
> inter-projects, sometimes in the same JIRAs, sometimes in follow up
> JIRAs.

Technically you are doing that, but community-wise it's not working out, and
hasn't really been working for years. I've been around Hadoop since its
inception (I was a Nutch committer before Hadoop existed), and though it's
been hugely successful, and really awesome and super great (congrats,
everyone, BTW!), the community issues have always cropped up because it's one
big umbrella project, and that doesn't work at Apache.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
