Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

Eric Baldeschwieler Wed, 29 Aug 2012 10:43:24 -0700

Hi Tom,

> There are also Hadoop tools like distcp, Hadoop archives, Streaming,
> etc, which should go with MapReduce.


Good point.  I agree.

> The alternative would be to have a Common TLP,
> which we shouldn't necessarily dismiss, since more important than the
> size of the codebase is that there's a community to support the
> codebase, as there certainly is here. 


I guess the question is who would want to be on that project?  I don't think
the current bundle of stuff in common would form a good kernel for a community.
The lack of a coherent community for common has always been a problem with the
project split, IMO.  I could see folks deciding to build a community around a
really good RPC stack, or some other chunk of common, but frankly I think it is
premature to do that.  Proposals are welcome, of course, but I think the HDFS
folks will want a copy of the RPC stuff in their project, and most of the rest
of the stuff in common is too small to merit a project and is more easily
handled via duplication, followed by sorting it out and dead code elimination.

On Aug 29, 2012, at 10:30 AM, Tom White wrote:

> On Wed, Aug 29, 2012 at 5:31 PM, Arun C Murthy <a...@hortonworks.com> wrote:
>> 
>> On Aug 28, 2012, at 8:50 PM, Alejandro Abdelnur wrote:
>> 
>>> Chris, thanks for initiating the discussion.
>> 
>> Likewise, thanks Chris!
>> 
>>> 
>>> IMO a pre-requisite to this is to figure out how we'll handle the following:
>>> 
>> 
>> 
>> Good points - I'd recommend we keep Common and HDFS in the same project.
> 
> That seems reasonable. The alternative would be to have a Common TLP,
> which we shouldn't necessarily dismiss, since more important than the
> size of the codebase is that there's a community to support the
> codebase, as there certainly is here. Having said that, a Common TLP
> lacks a clear 'mission' since it doesn't offer any standalone
> services. Also, it may diminish in utility over time if pieces are
> moved into HDFS, MapReduce and YARN.
> 
>> Yes, MR/YARN will need some changes in Common occasionally, but core pieces
>> like RPC have been maintained by HDFS folks over time anyway, e.g. the move
>> to ProtoBufs was led by Sanjay, Suresh, Todd, Jitendra et al.
> 
> Does the work to use versioned protocol buffers for RPC mean that
> different releases of HDFS and MapReduce can work together yet? If
> not, this is something we should be working towards (although that
> shouldn't block a move to TLPs).
> 
>> 
>> We can move SequenceFile into MR if necessary and keep same package names 
>> for compatibility.
> 
> There are also Hadoop tools like distcp, Hadoop archives, Streaming,
> etc, which should go with MapReduce.
> 
> Cheers,
> Tom
> 
>> 
>> We should, of course, stop tweaking things in different projects in the same 
>> jira - we've been reasonably good at not doing that.
>> 
>> Thoughts?
>> 
>> Arun
>> 
>>> * Where does common stuff live?
>>> * What are the public interfaces of each project (towards the other
>>> projects)?
>>> * How do we do development/releases? In tandem? Separately? How will this
>>> work in practice? Currently we are constantly tweaking things
>>> inter-project, sometimes in the same JIRAs, sometimes in follow-up
>>> JIRAs.
>>> 
>>> Thoughts?
>>> 
>>> Thxs.
>>> 
>>> On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)
>>> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>>> [decided to minimize traffic and to simply put this in one thread]
>>>> 
>>>> Hi Guys,
>>>> 
>>>> See the recent discussion on these threads:
>>>> 
>>>> YARN as its own Hadoop "sub project": http://s.apache.org/WW1
>>>> Maintain a single committer list for the Hadoop project: 
>>>> http://s.apache.org/Owx
>>>> 
>>>> ...and just pay attention to the Hadoop project over the last 3-4 years.
>>>> It's operating as a single project, that's masking separate communities
>>>> that themselves are really separate ASF projects.
>>>> 
>>>> At the ASF, this has been a problem area called "umbrella" projects and
>>>> over the years, all I've seen from them is wasted bandwidth, artificial
>>>> barriers and the inventions of new ways to perform process mongering and
>>>> to reduce the fun in developing software at this fantastic foundation.
>>>> 
>>>> I've talked about umbrella projects enough. We've diverted conversation
>>>> enough. Enough people have tried to act like there is some technical
>>>> mumbo jumbo that is preventing the eventual act of higher power that I
>>>> myself hope comes should these discussions prove unfruitful through
>>>> normal means.
>>>> 
>>>> *these. are. separate. projects.*
>>>> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*
>>>> 
>>>> In this email: http://s.apache.org/rSm
>>>> 
>>>> And in the 2 subsequent follow ons in that thread, I've outlined a
>>>> process that I'll copy through below for splitting these projects into
>>>> their own TLPs:
>>>> 
>>>> -----snip
>>>> Process:
>>>> 
>>>> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2
>>>> below, potentially draft resolution too.
>>>> 
>>>> 1. Decide on an initial set of *PMC* members. I urge each new TLP to
>>>> adopt PMC==C. See reasons I've already discussed.
>>>> 
>>>> 2. Decide on a chair. Try not to VOTE for this explicitly, see if it can
>>>> be discussed and consensus can be reached (just a thought experiment).
>>>> VOTE if necessary.
>>>> 
>>>> 3. [VOTE] thread for <TLP name>
>>>> 
>>>> 4. Create Project:
>>>> a. paste resolution from #0 to board@ or;
>>>> b. go to general@incubator and start new Incubator project.
>>>> 
>>>> 5. infrastructure set up.
>>>>  MLs moving; new UNIX groups; website setup;
>>>>  SVN setup like this:
>>>> 
>>>> svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool MR name>; or
>>>> svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool YARN name>; or
>>>> svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool HDFS name>
>>>> 
>>>> After all 3 have been created run:
>>>> 
>>>> svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects." https://svn.apache.org/repos/asf/hadoop
>>>> 
>>>> 6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate
>>>> as distinct communities, and try to solve the code
>>>> duplication/dependency issues from there.
>>>> 
>>>> 7. If 4b; then graduate as TLP from Incubator.
>>>> 
>>>> -----snip
>>>> 
>>>> So that's my proposal.
>>>> 
>>>> Thanks guys.
>>>> 
>>>> Cheers,
>>>> Chris
>>>> 
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Chris Mattmann, Ph.D.
>>>> Senior Computer Scientist
>>>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>>>> Office: 171-266B, Mailstop: 171-246
>>>> Email: chris.a.mattm...@nasa.gov
>>>> WWW:   http://sunset.usc.edu/~mattmann/
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> Adjunct Assistant Professor, Computer Science Department
>>>> University of Southern California, Los Angeles, CA 90089 USA
>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> Alejandro
>> 
>> --
>> Arun C. Murthy
>> Hortonworks Inc.
>> http://hortonworks.com/
>> 
>>
