Hi Chris and all,

Thanks for initiating the discussion. May I say something from the perspective of a contributor who is not a committer or PMC member?

First, I have a feeling that the current Hadoop project process works well for contributors delivering a bug fix, but not so well for delivering a big feature. I have had great experiences with bug fixes, which get a quick response from committers and are checked in promptly. However, I feel a little frustrated delivering a feature (~5K LOC, very important for Hadoop running well on virtualization infrastructure) that spans Common, HDFS, MapReduce and YARN. First, you have to figure out which committers to turn to for help on each component, then convince them of your ideas and work with them on reviewing and committing the code. Each committer has to understand the complete story and learn the code pending review as well as the code already checked in. If some committers are very busy, the feature seems to be pending forever. So, based on my experience so far, I have to say this process is not very friendly to contributors who come from different organizations with different backgrounds but share the same wish to contribute more to Apache Hadoop.

Based on this, regarding spinning out the Hadoop sub-projects as TLPs, I would be glad to see a concise committer list for each project, so that committers can be more focused (with more bandwidth, maybe?) and contributors know whom to turn to for a quick response and help. On the other hand, I am concerned that it may add complexity to dependencies for features that span sub-projects today: you would have to figure out branches for each TLP, and it is hard to estimate when the code will land in each TLP's branch (this may add similar complexity for committers as well).

I don't have many good suggestions, but I would be glad to see the process become smoother for contributors' work, whatever decision we make today. Just my 2 cents.
Thanks,

Junping

----- Original Message -----
From: "Chris A Mattmann (388J)" <chris.a.mattm...@jpl.nasa.gov>
To: general@hadoop.apache.org
Sent: Tuesday, August 28, 2012 7:33:58 PM
Subject: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project

[decided to minimize traffic and to simply put this in one thread]

Hi Guys,

See the recent discussion on these threads:

YARN as its own Hadoop "sub project":
http://s.apache.org/WW1

Maintain a single committer list for the Hadoop project:
http://s.apache.org/Owx

...and just pay attention to the Hadoop project over the last 3-4 years.

It's operating as a single project, that's masking separate communities that themselves are really separate ASF projects. At the ASF, this has been a problem area called "umbrella" projects and over the years, all I've seen from them is wasted bandwidth, artificial barriers and the inventions of new ways to perform process mongering and to reduce the fun in developing software at this fantastic foundation.

I've talked about umbrella projects enough. We've diverted conversation enough. Enough people have tried to act like there is some technical mumbo jumbo that is preventing the eventual act of higher power that I myself hope comes should these discussions prove unfruitful through normal means.

*these. are. separate. projects.*

*there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*

In this email:

http://s.apache.org/rSm

And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy through below for splitting these projects into their own TLPs:

-----snip

Process:

0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.

1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've already discussed.

2. Decide on a chair. Try not to VOTE for this explicitly, see if can be discussed and consensus can be reached (just a thought experiment). VOTE if necessary.

3. [VOTE] thread for <TLP name>

4. Create Project:
   a. paste resolution from #0 to board@ or;
   b. go to general@incubator and start new Incubator project.

5. infrastructure set up. MLs moving; new UNIX groups; website setup; SVN setup like this:

   svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool MR name>; or
   svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool YARN name>; or
   svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool HDFS name>

   After all 3 have been created run:

   svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects." https://svn.apache.org/repos/asf/hadoop

6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate as distinct communities, and try to solve the code duplication/dependency issues from there.

7. If 4b; then graduate as TLP from Incubator.

-----snip

So that's my proposal.

Thanks guys.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++