I personally am for splitting up the projects. I think there is a lot of potential that each of the projects could have on their own, and I expect to see them evolve in new and interesting ways when the projects are not tied directly together.
But in order to get there we need to address the issues that made the first split attempt fail.

First off, we need to look at all API calls that MR, YARN, or HDFS make into common that are not @Stable, and either promote them to @Stable or remove the need for those calls. Second, while we are doing that we need to look at the visibility of those APIs. How many APIs really need to be @LimitedPrivate, and how many should be @Public? How many of the APIs have no designation at all? Third, we need to get truly serious about maintaining binary compatibility on @Stable APIs. Fourth, we need to start splitting the projects up, starting with common. I think it would be cool to call it liBig, but I digress. Once common has been split out and has been on its own for a few releases, we start splitting out HDFS, YARN, and MapReduce. For each of those we need to do a similar audit between the projects and fix the interdependencies between them. These are mostly dependencies between YARN and MR.

As part of this we also need a clear set of rules about what it takes to become a committer or PMC member for the new projects when they split off. I am fine with all committers becoming PMC members, but if we merge the lists now and simply say all previous committers become committers on the new TLPs, there will be a lot of committers/PMC members that have no real desire to be on those projects. I would propose that we merge the committer lists, but that all committers on the current project receive an invitation to become a committer on the new projects. ATM convinced me that committers know their boundaries and will self-censor. I believe that many committers will decline to become committers on the new projects, either because it is out of their area of expertise or because they are not involved with Hadoop any more and will ignore the invitation.

I fear that just voting and doing an svn copy -m will result in the same thing that happened last time. Someone will want to make a large change.
This will require making a change to something in common, but because it cannot easily be done in a backwards-compatible way, or because it will take three steps to complete the change instead of one, we will get frustrated. If this happens often enough we will get really frustrated and try to merge the projects back together again. This is because the projects are too tightly coupled right now to really stand on their own. Just look at all of the security and token work that has been done recently. That work has touched every single project and it has been a bit of a nightmare. It would be even worse if the projects were completely split apart.

I also want us to think about the timing of this. Do we really want to do this before 2.0 is GA? Doing this properly is probably going to be a several-month effort for one or two people, plus a concerted effort by everyone not to break things while they work. If we have to rearchitect something so that the APIs can be marked stable, it may take a lot longer than that. Is it worth pushing the GA of 2.0 off by an entire quarter? For me I would say yes, but I know others have different opinions and different schedules.

@Chris, I can see your desire to do the split now and then deal with the fallout as we adapt to the changes. I think that would work, assuming that we are all completely committed to making the changes necessary. But the fact that we are having this discussion at all seems to indicate that we are not all completely committed to this, and I also feel that dealing with the fallout is going to take a lot longer if we don't try to address some of the problems up front.

Putting on my Yahoo! hat, I want to avoid as many problems and delays as I can, because my customers want a stable release of Hadoop with the features that are in 2.0. The longer it is delayed, the longer we stay on branch-0.23.
A one-quarter delay because of this I am sure I can swing. More than that and I will start to get more pressure to pull in new features, which will probably mean that we then have to fork, which is something that I really do not want to do.

So I am +1 on merging the committer list, and +1 on splitting the projects. I would encourage us to at least do some planning and legwork up front before splitting. I am even +1 for setting a deadline on which date svn -m will happen, whether we are ready or not.

--Bobby Evans

On 8/28/12 10:50 PM, "Alejandro Abdelnur" <t...@cloudera.com> wrote:

>Chris, thanks for initiating the discussion.
>
>IMO a pre-requisite to this is to figure out how we'll handle the
>following:
>
>* Where does common stuff lives?
>* What are the public interfaces of each project (towards the other
>projects)?
>* How do we do development/releases? In tandem? Separate? How this
>will work in practice, currently we are constantly tweaking things
>inter-projects, sometimes in the same JIRAs, sometimes in follow up
>JIRAs.
>
>Thoughts?
>
>Thxs.
>
>On Tue, Aug 28, 2012 at 7:33 PM, Mattmann, Chris A (388J)
><chris.a.mattm...@jpl.nasa.gov> wrote:
>> [decided to minimize traffic and to simply put this in one thread]
>>
>> Hi Guys,
>>
>> See the recent discussion on these threads:
>>
>> YARN as its own Hadoop "sub project": http://s.apache.org/WW1
>> Maintain a single committer list for the Hadoop project: http://s.apache.org/Owx
>>
>> ...and just pay attention to the Hadoop project over the last 3-4 years. It's operating
>> as a single project, that's masking separate communities that themselves are really
>> separate ASF projects.
>>
>> At the ASF, this has been a problem area called "umbrella" projects and over the years,
>> all I've seen from them is wasted bandwidth, artificial barriers and the inventions of
>> new ways to perform process mongering and to reduce the fun in developing software
>> at this fantastic foundation.
>>
>> I've talked about umbrella projects enough. We've diverted conversation enough.
>> Enough people have tried to act like there is some technical mumbo jumbo that is
>> preventing the eventual act of higher power that I myself hope comes should these
>> discussions prove unfruitful through normal means.
>>
>> *these. are. separate. projects.*
>>
>> *there.are.not.blocker.issues.from.spinning.out.these.projects.as.their.own.communities*
>>
>> In this email: http://s.apache.org/rSm
>>
>> And in the 2 subsequent follow ons in that thread, I've outlined a process that I'll copy
>> through below for splitting these projects into their own TLPs:
>>
>> -----snip
>> Process:
>>
>> 0. [DISCUSS] thread for <TLP name> in which you talk about #1 and #2 below, potentially draft resolution too.
>>
>> 1. Decide on an initial set of *PMC* members. I urge each new TLP to adopt PMC==C. See reasons I've
>> already discussed.
>>
>> 2. Decide on a chair. Try not to VOTE for this explicitly, see if can be discussed and consensus
>> can be reached (just a thought experiment). VOTE if necessary.
>>
>> 3. [VOTE] thread for <TLP name>
>>
>> 4. Create Project:
>>    a. paste resolution from #0 to board@ or;
>>    b. go to general@incubator and start new Incubator project.
>>
>> 5. infrastructure set up.
>>    MLs moving; new UNIX groups; website setup;
>>    SVN setup like this:
>>
>> svn copy -m "MR TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool MR name>; or
>> svn copy -m "YARN TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool YARN name>; or
>> svn copy -m "HDFS TLP." https://svn.apache.org/repos/asf/hadoop/ https://svn.apache.org/repos/asf/<insert cool HDFS name>
>>
>> After all 3 have been created run:
>>
>> svn remove -m "Remove Hadoop umbrella TLP. Split into separate projects." https://svn.apache.org/repos/asf/hadoop
>>
>> 6. (TLPs if 4a; Incubator podling if 4b;) proceed, collaborate, operate as distinct communities, and try to solve the code duplication/dependency
>> issues from there.
>>
>> 7. If 4b; then graduate as TLP from Incubator.
>>
>> -----snip
>>
>> So that's my proposal.
>>
>> Thanks guys.
>>
>> Cheers,
>> Chris
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattm...@nasa.gov
>> WWW: http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>
>
>
>--
>Alejandro