Hey Todd,

On Aug 29, 2012, at 5:16 PM, Todd Lipcon wrote:
> On Wed, Aug 29, 2012 at 4:54 PM, Mattmann, Chris A (388J)
> <chris.a.mattm...@jpl.nasa.gov> wrote:
>>
>> Please provide examples that show umbrella projects work.
>
> Hadoop, in its current form?

I don't agree that it's working. That's where you and I differ. And not just you and I -- you and the others who have agreed with me else-thread. Technically, the project is working, for sure. Community-wise, no. I guess we can agree to disagree.

> If we copy-paste forked Common, we'd be doubling our maintenance work
> on this shared code.

Who's "we"? You? Would you expect to be a PMC member/committer on all of the split projects? Also, are you the only person working on the project? And the "we" would include others, right? Who may or may not be committers on the other projects? I'm not proposing an SVN copy and then all PMC members x N projects. Figure out who belongs on the PMCs for the distinct communities that are operating on this hydra.

>> I don't know what else to tell you. I'm not going to go look up all
>> the threads. I'm not Google, nor do I care to. All I can say is that
>> I've seen it before, and so have others. In your own project.
>
> What's one concrete example of where it would be better if we split?

Weaning a project off bad community practices is difficult; I'll agree with you on that. Hopefully, if these new projects went the Incubator route, you could get some other fuddy-duddies like me, who have been around and seen a lot at the Foundation, helping the new projects really understand the community aspects.

> To say that all ASF projects should work the same seems pretty bizarre
> to me.

Please show me where I said that.

> The ASF provides license protection, infrastructure, and a set
> of guidelines for what makes successful projects.

Guidelines which the Apache Hadoop PMC continues not to follow. Technically successful, yes. Community-wise successful, sorta.

> But I don't think it is the foundation's place to dictate what its
> projects should do "from above" if the projects themselves do not see
> a problem.

No, but it is the Foundation's (and its members') responsibility to ensure that its projects operate within that loosely coupled set of principles and guidelines that we call the Apache Way. Apache Hadoop is doing great technically. I'm not so sure about the Apache Way part.

> If the project is so messed up, then maybe some folks should fork it
> into the incubator like you've suggested? What's wrong with the
> anarchic "let the best project succeed" philosophy, which I've also
> heard from Apache?

Yeah, I proposed that too. We'll see if it happens. Concretely, I think all of the current Hadoop "sub-projects" should take a spin through the Incubator and see how they are doing as projects. If nothing is amiss, I'm sure it would be a pretty quick process, right? Add some new PPMC members/committers, make a release or two, make sure all the software is ALv2 and compatible. You guys are already doing that, right?

>> You still point to arguing and contention -- it's more than that,
>> Todd. The project's policies for inclusivity have nothing to do with
>> arguing about technical issues.
>
> I'm absolutely for meritocracy. I just have a high bar for what should
> be considered "merit". Perhaps the PMC as a whole has a high bar. For
> a system that stores my data, I'm pretty happy about that.

You won't be so happy about it when your high bar leaves you as one of the only people in the world maintaining a 100M-line code base.
Especially as you get older, have kids (or not), have a family, go on to do even bigger and better things, and care even less about reading emails like this. You're going to see eventually (as will others) that the way you grow around this Foundation (and in software in general) is to teach others how to do your job and to attract people to your project, not to shoo them away with exclusivity. You call it a "high bar" to "protect your data". I call it "enjoying maintaining the software forever and never taking a vacation". It's called scalability, Todd.

>> Dude, you have to do that regardless; that has nothing to do with
>> *Apache Hadoop*. Take your Cloudera hat off and put your *Apache
>> Software Foundation* hat on. Is your #1 priority developing software
>> here to stitch code back together, turn it into a deliverable for
>> your customers (I'm guessing Cloudera customers, right? B/c Apache
>> doesn't have specific customers?) and to maintain green Jenkins
>> builds?
>
> Yes? I think so? If we do a bad release and it loses substantial data,
> our user base would disappear quite quickly.

Of course, because one bad release kills a project, right? And of course there weren't 30-some-odd releases before that one bad one that someone could roll back to, right? Huh??

>> Also tell me how the 4 SVN commands I suggested will stop you from
>> doing the above? At Apache?
>
> If the projects are on separate release schedules, this means that
> cross-project changes have to be staged across the projects in such a
> way that neither project breaks in the interim.

Because this is what happens with Tomcat, or whatever other dependencies you guys have in your modularized project, right? You guys call up the Tomcat PMC whenever there is a release and make sure that your Hadoop-specific need is included in it, right? Or that they include some bug fix that you really need? C'mon, you know that's not the way stuff works. It's called insulation.
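(To make that concrete: the split I've been suggesting really is on the order of four server-side svn copy operations. I'm not going to dig the exact ones out of the earlier thread, so treat the paths below as purely illustrative -- they are not the actual ASF repository layout -- but roughly:

    # Hypothetical split sketch: server-side copies are cheap in SVN
    # and preserve history. Paths are illustrative only.
    svn copy https://svn.apache.org/repos/asf/hadoop/common/trunk \
             https://svn.apache.org/repos/asf/hadoop-common/trunk \
             -m "Split Common out into its own tree"
    svn copy https://svn.apache.org/repos/asf/hadoop/hdfs/trunk \
             https://svn.apache.org/repos/asf/hadoop-hdfs/trunk \
             -m "Split HDFS out into its own tree"
    svn copy https://svn.apache.org/repos/asf/hadoop/mapreduce/trunk \
             https://svn.apache.org/repos/asf/hadoop-mapreduce/trunk \
             -m "Split MapReduce out into its own tree"
    svn copy https://svn.apache.org/repos/asf/hadoop/yarn/trunk \
             https://svn.apache.org/repos/asf/hadoop-yarn/trunk \
             -m "Split YARN out into its own tree"

The mechanics aren't the hard part. Figuring out the PMCs and the communities is.)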
> In the absence of a reasonable *technical* strategy to release
> independently, and a lot of work to stabilize internal APIs around
> security and IPC in particular, doing it again would cause the same
> problems it caused the first time.

I agree there should be a technical plan to make sure the independent TLPs (or podlings->TLPs, eventually, whatever) sync up or line up -- that would be ideal. What if it doesn't happen? Will the world end? Probably not. Because there are good people hanging around who will get stuff done and make sure new TLP foo/bar technically works great, as they always have.

> It also makes the users' lives much more difficult, or forces them to
> only consume via downstream packagers.

No, it doesn't. That's orthogonal?

> Earlier in this thread, you seemed to think that downstream packagers
> indicated an issue with the community

Nah, I was talking about downstream "companies" and their interests, not packagers.

> : fracturing the releases would only serve to make the ASF download
> page even less useful for someone who just wants to get going fast.

Why is that? Isn't that what *Apache* Bigtop (incubating) is for (which also has an *Apache* download page)?

>> At Cloudera, tell me also how it will stop you?
>
> If the projects were on different release schedules, then we'd be more
> likely to have to do a lot of local patching to get stuff to "fit
> together" right.

+1, this could be the case.

> Version compatibility is a difficult problem - it multiplies the QA
> matrix, complicates deployment, etc.

Yep, agreed.

> It's not insurmountable, but unless there's something to be gained
> (what is it, again, that you think we'd gain, specifically?) I don't
> see why we'd take this additional hassle.

As for the gain, I think what you'd gain is fewer arguments about whom to add to the PMC and how to add them, less maintenance of lame ASF authorization templates within *the same project*, fewer meta-discussions, less company-politics spillover, and hopefully more beer to be shared by all. Note, I said *I think*. I'm only truly psychic sometimes.

>> P.S. I appreciate you and am still one of your biggest fans. Just
>> trying to help you see the bigger picture here and to wear your
>> Apache hat.
>
> Thanks for that. As for Apache vs Cloudera hat: I think they're well
> aligned here. Both hats want the project to be easy for people to
> contribute to, and want to avoid a bunch of wasted time spent on new
> technical issues that this would create. I want to spend that time
> making the product better, for our users' benefit. Whether the users
> are Apache community users, or Cloudera customers, or Facebook's data
> scientists, they all are going to be happier if I spend a month
> improving our HA support compared to spending a month figuring out how
> to release three separate projects which somehow stitch together in a
> reasonable way at runtime without jar conflicts, tons of duplicate
> configuration work, byzantine version dependencies, etc.

That's a fair statement, Todd. But that's why it's not Apache Todd, or Apache Todooop. And it's why there are others at the Foundation that you have to rely on, others within your project that you have to rely on, and why not everyone has the same interests. Some people's interests are in patching HDFS, making it highly scalable, and kicking butt technically. Other people's interests are in discussing what they perceive to be community issues within a project at their Foundation.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory, Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++