+1

2014-11-05 18:08 GMT-08:00 Patrick Wendell <pwend...@gmail.com>:
> I'm a +1 on this as well. I think it will be a useful model as we
> scale the project in the future, and it recognizes some informal
> process we have now.
>
> To respond to Sandy's comment: for changes that fall in between the
> component boundaries or are straightforward, my understanding of this
> model is you wouldn't need an explicit sign-off. I think this is why,
> unlike some other projects, we wouldn't e.g. lock down permissions to
> portions of the source tree. If some obvious fix needs to go in,
> people should just merge it.
>
> - Patrick
>
> On Wed, Nov 5, 2014 at 5:57 PM, Sandy Ryza <sandy.r...@cloudera.com>
> wrote:
>
> > This seems like a good idea.
> >
> > An area that wasn't listed, but that I think could strongly benefit
> > from maintainers, is the build. Having consistent oversight over
> > Maven, SBT, and dependencies would allow us to avoid subtle breakages.
> >
> > Component maintainers have come up several times within the Hadoop
> > project, and I think one of the main reasons the proposals have been
> > rejected is that, structurally, their effect is to slow down
> > development. As you mention, this is somewhat mitigated if being a
> > maintainer leads committers to take on more responsibility, but it
> > might be worthwhile to draw up more specific ideas on how to combat
> > this? E.g. do obvious changes, doc fixes, test fixes, etc. always
> > require a maintainer?
> >
> > -Sandy
> >
> > On Wed, Nov 5, 2014 at 5:36 PM, Michael Armbrust
> > <mich...@databricks.com> wrote:
> >
> >> +1 (binding)
> >>
> >> On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia
> >> <matei.zaha...@gmail.com> wrote:
> >>
> >> > BTW, my own vote is obviously +1 (binding).
> >> >
> >> > Matei
> >> >
> >> > > On Nov 5, 2014, at 5:31 PM, Matei Zaharia
> >> > > <matei.zaha...@gmail.com> wrote:
> >> > >
> >> > > Hi all,
> >> > >
> >> > > I wanted to share a discussion we've been having on the PMC
> >> > > list, as well as call for an official vote on it on a public
> >> > > list. Basically, as the Spark project scales up, we need to
> >> > > define a model to make sure there is still great oversight of
> >> > > key components (in particular internal architecture and public
> >> > > APIs), and to this end I've proposed implementing a maintainer
> >> > > model for some of these components, similar to other large
> >> > > projects.
> >> > >
> >> > > As background on this, Spark has grown a lot since joining
> >> > > Apache. We've had over 80 contributors/month for the past 3
> >> > > months, which I believe makes us the most active project in
> >> > > contributors/month at Apache, as well as over 500
> >> > > patches/month. The codebase has also grown significantly, with
> >> > > new libraries for SQL, ML, graphs and more.
> >> > >
> >> > > In this kind of large project, one common way to scale
> >> > > development is to assign "maintainers" to oversee key
> >> > > components, where each patch to that component needs to get
> >> > > sign-off from at least one of its maintainers. Most existing
> >> > > large projects do this -- at Apache, some large ones with this
> >> > > model are CloudStack (the second-most active project overall),
> >> > > Subversion, and Kafka, and other examples include Linux and
> >> > > Python. This is also by-and-large how Spark operates today --
> >> > > most components have a de-facto maintainer.
> >> > >
> >> > > IMO, adopting this model would have two benefits:
> >> > >
> >> > > 1) Consistent oversight of design for that component,
> >> > > especially regarding architecture and API. This process would
> >> > > ensure that the component's maintainers see all proposed
> >> > > changes and consider them to fit together in a good way.
> >> > >
> >> > > 2) More structure for new contributors and committers -- in
> >> > > particular, it would be easy to look up who's responsible for
> >> > > each module and ask them for reviews, etc, rather than having
> >> > > patches slip between the cracks.
> >> > >
> >> > > We'd like to start in a light-weight manner, where the model
> >> > > only applies to certain key components (e.g. scheduler,
> >> > > shuffle) and user-facing APIs (MLlib, GraphX, etc). Over time,
> >> > > as the project grows, we can expand it if we deem it useful.
> >> > > The specific mechanics would be as follows:
> >> > >
> >> > > - Some components in Spark will have maintainers assigned to
> >> > >   them, where one of the maintainers needs to sign off on each
> >> > >   patch to the component.
> >> > > - Each component with maintainers will have at least 2
> >> > >   maintainers.
> >> > > - Maintainers will be assigned from the most active and
> >> > >   knowledgeable committers on that component by the PMC. The
> >> > >   PMC can vote to add / remove maintainers, and maintained
> >> > >   components, through consensus.
> >> > > - Maintainers are expected to be active in responding to
> >> > >   patches for their components, though they do not need to be
> >> > >   the main reviewers for them (e.g. they might just sign off
> >> > >   on architecture / API). To prevent inactive maintainers from
> >> > >   blocking the project, if a maintainer isn't responding in a
> >> > >   reasonable time period (say 2 weeks), other committers can
> >> > >   merge the patch, and the PMC will want to discuss adding
> >> > >   another maintainer.
> >> > >
> >> > > If you'd like to see examples for this model, check out the
> >> > > following projects:
> >> > > - CloudStack:
> >> > >   https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> >> > > - Subversion:
> >> > >   https://subversion.apache.org/docs/community-guide/roles.html
> >> > >
> >> > > Finally, I wanted to list our current proposal for initial
> >> > > components and maintainers. It would be good to get feedback
> >> > > on other components we might add, but please note that
> >> > > personnel discussions (e.g. "I don't think Matei should
> >> > > maintain *that* component") should only happen on the private
> >> > > list. The initial components were chosen to include all public
> >> > > APIs and the main core components, and the maintainers were
> >> > > chosen from the most active contributors to those modules.
> >> > >
> >> > > - Spark core public API: Matei, Patrick, Reynold
> >> > > - Job scheduler: Matei, Kay, Patrick
> >> > > - Shuffle and network: Reynold, Aaron, Matei
> >> > > - Block manager: Reynold, Aaron
> >> > > - YARN: Tom, Andrew Or
> >> > > - Python: Josh, Matei
> >> > > - MLlib: Xiangrui, Matei
> >> > > - SQL: Michael, Reynold
> >> > > - Streaming: TD, Matei
> >> > > - GraphX: Ankur, Joey, Reynold
> >> > >
> >> > > I'd like to formally call a [VOTE] on this model, to last 72
> >> > > hours. The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> >> > >
> >> > > Matei
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org