I'm a +1 on this as well. I think it will be a useful model as we
scale the project in the future, and it recognizes some of the informal
process we have now.

To respond to Sandy's comment: for changes that fall between component
boundaries or are straightforward, my understanding of this model is
that you wouldn't need an explicit sign-off. I think this is why,
unlike some other projects, we wouldn't e.g. lock down permissions to
portions of the source tree. If some obvious fix needs to go in,
people should just merge it.

- Patrick

On Wed, Nov 5, 2014 at 5:57 PM, Sandy Ryza <sandy.r...@cloudera.com> wrote:
> This seems like a good idea.
>
> An area that wasn't listed, but that I think could strongly benefit from
> maintainers, is the build.  Having consistent oversight over Maven, SBT,
> and dependencies would allow us to avoid subtle breakages.
>
> Component maintainers have come up several times within the Hadoop project,
> and I think one of the main reasons the proposals have been rejected is
> that, structurally, their effect is to slow down development.  As you
> mention, this is somewhat mitigated if being a maintainer leads committers
> to take on more responsibility, but it might be worthwhile to draw up more
> specific ideas on how to combat this.  E.g. do obvious changes, doc fixes,
> test fixes, etc. always require a maintainer?
>
> -Sandy
>
> On Wed, Nov 5, 2014 at 5:36 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> +1 (binding)
>>
>> On Wed, Nov 5, 2014 at 5:33 PM, Matei Zaharia <matei.zaha...@gmail.com>
>> wrote:
>>
>> > BTW, my own vote is obviously +1 (binding).
>> >
>> > Matei
>> >
>> > > On Nov 5, 2014, at 5:31 PM, Matei Zaharia <matei.zaha...@gmail.com>
>> > > wrote:
>> > >
>> > > Hi all,
>> > >
>> > > I wanted to share a discussion we've been having on the PMC list, as
>> > > well as call for an official vote on it on a public list. Basically,
>> > > as the Spark project scales up, we need to define a model to make
>> > > sure there is still great oversight of key components (in particular
>> > > internal architecture and public APIs), and to this end I've proposed
>> > > implementing a maintainer model for some of these components, similar
>> > > to other large projects.
>> > >
>> > > As background on this, Spark has grown a lot since joining Apache.
>> > > We've had over 80 contributors/month for the past 3 months, which I
>> > > believe makes us the most active project in contributors/month at
>> > > Apache, as well as over 500 patches/month. The codebase has also
>> > > grown significantly, with new libraries for SQL, ML, graphs and more.
>> > >
>> > > In this kind of large project, one common way to scale development
>> > > is to assign "maintainers" to oversee key components, where each
>> > > patch to that component needs to get sign-off from at least one of
>> > > its maintainers. Most existing large projects do this -- at Apache,
>> > > some large ones with this model are CloudStack (the second-most
>> > > active project overall), Subversion, and Kafka, and other examples
>> > > include Linux and Python. This is also by-and-large how Spark
>> > > operates today -- most components have a de-facto maintainer.
>> > >
>> > > IMO, adopting this model would have two benefits:
>> > >
>> > > 1) Consistent oversight of design for that component, especially
>> > > regarding architecture and API. This process would ensure that the
>> > > component's maintainers see all proposed changes and consider
>> > > whether they fit together well.
>> > >
>> > > 2) More structure for new contributors and committers -- in
>> > > particular, it would be easy to look up who's responsible for each
>> > > module and ask them for reviews, etc., rather than having patches
>> > > slip through the cracks.
>> > >
>> > > We'd like to start in a light-weight manner, where the model only
>> > > applies to certain key components (e.g. scheduler, shuffle) and
>> > > user-facing APIs (MLlib, GraphX, etc.). Over time, as the project
>> > > grows, we can expand it if we deem it useful. The specific mechanics
>> > > would be as follows:
>> > >
>> > > - Some components in Spark will have maintainers assigned to them,
>> > > where one of the maintainers needs to sign off on each patch to the
>> > > component.
>> > > - Each component with maintainers will have at least 2 maintainers.
>> > > - Maintainers will be assigned by the PMC from the most active and
>> > > knowledgeable committers on that component. The PMC can vote to add /
>> > > remove maintainers, and maintained components, through consensus.
>> > > - Maintainers are expected to be active in responding to patches for
>> > > their components, though they do not need to be the main reviewers
>> > > for them (e.g. they might just sign off on architecture / API). To
>> > > prevent inactive maintainers from blocking the project, if a
>> > > maintainer isn't responding in a reasonable time period (say 2
>> > > weeks), other committers can merge the patch, and the PMC will want
>> > > to discuss adding another maintainer.
>> > >
>> > > If you'd like to see examples for this model, check out the
>> > > following projects:
>> > > - CloudStack:
>> > > https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
>> > > - Subversion:
>> > > https://subversion.apache.org/docs/community-guide/roles.html
>> > >
>> > > Finally, I wanted to list our current proposal for initial
>> > > components and maintainers. It would be good to get feedback on
>> > > other components we might add, but please note that personnel
>> > > discussions (e.g. "I don't think Matei should maintain *that*
>> > > component") should only happen on the private list. The initial
>> > > components were chosen to include all public APIs and the main core
>> > > components, and the maintainers were chosen from the most active
>> > > contributors to those modules.
>> > >
>> > > - Spark core public API: Matei, Patrick, Reynold
>> > > - Job scheduler: Matei, Kay, Patrick
>> > > - Shuffle and network: Reynold, Aaron, Matei
>> > > - Block manager: Reynold, Aaron
>> > > - YARN: Tom, Andrew Or
>> > > - Python: Josh, Matei
>> > > - MLlib: Xiangrui, Matei
>> > > - SQL: Michael, Reynold
>> > > - Streaming: TD, Matei
>> > > - GraphX: Ankur, Joey, Reynold
>> > >
>> > > I'd like to formally call a [VOTE] on this model, to last 72 hours.
>> > > The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>> > >
>> > > Matei
>> >
>> >
>>
