Re: [VOTE] Designating maintainers for some Spark components

Josh Rosen Thu, 06 Nov 2014 10:03:12 -0800

+1 (binding).

(our pull request browsing tool is open-source, by the way; contributions
welcome: https://github.com/databricks/spark-pr-dashboard)


On Thu, Nov 6, 2014 at 9:28 AM, Nick Pentreath <[email protected]> wrote:

> +1 (binding)
>
> On Thu, Nov 6, 2014 at 6:52 PM, Debasish Das <[email protected]> wrote:
>
> > +1
> > The app to track PRs based on component is a great idea...
> > On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara <[email protected]> wrote:
> >> +1
> >>
> >> Sean
> >>
> >> On Nov 5, 2014, at 6:32 PM, Matei Zaharia <[email protected]> wrote:
> >>
> >> > Hi all,
> >> >
> >> > I wanted to share a discussion we've been having on the PMC list, as
> >> > well as call for an official vote on it on a public list. Basically, as
> >> > the Spark project scales up, we need to define a model to make sure
> >> > there is still great oversight of key components (in particular
> >> > internal architecture and public APIs), and to this end I've proposed
> >> > implementing a maintainer model for some of these components, similar
> >> > to other large projects.
> >> >
> >> > As background on this, Spark has grown a lot since joining Apache.
> >> > We've had over 80 contributors/month for the past 3 months, which I
> >> > believe makes us the most active project in contributors/month at
> >> > Apache, as well as over 500 patches/month. The codebase has also grown
> >> > significantly, with new libraries for SQL, ML, graphs and more.
> >> >
> >> > In this kind of large project, one common way to scale development is
> >> > to assign "maintainers" to oversee key components, where each patch to
> >> > that component needs to get sign-off from at least one of its
> >> > maintainers. Most existing large projects do this -- at Apache, some
> >> > large ones with this model are CloudStack (the second-most active
> >> > project overall), Subversion, and Kafka, and other examples include
> >> > Linux and Python. This is also by and large how Spark operates today --
> >> > most components have a de facto maintainer.
> >> >
> >> > IMO, adopting this model would have two benefits:
> >> >
> >> > 1) Consistent oversight of design for that component, especially
> >> > regarding architecture and API. This process would ensure that the
> >> > component's maintainers see all proposed changes and consider them to
> >> > fit together in a good way.
> >> >
> >> > 2) More structure for new contributors and committers -- in
> >> > particular, it would be easy to look up who’s responsible for each
> >> > module and ask them for reviews, etc., rather than having patches slip
> >> > between the cracks.
> >> >
> >> > We'd like to start in a lightweight manner, where the model only
> >> > applies to certain key components (e.g. scheduler, shuffle) and
> >> > user-facing APIs (MLlib, GraphX, etc.). Over time, as the project
> >> > grows, we can expand it if we deem it useful. The specific mechanics
> >> > would be as follows:
> >> >
> >> > - Some components in Spark will have maintainers assigned to them,
> >> >   where one of the maintainers needs to sign off on each patch to the
> >> >   component.
> >> > - Each component with maintainers will have at least 2 maintainers.
> >> > - Maintainers will be assigned from the most active and knowledgeable
> >> >   committers on that component by the PMC. The PMC can vote to add /
> >> >   remove maintainers, and maintained components, through consensus.
> >> > - Maintainers are expected to be active in responding to patches for
> >> >   their components, though they do not need to be the main reviewers
> >> >   for them (e.g. they might just sign off on architecture / API). To
> >> >   prevent inactive maintainers from blocking the project, if a
> >> >   maintainer isn't responding in a reasonable time period (say 2
> >> >   weeks), other committers can merge the patch, and the PMC will want
> >> >   to discuss adding another maintainer.
> >> >
> >> > If you'd like to see examples for this model, check out the following
> >> > projects:
> >> > - CloudStack:
> >> >   https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> >> > - Subversion:
> >> >   https://subversion.apache.org/docs/community-guide/roles.html
> >> >
> >> > Finally, I wanted to list our current proposal for initial components
> >> > and maintainers. It would be good to get feedback on other components
> >> > we might add, but please note that personnel discussions (e.g. "I don't
> >> > think Matei should maintain *that* component") should only happen on
> >> > the private list. The initial components were chosen to include all
> >> > public APIs and the main core components, and the maintainers were
> >> > chosen from the most active contributors to those modules.
> >> >
> >> > - Spark core public API: Matei, Patrick, Reynold
> >> > - Job scheduler: Matei, Kay, Patrick
> >> > - Shuffle and network: Reynold, Aaron, Matei
> >> > - Block manager: Reynold, Aaron
> >> > - YARN: Tom, Andrew Or
> >> > - Python: Josh, Matei
> >> > - MLlib: Xiangrui, Matei
> >> > - SQL: Michael, Reynold
> >> > - Streaming: TD, Matei
> >> > - GraphX: Ankur, Joey, Reynold
> >> >
> >> > I'd like to formally call a [VOTE] on this model, to last 72 hours.
> >> > The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
> >> >
> >> > Matei
>
