+1 (binding)
On Thu, Nov 6, 2014 at 6:52 PM, Debasish Das <[email protected]> wrote:

> +1
>
> The app to track PRs based on component is a great idea...
>
> On Thu, Nov 6, 2014 at 8:47 AM, Sean McNamara <[email protected]> wrote:
>
>> +1
>>
>> Sean
>>
>> On Nov 5, 2014, at 6:32 PM, Matei Zaharia <[email protected]> wrote:
>>
>> > Hi all,
>> >
>> > I wanted to share a discussion we've been having on the PMC list, as
>> > well as call for an official vote on it on a public list. Basically,
>> > as the Spark project scales up, we need to define a model to make sure
>> > there is still great oversight of key components (in particular
>> > internal architecture and public APIs), and to this end I've proposed
>> > implementing a maintainer model for some of these components, similar
>> > to other large projects.
>> >
>> > As background on this, Spark has grown a lot since joining Apache.
>> > We've had over 80 contributors/month for the past 3 months, which I
>> > believe makes us the most active project in contributors/month at
>> > Apache, as well as over 500 patches/month. The codebase has also grown
>> > significantly, with new libraries for SQL, ML, graphs and more.
>> >
>> > In this kind of large project, one common way to scale development is
>> > to assign "maintainers" to oversee key components, where each patch to
>> > that component needs to get sign-off from at least one of its
>> > maintainers. Most existing large projects do this -- at Apache, some
>> > large ones with this model are CloudStack (the second-most active
>> > project overall), Subversion, and Kafka, and other examples include
>> > Linux and Python. This is also by-and-large how Spark operates today --
>> > most components have a de facto maintainer.
>> >
>> > IMO, adopting this model would have two benefits:
>> >
>> > 1) Consistent oversight of design for that component, especially
>> > regarding architecture and API. This process would ensure that the
>> > component's maintainers see all proposed changes and consider whether
>> > they fit together well.
>> >
>> > 2) More structure for new contributors and committers -- in
>> > particular, it would be easy to look up who’s responsible for each
>> > module and ask them for reviews, etc., rather than having patches slip
>> > between the cracks.
>> >
>> > We'd like to start in a lightweight manner, where the model only
>> > applies to certain key components (e.g. scheduler, shuffle) and
>> > user-facing APIs (MLlib, GraphX, etc.). Over time, as the project
>> > grows, we can expand it if we deem it useful. The specific mechanics
>> > would be as follows:
>> >
>> > - Some components in Spark will have maintainers assigned to them,
>> >   where one of the maintainers needs to sign off on each patch to the
>> >   component.
>> > - Each component with maintainers will have at least 2 maintainers.
>> > - Maintainers will be assigned from the most active and knowledgeable
>> >   committers on that component by the PMC. The PMC can vote to add /
>> >   remove maintainers, and maintained components, through consensus.
>> > - Maintainers are expected to be active in responding to patches for
>> >   their components, though they do not need to be the main reviewers
>> >   for them (e.g. they might just sign off on architecture / API). To
>> >   prevent inactive maintainers from blocking the project, if a
>> >   maintainer isn't responding in a reasonable time period (say 2
>> >   weeks), other committers can merge the patch, and the PMC will want
>> >   to discuss adding another maintainer.
>> >
>> > If you'd like to see examples for this model, check out the following
>> > projects:
>> > - CloudStack:
>> >   https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
>> > - Subversion:
>> >   https://subversion.apache.org/docs/community-guide/roles.html
>> >
>> > Finally, I wanted to list our current proposal for initial components
>> > and maintainers. It would be good to get feedback on other components
>> > we might add, but please note that personnel discussions (e.g. "I
>> > don't think Matei should maintain *that* component") should only
>> > happen on the private list. The initial components were chosen to
>> > include all public APIs and the main core components, and the
>> > maintainers were chosen from the most active contributors to those
>> > modules.
>> >
>> > - Spark core public API: Matei, Patrick, Reynold
>> > - Job scheduler: Matei, Kay, Patrick
>> > - Shuffle and network: Reynold, Aaron, Matei
>> > - Block manager: Reynold, Aaron
>> > - YARN: Tom, Andrew Or
>> > - Python: Josh, Matei
>> > - MLlib: Xiangrui, Matei
>> > - SQL: Michael, Reynold
>> > - Streaming: TD, Matei
>> > - GraphX: Ankur, Joey, Reynold
>> >
>> > I'd like to formally call a [VOTE] on this model, to last 72 hours.
>> > The [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>> >
>> > Matei
