+1. It brings more focus and more consistency.

Yours respectfully,
Xuefeng Wu 吴雪峰

> On Nov 6, 2014, at 9:31 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>
> Hi all,
>
> I wanted to share a discussion we've been having on the PMC list, as well as
> call for an official vote on it on a public list. Basically, as the Spark
> project scales up, we need to define a model to make sure there is still
> great oversight of key components (in particular internal architecture and
> public APIs), and to this end I've proposed implementing a maintainer model
> for some of these components, similar to other large projects.
>
> As background on this, Spark has grown a lot since joining Apache. We've had
> over 80 contributors/month for the past 3 months, which I believe makes us
> the most active project in contributors/month at Apache, as well as over 500
> patches/month. The codebase has also grown significantly, with new libraries
> for SQL, ML, graphs and more.
>
> In this kind of large project, one common way to scale development is to
> assign "maintainers" to oversee key components, where each patch to that
> component needs to get sign-off from at least one of its maintainers. Most
> existing large projects do this -- at Apache, some large ones with this model
> are CloudStack (the second-most active project overall), Subversion, and
> Kafka, and other examples include Linux and Python. This is also by and large
> how Spark operates today -- most components have a de facto maintainer.
>
> IMO, adopting this model would have two benefits:
>
> 1) Consistent oversight of design for that component, especially regarding
> architecture and API. This process would ensure that the component's
> maintainers see all proposed changes and consider them to fit together in a
> good way.
>
> 2) More structure for new contributors and committers -- in particular, it
> would be easy to look up who's responsible for each module and ask them for
> reviews, etc., rather than having patches slip between the cracks.
>
> We'd like to start in a lightweight manner, where the model only applies to
> certain key components (e.g. scheduler, shuffle) and user-facing APIs (MLlib,
> GraphX, etc.). Over time, as the project grows, we can expand it if we deem
> it useful. The specific mechanics would be as follows:
>
> - Some components in Spark will have maintainers assigned to them, where one
>   of the maintainers needs to sign off on each patch to the component.
> - Each component with maintainers will have at least 2 maintainers.
> - Maintainers will be assigned from the most active and knowledgeable
>   committers on that component by the PMC. The PMC can vote to add / remove
>   maintainers, and maintained components, through consensus.
> - Maintainers are expected to be active in responding to patches for their
>   components, though they do not need to be the main reviewers for them (e.g.
>   they might just sign off on architecture / API). To prevent inactive
>   maintainers from blocking the project, if a maintainer isn't responding in
>   a reasonable time period (say 2 weeks), other committers can merge the
>   patch, and the PMC will want to discuss adding another maintainer.
>
> If you'd like to see examples for this model, check out the following
> projects:
> - CloudStack:
>   https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
> - Subversion:
>   https://subversion.apache.org/docs/community-guide/roles.html
>
> Finally, I wanted to list our current proposal for initial components and
> maintainers. It would be good to get feedback on other components we might
> add, but please note that personnel discussions (e.g. "I don't think Matei
> should maintain *that* component") should only happen on the private list.
> The initial components were chosen to include all public APIs and the main
> core components, and the maintainers were chosen from the most active
> contributors to those modules.
>
> - Spark core public API: Matei, Patrick, Reynold
> - Job scheduler: Matei, Kay, Patrick
> - Shuffle and network: Reynold, Aaron, Matei
> - Block manager: Reynold, Aaron
> - YARN: Tom, Andrew Or
> - Python: Josh, Matei
> - MLlib: Xiangrui, Matei
> - SQL: Michael, Reynold
> - Streaming: TD, Matei
> - GraphX: Ankur, Joey, Reynold
>
> I'd like to formally call a [VOTE] on this model, to last 72 hours. The
> [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>
> Matei