Re: [VOTE] Designating maintainers for some Spark components

Arun C Murthy Thu, 06 Nov 2014 17:47:50 -0800

With my ASF Member hat on, I fully agree with Greg.

As he points out, this is an anti-pattern in the ASF and is severely frowned 
upon.


We, in Hadoop, had a similar trajectory where we were politely told to move 
away from having sub-project committers (HDFS, MapReduce etc.) to a common list 
of committers. There were some concerns initially, but we have successfully 
managed to work together and build a more healthy community as a result of 
following the advice on the ASF Way.

I do have sympathy for good oversight etc. as the project grows and attracts 
many contributors - it's essentially the need to have smaller, well-knit 
developer communities. One way to achieve that would be to have separate TLPs  
(e.g. Spark, MLLIB, GraphX) with separate committer lists for each representing 
the appropriate community. Hadoop went a similar route where we had Pig, Hive, 
HBase etc. as sub-projects initially and then split them into TLPs with more 
focussed communities to the benefit of everyone. Maybe you guys want to try 
this too?

----

A few more observations:
# In general, *discussions* on project direction (such as the new concept of 
*maintainers*) should happen first on the public lists *before* voting, not on 
the private PMC list.
# If you choose to go this route in spite of this advice, it seems to me Spark 
would be better off having more maintainers per component (at least 4-5), 
probably with a lot more diversity in terms of affiliations. Not sure if that 
is a concern - do you have good diversity in the proposed list? This will 
ensure that there are no concerns about a dominant employer controlling a 
project.

----

Hope this helps - we've gone through a similar journey, got through similar 
issues and fully embraced the Apache Way (™), as Greg points out, to our benefit.

thanks,
Arun


On Nov 6, 2014, at 4:18 PM, Greg Stein <gst...@gmail.com> wrote:

> -1 (non-binding)
> 
> This is an idea that runs COMPLETELY counter to the Apache Way, and is
> to be severely frowned upon. This creates *unequal* ownership of the
> codebase.
> 
> Each Member of the PMC should have *equal* rights to all areas of the
> codebase under their purview. It should not be subjected to others'
> "ownership" except through the standard mechanisms of reviews and,
> if/when absolutely necessary, vetoes.
> 
> Apache does not want "leads", "benevolent dictators" or "assigned
> maintainers", no matter how you may dress it up with multiple
> maintainers per component. The fact is that this creates an unequal
> level of ownership and responsibility. The Board has shut down
> projects that attempted or allowed for "Leads". Just a few months ago,
> there was a problem with somebody calling themself a "Lead".
> 
> I don't know why you suggest that Apache Subversion does this. We
> absolutely do not. Never have. Never will. The Subversion codebase is
> owned by all of us, and we all care for every line of it. Some people
> know more than others, of course. But any one of us can change any
> part, without being subjected to a "maintainer". Of course, we ask
> people with more knowledge of the component when we feel
> uncomfortable, but we also know when it is safe or not to make a
> specific change. And *always*, our fellow committers can review our
> work and let us know when we've done something wrong.
> 
> Equal ownership reduces fiefdoms, enhances a feeling of community and
> project ownership, and creates a more open and inviting project.
> 
> So again: -1 on this entire concept. Not good, to be polite.
> 
> Regards,
> Greg Stein
> Director, Vice Chairman
> Apache Software Foundation
> 
> On Wed, Nov 05, 2014 at 05:31:58PM -0800, Matei Zaharia wrote:
>> Hi all,
>> 
>> I wanted to share a discussion we've been having on the PMC list, as well as 
>> call for an official vote on it on a public list. Basically, as the Spark 
>> project scales up, we need to define a model to make sure there is still 
>> great oversight of key components (in particular internal architecture and 
>> public APIs), and to this end I've proposed implementing a maintainer model 
>> for some of these components, similar to other large projects.
>> 
>> As background on this, Spark has grown a lot since joining Apache. We've had 
>> over 80 contributors/month for the past 3 months, which I believe makes us 
>> the most active project in contributors/month at Apache, as well as over 500 
>> patches/month. The codebase has also grown significantly, with new libraries 
>> for SQL, ML, graphs and more.
>> 
>> In this kind of large project, one common way to scale development is to 
>> assign "maintainers" to oversee key components, where each patch to that 
>> component needs to get sign-off from at least one of its maintainers. Most 
>> existing large projects do this -- at Apache, some large ones with this 
>> model are CloudStack (the second-most active project overall), Subversion, 
>> and Kafka, and other examples include Linux and Python. This is also 
>> by-and-large how Spark operates today -- most components have a de-facto 
>> maintainer.
>> 
>> IMO, adopting this model would have two benefits:
>> 
>> 1) Consistent oversight of design for that component, especially regarding 
>> architecture and API. This process would ensure that the component's 
>> maintainers see all proposed changes and consider them to fit together in a 
>> good way.
>> 
>> 2) More structure for new contributors and committers -- in particular, it 
>> would be easy to look up who’s responsible for each module and ask them for 
>> reviews, etc, rather than having patches slip between the cracks.
>> 
>> We'd like to start in a light-weight manner, where the model only 
>> applies to certain key components (e.g. scheduler, shuffle) and user-facing 
>> APIs (MLlib, GraphX, etc). Over time, as the project grows, we can expand it 
>> if we deem it useful. The specific mechanics would be as follows:
>> 
>> - Some components in Spark will have maintainers assigned to them, where one 
>> of the maintainers needs to sign off on each patch to the component.
>> - Each component with maintainers will have at least 2 maintainers.
>> - Maintainers will be assigned from the most active and knowledgeable 
>> committers on that component by the PMC. The PMC can vote to add / remove 
>> maintainers, and maintained components, through consensus.
>> - Maintainers are expected to be active in responding to patches for their 
>> components, though they do not need to be the main reviewers for them (e.g. 
>> they might just sign off on architecture / API). To prevent inactive 
>> maintainers from blocking the project, if a maintainer isn't responding in a 
>> reasonable time period (say 2 weeks), other committers can merge the patch, 
>> and the PMC will want to discuss adding another maintainer.
>> 
>> If you'd like to see examples for this model, check out the following 
>> projects:
>> - CloudStack: 
>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Maintainers+Guide
>> - Subversion: https://subversion.apache.org/docs/community-guide/roles.html
>> 
>> Finally, I wanted to list our current proposal for initial components and 
>> maintainers. It would be good to get feedback on other components we might 
>> add, but please note that personnel discussions (e.g. "I don't think Matei 
>> should maintain *that* component") should only happen on the private list. 
>> The initial components were chosen to include all public APIs and the main 
>> core components, and the maintainers were chosen from the most active 
>> contributors to those modules.
>> 
>> - Spark core public API: Matei, Patrick, Reynold
>> - Job scheduler: Matei, Kay, Patrick
>> - Shuffle and network: Reynold, Aaron, Matei
>> - Block manager: Reynold, Aaron
>> - YARN: Tom, Andrew Or
>> - Python: Josh, Matei
>> - MLlib: Xiangrui, Matei
>> - SQL: Michael, Reynold
>> - Streaming: TD, Matei
>> - GraphX: Ankur, Joey, Reynold
>> 
>> I'd like to formally call a [VOTE] on this model, to last 72 hours. The 
>> [VOTE] will end on Nov 8, 2014 at 6 PM PST.
>> 
>> Matei
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> For additional commands, e-mail: dev-h...@spark.apache.org
> 



