Several people asked about having maintainers review the PR queue for their 
modules regularly, and I like that idea. We now have a new tool to help with 
that: https://spark-prs.appspot.com.

As for the set of open PRs itself, it is large, but note that there are also 
2800 *closed* PRs, which means we close the majority of PRs (I don't know the 
exact stats, but I'd guess that 90% of those were accepted and merged). I 
think one problem is that with GitHub, people often develop something as a PR 
and have much of the discussion there (including whether we even want the 
feature). I recently updated our "how to contribute" page to encourage opening 
a JIRA and having discussions on the dev list first, but I do think we need to 
be faster about closing PRs that we don't plan to merge. Note that Hadoop, 
Hive, HBase, etc. also have about 300 issues each in the "patch available" 
state, so this seems to be some kind of universal constant :P.

Matei


> On Nov 5, 2014, at 10:46 PM, Sean Owen <so...@cloudera.com> wrote:
> 
> Naturally, this sounds great. FWIW my one significant worry about
> Spark is scaling up to meet unprecedented demand in the form of
> questions and contributions. Clarifying responsibility and ownership
> helps more than it hurts by adding process.
> 
> This is a related but different topic, but I wonder out loud what this
> can do to help clear the backlog -- ~*1200* open JIRAs and ~300 open
> PRs, most of which have de facto already fallen through the cracks.
> This harms the usefulness of these tools and processes.
> 
> I'd love to see this translate into maintainers triaging and closing
> most of it, and into new actions and strategies for increasing
> 'throughput' in review and/or helping people make better contributions
> in the first place.
> 
> On Thu, Nov 6, 2014 at 1:31 AM, Matei Zaharia <matei.zaha...@gmail.com> wrote:
>> Hi all,
>> 
>> I wanted to share a discussion we've been having on the PMC list, as well as 
>> call for an official vote on it on a public list. Basically, as the Spark 
>> project scales up, we need to define a model to make sure there is still 
>> great oversight of key components (in particular internal architecture and 
>> public APIs), and to this end I've proposed implementing a maintainer model 
>> for some of these components, similar to other large projects.

