Thanks a lot for commenting. We are getting great feedback on this thread.
The takeaways are:

1. In general, people prefer having explicit reasons why pull requests
should be closed. We should push committers to leave messages that are more
explicit about why a given PR should or should not be closed. I couldn't
agree more. But this is not mutually exclusive with closing stale PRs.


2. It is difficult to deal with the scale we are talking about. There is
no single measure that will "fix" everything.

Spark is, as far as I know, one of the most active open source projects in
terms of contributions, in part because we have made it very easy to submit
contributions. Very few open source projects have had to deal with this
scale. In fact, if you look at all the historical PRs, we have closed ~12k
and have ~450 open. That means less than 4% of all PRs are still
outstanding -- not a bad number. The actual ratio is likely even lower,
because many of the 450 open PRs will be merged in the future.
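
For anyone who wants to reproduce these numbers, here is a rough sketch
against the public GitHub search API. This is only an illustration: it
assumes Python with the `requests` library, and the query is
unauthenticated, so it is subject to the anonymous rate limit.

    # Rough sketch: count open vs. closed pull requests for apache/spark
    # via GitHub's search API and compute the outstanding ratio.
    import requests

    def pr_count(state):
        """Number of apache/spark PRs in the given state ('open' or 'closed')."""
        resp = requests.get(
            "https://api.github.com/search/issues",
            params={"q": "repo:apache/spark is:pr is:{}".format(state)},
        )
        resp.raise_for_status()
        return resp.json()["total_count"]

    open_count = pr_count("open")
    closed_count = pr_count("closed")
    ratio = float(open_count) / (open_count + closed_count)
    print("{} open / {} closed -> {:.1%} of all PRs still outstanding".format(
        open_count, closed_count, ratio))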

I also took a look at some of the most popular projects on GitHub (e.g.
jQuery, Angular, React) -- they either have far fewer merged pull requests
or a higher ratio of open-to-closed. So we are actually doing pretty well.
But of course there is always room for improvement.
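
One concrete improvement along the lines discussed downthread would be an
"at-risk" report of PRs that have gone 30 days without any activity. Below
is a minimal sketch of such a report, again assuming Python with `requests`
against the public GitHub search API (unauthenticated, and pagination
beyond the first 100 results is ignored):

    # Rough sketch: list open apache/spark PRs with no updates in the last
    # 30 days, oldest first, as a starting point for an "at-risk" report.
    import datetime
    import requests

    cutoff = (datetime.datetime.utcnow() - datetime.timedelta(days=30)).date()
    resp = requests.get(
        "https://api.github.com/search/issues",
        params={
            "q": "repo:apache/spark is:pr is:open updated:<{}".format(cutoff),
            "sort": "updated",
            "order": "asc",
            "per_page": 100,
        },
    )
    resp.raise_for_status()
    for pr in resp.json()["items"]:
        print("#{}  last updated {}  {}".format(
            pr["number"], pr["updated_at"], pr["title"]))

A list like this could be posted somewhere visible, or used to drive the
"leave a comment, then close after a week of silence" workflow suggested
later in this thread.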




On Mon, Apr 18, 2016 at 8:46 PM, Nicholas Chammas <
nicholas.cham...@gmail.com> wrote:

> Relevant: https://github.com/databricks/spark-pr-dashboard/issues/1
>
> A lot of this was discussed a while back when the PR Dashboard was first
> introduced, and several times before and after that as well. (e.g. August
> 2014
> <http://apache-spark-developers-list.1001551.n3.nabble.com/Handling-stale-PRs-td8015.html>
> )
>
> If there is not enough momentum to build the tooling that people are
> discussing here, then perhaps Reynold's suggestion is the most practical
> one that is likely to see the light of day.
>
> I think asking committers to be more active in commenting on PRs is
> theoretically the correct thing to do, but impractical. I'm not a
> committer, but I would guess that most of them are already way
> overcommitted (ha!) and asking them to do more just won't yield results.
>
> We've had several instances in the past where we all tried to rally
> <https://mail-archives.apache.org/mod_mbox/spark-dev/201412.mbox/%3ccaohmdzer4cg_wxgktoxsg8s34krqezygjfzdoymgu9vhyjb...@mail.gmail.com%3E>
> and be more proactive about giving feedback, closing PRs, and nudging
> contributors who have gone silent. My observation is that the level of
> energy required to "properly" curate PR activity in that way is simply not
> sustainable. People can do it for a few weeks and then things revert to the
> way they are now.
>
> Perhaps the missing link that would make this sustainable is better
> tooling. If you think so and can sling some Javascript, you might want to
> contribute to the PR Dashboard <https://spark-prs.appspot.com/>.
>
> Perhaps the missing link is something else: A different PR review process;
> more committers; a higher barrier to contributing; a combination thereof;
> etc...
>
> Also relevant: http://danluu.com/discourage-oss/
>
> By the way, some people noted that closing PRs may discourage
> contributors. I think our open PR count alone is very discouraging. Under
> what circumstances would you feel encouraged to open a PR against a project
> that has hundreds of open PRs, some from many, many months ago
> <https://github.com/apache/spark/pulls?q=is%3Apr+is%3Aopen+sort%3Aupdated-asc>
> ?
>
> Nick
>
>
> 2016년 4월 18일 (월) 오후 10:30, Ted Yu <yuzhih...@gmail.com>님이 작성:
>
>> During the months of November / December, the 30-day period should be
>> relaxed.
>>
>> Some people (at least in the US) may take extended vacations during that
>> time.
>>
>> For Chinese developers, Spring Festival presents a similar circumstance.
>>
>> On Mon, Apr 18, 2016 at 7:25 PM, Hyukjin Kwon <gurwls...@gmail.com>
>> wrote:
>>
>>> I also think a PR might not have to be closed only because it is
>>> inactive.
>>>
>>> How about closing a PR after 30 days only when the last comment is from
>>> a committer and there has been no response from the author?
>>>
>>> IMHO, if the committers are not sure whether the patch would be useful,
>>> then I think they should leave some comments explaining why they are not
>>> sure, rather than just ignoring it.
>>>
>>> Or they could simply ask the author to show that the patch is useful or
>>> safe with some references and tests.
>>>
>>> I think that would be nicer than expecting users to keep pinging.
>>> **Personally**, I am sometimes a bit worried that pinging multiple times
>>> can be annoying.
>>>
>>>
>>>
>>> 2016-04-19 9:56 GMT+09:00 Saisai Shao <sai.sai.s...@gmail.com>:
>>>
>>>> It would be better to have a specific technical reason why a PR should
>>>> be closed: either the implementation is not good, the problem is not
>>>> valid, or something else. That will actually help the contributor shape
>>>> their code and reopen the PR. Otherwise a message like "feel free to
>>>> reopen for so-and-so reason" is actually discouraging and no different
>>>> from directly closing the PR.
>>>>
>>>> Just my two cents.
>>>>
>>>> Thanks
>>>> Jerry
>>>>
>>>>
>>>> On Tue, Apr 19, 2016 at 4:52 AM, Sean Busbey <bus...@cloudera.com>
>>>> wrote:
>>>>
>>>>> Having a PR closed, especially if due to committers not having the
>>>>> bandwidth to check on things, will be very discouraging to new folks.
>>>>> Doubly so for those inexperienced with open source. Even if the message
>>>>> says "feel free to reopen for so-and-so reason", new folks who lack
>>>>> confidence are going to see reopening as "pestering", and busy folks
>>>>> are going to see it as a clear indication that their work is not even
>>>>> valuable enough for a human to give a reason for closing. In either
>>>>> case, the cost of reopening is substantially higher than that button
>>>>> press.
>>>>>
>>>>> How about we start by keeping a report of "at-risk" PRs that have been
>>>>> stale for 30 days, to make it easier for committers to look at the PRs
>>>>> that have been inactive for a long time?
>>>>>
>>>>>
>>>>> On Mon, Apr 18, 2016 at 2:52 PM, Reynold Xin <r...@databricks.com>
>>>>> wrote:
>>>>> > The cost of "reopen" is close to zero, because it is just clicking
>>>>> > a button. I think you were referring to the cost of closing the pull
>>>>> > request, and you are assuming people look at the pull requests that
>>>>> > have been inactive for a long time. That seems about as likely (or
>>>>> > unlikely) as committers looking at the recently closed pull requests.
>>>>> >
>>>>> > In either case, we scan through most pull requests when they are
>>>>> > first opened, and if they are important enough, they usually get
>>>>> > merged quickly or a target version is set in JIRA. We can definitely
>>>>> > improve on that by making it more explicit.
>>>>> >
>>>>> >
>>>>> >
>>>>> > On Mon, Apr 18, 2016 at 12:46 PM, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> >>
>>>>> >> From committers' perspective, would they look at closed PRs?
>>>>> >>
>>>>> >> If not, the cost is not close to zero.
>>>>> >> Meaning, some potentially useful PRs would never see the light of
>>>>> >> day.
>>>>> >>
>>>>> >> My two cents.
>>>>> >>
>>>>> >> On Mon, Apr 18, 2016 at 12:43 PM, Reynold Xin <r...@databricks.com>
>>>>> >> wrote:
>>>>> >>>
>>>>> >>> Part of it is how difficult it is to automate this. We could build
>>>>> >>> a perfect engine with a lot of rules that understand everything,
>>>>> >>> but the more complicated the rules we need, the less likely any of
>>>>> >>> this is to happen. So I'd rather do this and craft a nice enough
>>>>> >>> message to tell contributors that sometimes mistakes happen, but
>>>>> >>> the cost to reopen is approximately zero (i.e. clicking a button on
>>>>> >>> the pull request).
>>>>> >>>
>>>>> >>>
>>>>> >>> On Mon, Apr 18, 2016 at 12:41 PM, Ted Yu <yuzhih...@gmail.com>
>>>>> >>> wrote:
>>>>> >>>>
>>>>> >>>> bq. close the ones where they don't respond for a week
>>>>> >>>>
>>>>> >>>> Does this imply that the script understands responses from humans?
>>>>> >>>>
>>>>> >>>> Meaning, would the script use some regex which signifies that the
>>>>> >>>> contributor is willing to close the PR?
>>>>> >>>>
>>>>> >>>> If the contributor is willing to close it, why wouldn't he / she
>>>>> >>>> do it him/herself?
>>>>> >>>>
>>>>> >>>> On Mon, Apr 18, 2016 at 12:33 PM, Holden Karau <hol...@pigscanfly.ca>
>>>>> >>>> wrote:
>>>>> >>>>>
>>>>> >>>>> Personally I'd rather err on the side of keeping PRs open, but I
>>>>> >>>>> understand wanting to keep the open PRs limited to ones which
>>>>> >>>>> have a reasonable chance of being merged.
>>>>> >>>>>
>>>>> >>>>> What if we filtered for non-mergeable PRs, or instead left a
>>>>> >>>>> comment asking the author to respond if they are still available
>>>>> >>>>> to move the PR forward - and closed the ones where they don't
>>>>> >>>>> respond for a week?
>>>>> >>>>>
>>>>> >>>>> Just a suggestion.
>>>>> >>>>> On Monday, April 18, 2016, Ted Yu <yuzhih...@gmail.com> wrote:
>>>>> >>>>>>
>>>>> >>>>>> I had one PR which got merged after 3 months.
>>>>> >>>>>>
>>>>> >>>>>> If the inactivity was due to the contributor, I think the PR can
>>>>> >>>>>> be closed after 30 days.
>>>>> >>>>>> But if the inactivity was due to lack of review, the PR should
>>>>> >>>>>> be kept open.
>>>>> >>>>>>
>>>>> >>>>>> On Mon, Apr 18, 2016 at 12:17 PM, Cody Koeninger <c...@koeninger.org>
>>>>> >>>>>> wrote:
>>>>> >>>>>>>
>>>>> >>>>>>> For what it's worth, I have definitely had PRs that sat
>>>>> >>>>>>> inactive for more than 30 days due to committers not having
>>>>> >>>>>>> time to look at them, but did eventually end up successfully
>>>>> >>>>>>> being merged.
>>>>> >>>>>>>
>>>>> >>>>>>> I guess if this just ends up being a committer ping and
>>>>> >>>>>>> reopening the PR, it's fine, but I don't know if it really
>>>>> >>>>>>> addresses the underlying issue.
>>>>> >>>>>>>
>>>>> >>>>>>> On Mon, Apr 18, 2016 at 2:02 PM, Reynold Xin <r...@databricks.com>
>>>>> >>>>>>> wrote:
>>>>> >>>>>>> > We have hit a new high in open pull requests: 469 today.
>>>>> >>>>>>> > While we can certainly get more review bandwidth, many of
>>>>> >>>>>>> > these are old and still open for other reasons. Some are
>>>>> >>>>>>> > stale because the original authors have become busy and
>>>>> >>>>>>> > inactive, and some others are stale because the committers
>>>>> >>>>>>> > are not sure whether the patch would be useful but have not
>>>>> >>>>>>> > rejected it explicitly. We can improve the signal-to-noise
>>>>> >>>>>>> > ratio by closing pull requests that have been inactive for
>>>>> >>>>>>> > more than 30 days, with a nice message. I just checked, and
>>>>> >>>>>>> > this would close roughly half of the open pull requests.
>>>>> >>>>>>> >
>>>>> >>>>>>> > For example:
>>>>> >>>>>>> >
>>>>> >>>>>>> > "Thank you for creating this pull request. Since this pull
>>>>> >>>>>>> > request has been inactive for 30 days, we are automatically
>>>>> >>>>>>> > closing it. Closing the pull request does not remove it from
>>>>> >>>>>>> > history and will retain all the diff and review comments. If
>>>>> >>>>>>> > you have the bandwidth and would like to continue pushing
>>>>> >>>>>>> > this forward, please reopen it. Thanks again!"
>>>>> >>>>>>> >
>>>>> >>>>>>> >
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>>
>>>>> >>>>>>
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> --
>>>>> >>>>> Cell : 425-233-8271
>>>>> >>>>> Twitter: https://twitter.com/holdenkarau
>>>>> >>>>>
>>>>> >>>>
>>>>> >>>
>>>>> >>
>>>>> >
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> busbey
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
