Re: Handling stale PRs

2014-12-21 Thread Nicholas Chammas
Shout-out to Michael and other Spark SQL contributors for really trimming
down the number of open/stale Spark SQL PRs
https://spark-prs.appspot.com/#sql.

As of right now, the least recently updated open Spark SQL PR goes back
only 11 days.

Nice work!

Nick


On Mon Dec 08 2014 at 2:58:08 PM Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 I recently came across this blog post, which reminded me of this thread.

 How to Discourage Open Source Contributions
 http://danluu.com/discourage-oss/

 We are currently at 320+ open PRs, many of which haven't been updated in
 over a month. We have quite a few PRs that haven't been touched in 3-5
 months.

 *If you have the time and interest, please hop on over to the Spark PR
 Dashboard https://spark-prs.appspot.com/, sort the PRs by
 least-recently-updated, and update them where you can.*

 I share the blog author's opinion that letting PRs go stale discourages
 contributions, especially from first-time contributors, and even more so
 when the PR author is waiting on feedback from a committer or
 contributor.

 I've been thinking about simple ways to make it easier for all of us to
 chip in on controlling stale PRs in an incremental way. For starters, would
 it help if an automated email went out to the dev list once a week that a)
 reported the number of stale PRs, and b) directly linked to the 5 least
 recently updated PRs?

 Nick

 On Sat Aug 30 2014 at 3:41:39 AM Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:

 On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 it's actually procedurally difficult for us to close pull requests


 Just an FYI: Seems like the GitHub-sanctioned work-around to having
 issues-only permissions is to have a second, issues-only repository
 https://help.github.com/articles/issues-only-access-permissions. Not a
 very attractive work-around...

 Nick




Re: Handling stale PRs

2014-12-08 Thread Nicholas Chammas
I recently came across this blog post, which reminded me of this thread.

How to Discourage Open Source Contributions
http://danluu.com/discourage-oss/

We are currently at 320+ open PRs, many of which haven't been updated in
over a month. We have quite a few PRs that haven't been touched in 3-5
months.

*If you have the time and interest, please hop on over to the Spark PR
Dashboard https://spark-prs.appspot.com/, sort the PRs by
least-recently-updated, and update them where you can.*

I share the blog author's opinion that letting PRs go stale discourages
contributions, especially from first-time contributors, and even more so
when the PR author is waiting on feedback from a committer or
contributor.

I've been thinking about simple ways to make it easier for all of us to
chip in on controlling stale PRs in an incremental way. For starters, would
it help if an automated email went out to the dev list once a week that a)
reported the number of stale PRs, and b) directly linked to the 5 least
recently updated PRs?

Nick
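
A minimal sketch of what such a weekly report could look like, assuming the
public GitHub API and the Python requests library (the stale threshold and
output format below are placeholders, not an existing Spark QA feature):

import requests
from datetime import datetime, timedelta

PULLS_API = "https://api.github.com/repos/apache/spark/pulls"

def least_recently_updated(n=5):
    # Open PRs sorted by last update, oldest first.
    resp = requests.get(PULLS_API, params={
        "state": "open", "sort": "updated", "direction": "asc", "per_page": n})
    resp.raise_for_status()
    return resp.json()

def format_digest(prs, stale_after_days=30):
    cutoff = datetime.utcnow() - timedelta(days=stale_after_days)
    lines = ["Least recently updated open PRs:"]
    for pr in prs:
        updated = datetime.strptime(pr["updated_at"], "%Y-%m-%dT%H:%M:%SZ")
        flag = " (stale)" if updated < cutoff else ""
        lines.append("  %s  last updated %s%s" % (pr["html_url"], updated.date(), flag))
    return "\n".join(lines)

if __name__ == "__main__":
    print(format_digest(least_recently_updated()))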

On Sat Aug 30 2014 at 3:41:39 AM Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 it's actually procedurally difficult for us to close pull requests


 Just an FYI: Seems like the GitHub-sanctioned work-around to having
 issues-only permissions is to have a second, issues-only repository
 https://help.github.com/articles/issues-only-access-permissions. Not a
 very attractive work-around...

 Nick



Re: Handling stale PRs

2014-12-08 Thread Ganelin, Ilya
Thank you for pointing this out, Nick. I know that for myself and my
colleague who are starting to contribute to Spark, it's definitely
discouraging to have fixes sitting in the pipeline. Could you recommend
any other ways that we can facilitate getting these PRs accepted? Clean,
well-tested code is an obvious one but I'd like to know if there are some
non-obvious things we (as contributors) could do to make the committers'
lives easier? Thanks!

-Ilya 

On 12/8/14, 11:58 AM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:

I recently came across this blog post, which reminded me of this thread.

How to Discourage Open Source Contributions
http://danluu.com/discourage-oss/

We are currently at 320+ open PRs, many of which haven't been updated in
over a month. We have quite a few PRs that haven't been touched in 3-5
months.

*If you have the time and interest, please hop on over to the Spark PR
Dashboard https://spark-prs.appspot.com/, sort the PRs by
least-recently-updated, and update them where you can.*

I share the blog author's opinion that letting PRs go stale discourages
contributions, especially from first-time contributors, and even more so
when the PR author is waiting on feedback from a committer or
contributor.

I've been thinking about simple ways to make it easier for all of us to
chip in on controlling stale PRs in an incremental way. For starters,
would
it help if an automated email went out to the dev list once a week that a)
reported the number of stale PRs, and b) directly linked to the 5 least
recently updated PRs?

Nick

On Sat Aug 30 2014 at 3:41:39 AM Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 it's actually procedurally difficult for us to close pull requests


 Just an FYI: Seems like the GitHub-sanctioned work-around to having
 issues-only permissions is to have a second, issues-only repository
 https://help.github.com/articles/issues-only-access-permissions. Not a
 very attractive work-around...

 Nick









Re: Handling stale PRs

2014-12-08 Thread Nicholas Chammas
Things that help:

   - Be persistent. People are busy, so just ping them if there’s been no
   response for a couple of weeks. Hopefully, as the project continues to
   develop, this will become less necessary.
   - Only ping reviewers after test results are back from Jenkins. Make
   sure all the tests are clear before reaching out, unless you need help
   understanding why a test is failing.
   - Whenever possible, keep PRs small, small, small.
   - Get buy-in on the dev list before working on something, especially
   larger features, to make sure you are making something that people
   understand and that is in accordance with Spark’s design.

I’m just speaking as a random contributor here, so don’t take this advice
as gospel.

Nick
​

On Mon Dec 08 2014 at 3:08:02 PM Ganelin, Ilya ilya.gane...@capitalone.com
wrote:

 Thank you for pointing this out, Nick. I know that for myself and my
 colleague who are starting to contribute to Spark, it's definitely
 discouraging to have fixes sitting in the pipeline. Could you recommend
 any other ways that we can facilitate getting these PRs accepted? Clean,
 well-tested code is an obvious one but I'd like to know if there are some
 non-obvious things we (as contributors) could do to make the committers'
 lives easier? Thanks!

 -Ilya

 On 12/8/14, 11:58 AM, Nicholas Chammas nicholas.cham...@gmail.com
 wrote:

 I recently came across this blog post, which reminded me of this thread.
 
 How to Discourage Open Source Contributions
 http://danluu.com/discourage-oss/
 
 We are currently at 320+ open PRs, many of which haven't been updated in
 over a month. We have quite a few PRs that haven't been touched in 3-5
 months.
 
 *If you have the time and interest, please hop on over to the Spark PR
 Dashboard https://spark-prs.appspot.com/, sort the PRs by
 least-recently-updated, and update them where you can.*
 
 I share the blog author's opinion that letting PRs go stale discourages
 contributions, especially from first-time contributors, and even more so
 when the PR author is waiting on feedback from a committer or
 contributor.
 
 I've been thinking about simple ways to make it easier for all of us to
 chip in on controlling stale PRs in an incremental way. For starters,
 would
 it help if an automated email went out to the dev list once a week that a)
 reported the number of stale PRs, and b) directly linked to the 5 least
 recently updated PRs?
 
 Nick
 
 On Sat Aug 30 2014 at 3:41:39 AM Nicholas Chammas 
 nicholas.cham...@gmail.com wrote:
 
  On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com
  wrote:
 
  it's actually procedurally difficult for us to close pull requests
 
 
  Just an FYI: Seems like the GitHub-sanctioned work-around to having
  issues-only permissions is to have a second, issues-only repository
  https://help.github.com/articles/issues-only-access-permissions. Not
 a
  very attractive work-around...
 
  Nick
 

 





Re: Handling stale PRs

2014-08-30 Thread Nicholas Chammas
On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com wrote:

 it's actually procedurally difficult for us to close pull requests


Just an FYI: Seems like the GitHub-sanctioned work-around to having
issues-only permissions is to have a second, issues-only repository
https://help.github.com/articles/issues-only-access-permissions. Not a
very attractive work-around...

Nick


Re: Handling stale PRs

2014-08-27 Thread Nicholas Chammas
On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote:

 Last weekend, I started hacking on a Google App Engine app for helping
 with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).


BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA
issue, perhaps?

Nick


Re: Handling stale PRs

2014-08-27 Thread Nishkam Ravi
Wonder if it would make sense to introduce a notion of 'Reviewers' as an
intermediate tier to help distribute the load? While anyone can review and
comment on an open PR, reviewers would be able to say aye or nay subject to
confirmation by a committer?

Thanks,
Nishkam


On Wed, Aug 27, 2014 at 2:11 PM, Nicholas Chammas 
nicholas.cham...@gmail.com wrote:

 On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote:

  Last weekend, I started hacking on a Google App Engine app for helping
  with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).
 

 BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA
 issue, perhaps?

 Nick



Re: Handling stale PRs

2014-08-27 Thread Patrick Wendell
Hey Nishkam,

To some extent we already have this process - many community members
help review patches and some earn a reputation where committers will
take an LGTM from them seriously. I'd be interested in seeing if any
other projects recognize people who do this.

- Patrick

On Wed, Aug 27, 2014 at 2:36 PM, Nishkam Ravi nr...@cloudera.com wrote:
 Wonder if it would make sense to introduce a notion of 'Reviewers' as an
 intermediate tier to help distribute the load? While anyone can review and
 comment on an open PR, reviewers would be able to say aye or nay subject to
 confirmation by a committer?

 Thanks,
 Nishkam


 On Wed, Aug 27, 2014 at 2:11 PM, Nicholas Chammas
 nicholas.cham...@gmail.com wrote:

 On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote:

  Last weekend, I started hacking on a Google App Engine app for helping
  with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).
 

 BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA
 issue, perhaps?

 Nick






Re: Handling stale PRs

2014-08-27 Thread Josh Rosen
I have a very simple dashboard running at http://spark-prs.appspot.com/.  
Currently, this mirrors the functionality of Patrick’s github-shim, but it 
should be very easy to extend with other features.

The source is at https://github.com/databricks/spark-pr-dashboard (pull 
requests and issues welcome!)

On August 27, 2014 at 2:11:41 PM, Nicholas Chammas (nicholas.cham...@gmail.com) 
wrote:

On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote:
Last weekend, I started hacking on a Google App Engine app for helping with 
pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).

BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA issue, 
perhaps?

Nick

Re: Handling stale PRs

2014-08-27 Thread Nicholas Chammas
Alright! That was quick. :)


On Wed, Aug 27, 2014 at 6:48 PM, Josh Rosen rosenvi...@gmail.com wrote:

 I have a very simple dashboard running at http://spark-prs.appspot.com/.
  Currently, this mirrors the functionality of Patrick’s github-shim, but it
 should be very easy to extend with other features.

 The source is at https://github.com/databricks/spark-pr-dashboard (pull
 requests and issues welcome!)

 On August 27, 2014 at 2:11:41 PM, Nicholas Chammas (
 nicholas.cham...@gmail.com) wrote:

  On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote:

  Last weekend, I started hacking on a Google App Engine app for helping
 with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).


 BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA
 issue, perhaps?

 Nick




Re: Handling stale PRs

2014-08-27 Thread Nishkam Ravi
I see. Yeah, it would be interesting to know if any other project has
considered formalizing this notion. It may also enable assignment of
reviews (potentially automated using Josh's system) and maybe anonymity as
well? On the downside, it isn't easily implemented and probably doesn't
come without certain undesired side-effects.

Thanks,
Nishkam


On Wed, Aug 27, 2014 at 3:39 PM, Patrick Wendell pwend...@gmail.com wrote:

 Hey Nishkam,

 To some extent we already have this process - many community members
 help review patches and some earn a reputation where committers will
 take an LGTM from them seriously. I'd be interested in seeing if any
 other projects recognize people who do this.

 - Patrick

 On Wed, Aug 27, 2014 at 2:36 PM, Nishkam Ravi nr...@cloudera.com wrote:
  Wonder if it would make sense to introduce a notion of 'Reviewers' as an
  intermediate tier to help distribute the load? While anyone can review
 and
  comment on an open PR, reviewers would be able to say aye or nay subject
 to
  confirmation by a committer?
 
  Thanks,
  Nishkam
 
 
  On Wed, Aug 27, 2014 at 2:11 PM, Nicholas Chammas
  nicholas.cham...@gmail.com wrote:
 
  On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com
 wrote:
 
   Last weekend, I started hacking on a Google App Engine app for helping
   with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png
 ).
  
 
  BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA
  issue, perhaps?
 
  Nick
 
 



Re: Handling stale PRs

2014-08-26 Thread Patrick Wendell
Hey Nicholas,

Thanks for bringing this up. There are a few dimensions to this... one is
that it's actually procedurally difficult for us to close pull requests.
I've proposed several different solutions to ASF infra to streamline the
process, but thus far they haven't been open to any of my ideas:

https://issues.apache.org/jira/browse/INFRA-7918
https://issues.apache.org/jira/browse/INFRA-8241

The more important thing, maybe, is how we want to deal with this
culturally. And I think we need to do a better job of making sure no pull
requests go unattended (i.e. waiting for committer feedback). If patches go
stale, it should be because the user hasn't responded, not us.

Another thing is that we should, IMO, err on the side of explicitly saying
"no" or "not yet" to patches, rather than letting them linger without
attention. We do get patches where the user is well intentioned, but it is
a feature that doesn't make sense to add, or isn't well thought out or
explained, or the review effort would be so large it's not within our
capacity to look at just yet.

Most other ASF projects I know just ignore these patches. I'd prefer if we
took the approach of politely explaining why in the current form the patch
isn't acceptable and closing it (potentially w/ tips on how to improve it
or narrow the scope).

- Patrick




On Mon, Aug 25, 2014 at 9:57 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 Hey Nicholas,

 In general we've been looking at these periodically (at least I have) and
 asking people to close out of date ones, but it's true that the list has
 gotten fairly large. We should probably have an expiry time of a few months
 and close them automatically. I agree that it's daunting to see so many
 open PRs.

 Matei

 On August 25, 2014 at 9:03:09 PM, Nicholas Chammas (
 nicholas.cham...@gmail.com) wrote:

 Check this out:

 https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc

 We're hitting close to 300 open PRs. Those are the least recently updated
 ones.

 I think having a low number of stale (i.e. not recently updated) PRs is a
 good thing to shoot for. It doesn't leave contributors hanging (which feels
 bad for contributors), and reduces project clutter (which feels bad for
 maintainers/committers).

 What is our approach to tackling this problem?

 I think communicating and enforcing a clear policy on how stale PRs are
 handled might be a good way to reduce the number of stale PRs we have
 without making contributors feel rejected.

 I don't know what such a policy would look like, but it should be
 enforceable and lightweight--i.e. it shouldn't feel like a hammer used to
 reject people's work, but rather a necessary tool to keep the project's
 contributions relevant and manageable.

 Nick



Re: Handling stale PRs

2014-08-26 Thread Matthew Farrellee

On 08/26/2014 04:57 AM, Sean Owen wrote:

On Tue, Aug 26, 2014 at 7:02 AM, Patrick Wendell pwend...@gmail.com wrote:

Most other ASF projects I know just ignore these patches. I'd prefer if we


Agree, this drives me crazy. It kills part of JIRA's usefulness. Spark
is blessed/cursed with incredible inbound load, but would love to
still see the project get this right-er than, say, Hadoop.


totally agree, this applies to patches as well as jiras. i'll add that 
projects who let things simply linger are missing an opportunity to 
engage their community.


spark should capitalize on its momentum to build a smoothly running 
community (vs not and accept an unbounded backlog as inevitable).




The more important thing, maybe, is how we want to deal with this
culturally. And I think we need to do a better job of making sure no pull
requests go unattended (i.e. waiting for committer feedback). If patches go
stale, it should be because the user hasn't responded, not us.


Stale JIRAs are a symptom, not a problem per se. I also want to see
the backlog cleared, but automatically closing doesn't help, if the
problem is too many JIRAs and not enough committer-hours to look at
them. Some noise gets closed, but some easy or important fixes may
disappear as well.


engagement in the community really needs to go both ways. it's 
reasonable for PRs that stop merging or have open comments that need 
resolution by the PRer to be loudly timed out. a similar thing goes for 
jiras, if there's a request for more information to resolve a bug and 
that information does not appear, half of the communication is gone and 
a loud time out is reasonable.


easy and important are in the eyes of the beholder. timeouts can go both 
ways. a jira or pr that has been around for a period of time (say 1/3 
the to-close timeout) should bump up for evaluation, hopefully resulting 
in few easy or important issues falling through the cracks.


fyi, i'm periodically going through the pyspark jiras, trying to 
reproduce issues, coalesce duplicates and ask for details. i've not been 
given any sort of permission to do this, i don't have any special 
position in the community to do this - in a well functioning community 
everyone should feel free to jump in and help.




Another thing is that we should, IMO, err on the side of explicitly saying
no or not yet to patches, rather than letting them linger without
attention. We do get patches where the user is well intentioned, but it is


Completely agree. The solution is partly more supply of committer time
on JIRAs. But that is detracting from the work the committers
themselves want to do. More of the solution is reducing demand by
helping people create useful, actionable, non-duplicate JIRAs from the
start. Or encouraging people to resolve existing JIRAs and shepherding
those in.


saying no/not-yet is a vitally important piece of information.



Elsewhere, I've found people reluctant to close JIRA for fear of
offending or turning off contributors. I think the opposite is true.
There is nothing wrong with "no" or "not now", especially accompanied
with constructive feedback. Better to state for the record what is not
being looked at and why, than let people work on and open the same
JIRAs repeatedly.


well stated!



I have also found in the past that a culture of tolerating eternal
JIRAs led people to file JIRAs in order to forget about a problem --
it's in JIRA, and it's in progress, so it feels like someone else is
going to fix it later and so it can be forgotten now.


there's some value in these now-i-can-forget jira, though i'm not 
personally a fan. it can be good to keep them around and reachable by 
search, but they should be clearly marked as no/not-yet or something 
similar.




For what it's worth, I think these project and culture mechanics are
so important and it's my #1 concern for Spark at this stage. This
challenge exists so much more here, exactly because there is so much
potential. I'd love to help by trying to identify and close stale
JIRAs but am afraid that tagging them is just adding to the heap of
work.


+1 concern and potential!


best,


matt




Re: Handling stale PRs

2014-08-26 Thread Madhu
Sean Owen wrote
 Stale JIRAs are a symptom, not a problem per se. I also want to see
 the backlog cleared, but automatically closing doesn't help, if the
 problem is too many JIRAs and not enough committer-hours to look at
 them. Some noise gets closed, but some easy or important fixes may
 disappear as well.

Agreed. All of the problems mentioned in this thread are symptoms. There's
no shortage of talent and enthusiasm within the Spark community. The people
and the product are wonderful. The process: not so much. Spark has been
wildly successful; some growing pains are to be expected.

Given 100+ contributors, Spark is a big project. As with big data, big
projects can run into scaling issues. There's no magic to running a
successful big project, but it does require greater planning and discipline.
JIRA is great for issue tracking, but it's not a replacement for a project
plan. Quarterly releases are a great idea: everyone knows the schedule. What
we need is a concise plan for each release with a clear scope statement.
Without knowing what is in scope and out of scope for a release, we end up
with a laundry list of things to do, but no clear goal. Laundry lists don't
scale well.

I don't mind helping with planning and documenting releases. This is
especially helpful for new contributors who don't know where to start. I
have done that successfully on many projects using Jira and Confluence, so I
know it can be done. To address immediate concerns of open PRs and
excessive, overlapping Jira issues, we probably have to create a meta issue
and assign resources to fix it. I don't mind helping with that also.



-
--
Madhu
https://www.linkedin.com/in/msiddalingaiah
--




Re: Handling stale PRs

2014-08-26 Thread Erik Erlandson


- Original Message -

  Another thing is that we should, IMO, err on the side of explicitly saying
  no or not yet to patches, rather than letting them linger without
  attention. We do get patches where the user is well intentioned, but it is
 
  Completely agree. The solution is partly more supply of committer time
  on JIRAs. But that is detracting from the work the committers
  themselves want to do. More of the solution is reducing demand by
  helping people create useful, actionable, non-duplicate JIRAs from the
  start. Or encouraging people to resolve existing JIRAs and shepherding
  those in.
 
 saying no/not-yet is a vitally important piece of information.

+1, when I propose a contribution to a project, I consider an articulate (and
hopefully polite) "no thanks", "not yet", or "needs work" to be far more
useful and pleasing than just radio silence.  It allows me to either address
feedback, or just move on.

Although it takes effort to keep abreast of community contributions, I don't 
think it needs to be an overbearing or heavy-weight process.  I've seen other 
communities where they talked themselves out of better management because they 
conceived the ticket workflow as being more effort than it needed to be.  Much 
better to keep ticket triage and workflow fast/simple, but actually do it.



 
 
  Elsewhere, I've found people reluctant to close JIRA for fear of
  offending or turning off contributors. I think the opposite is true.
  There is nothing wrong with no or not now especially accompanied
  with constructive feedback. Better to state for the record what is not
  being looked at and why, than let people work on and open the same
  JIRAs repeatedly.
 
 well stated!
 
 
  I have also found in the past that a culture of tolerating eternal
  JIRAs led people to file JIRAs in order to forget about a problem --
  it's in JIRA, and it's in progress, so it feels like someone else is
  going to fix it later and so it can be forgotten now.
 
 there's some value in these now-i-can-forget jira, though i'm not
 personally a fan. it can be good to keep them around and reachable by
 search, but they should be clearly marked as no/not-yet or something
 similar.
 
 
  For what it's worth, I think these project and culture mechanics are
  so important and it's my #1 concern for Spark at this stage. This
  challenge exists so much more here, exactly because there is so much
  potential. I'd love to help by trying to identify and close stale
  JIRAs but am afraid that tagging them is just adding to the heap of
  work.
 
 +1 concern and potential!
 
 
 best,
 
 
 matt
 



Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com wrote:

 I'd prefer if we took the approach of politely explaining why in the
 current form the patch isn't acceptable and closing it (potentially w/ tips
 on how to improve it or narrow the scope).


Amen to this. Aiming for such a culture would set Spark apart from other
projects in a great way.

I've proposed several different solutions to ASF infra to streamline the
 process, but thus far they haven't been open to any of my ideas:


I've added myself as a watcher on those 2 INFRA issues. Sucks that the only
solution on offer right now requires basically polluting the commit history.

Short of moving Spark's repo to a non-ASF-managed GitHub account, do you
think another bot could help us manage the number of stale PRs?

I'm thinking a solution as follows might be very helpful:

   - Extend Spark QA / Jenkins to run on a weekly schedule and check for
   stale PRs. Let's say a stale PR is an open one that hasn't been updated in
   N months.
   - Spark QA maintains a list of known committers on its side.
   - During its weekly check of stale PRs, Spark QA takes the following
   action:
      - If the last person to comment on a PR was a committer, post to the
      PR asking for an update from the contributor.
      - If the last person to comment on a PR was a contributor, add the PR
      to a list. Email this list of *hanging PRs* out to the dev list on a
      weekly basis and ask committers to update them.
      - If the last person to comment on a PR was Spark QA asking the
      contributor to update it, then add the PR to a list. Email this list of
      *abandoned PRs* to the dev list for the record (or for closing, if that
      becomes possible in the future).

This doesn't solve the problem of not being able to close PRs, but it does
help make sure no PR is left hanging for long.

What do you think? I'd be interested in implementing this solution if we
like it.

Nick
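
A rough sketch of the weekly check outlined above, assuming the public GitHub
API; the committer list, bot account name, and triage buckets are placeholder
values, not an actual Spark QA implementation:

import requests

ISSUES_API = "https://api.github.com/repos/apache/spark/issues"
COMMITTERS = {"pwendell", "mateiz", "rxin"}  # hypothetical committer list kept by the bot
BOT_LOGIN = "SparkQA"                        # hypothetical bot account name

def last_commenter(pr_number):
    # PR conversation comments are served from the issues endpoint.
    comments = requests.get("%s/%d/comments" % (ISSUES_API, pr_number)).json()
    return comments[-1]["user"]["login"] if comments else None

def triage(stale_pr_numbers):
    needs_contributor, hanging, abandoned = [], [], []
    for number in stale_pr_numbers:
        who = last_commenter(number)
        if who == BOT_LOGIN:
            abandoned.append(number)          # bot already pinged; contributor never replied
        elif who in COMMITTERS:
            needs_contributor.append(number)  # post a comment asking the contributor for an update
        else:
            hanging.append(number)            # waiting on a committer; include in the weekly dev list email
    return needs_contributor, hanging, abandoned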


Re: Handling stale PRs

2014-08-26 Thread Josh Rosen
Last weekend, I started hacking on a Google App Engine app for helping with 
pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).  Some of my 
basic goals (not all implemented yet):

- Users sign in using GitHub and can browse a list of pull requests, including 
links to associated JIRAs, Jenkins statuses, a quick preview of the last 
comment, etc.

- Pull requests are auto-classified based on which components they modify (by 
looking at the diff).

- From the app’s own internal database of PRs, we can build dashboards to find 
“abandoned” PRs, graph average time to first review, etc.

- Since we authenticate users with GitHub, we can enable administrative 
functions via this dashboard (e.g. “assign this PR to me”, “vote to close in 
the weekly auto-close commit”, etc.).

Right now, I’ve implemented GitHub OAuth support and code to update the issues 
database using the GitHub API.  Because we have access to the full API, it’s 
pretty easy to do fancy things like parsing the reason for Jenkins failure, 
etc.  You could even imagine some fancy mashup tools to pull up JIRAs and pull 
requests side-by-side in iframes.

After I hack on this a bit more, I plan to release a public preview version; if 
we find this tool useful, I’ll clean it up and open-source the app so folks can 
contribute to it.

- Josh
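
As an illustration of the auto-classification idea, a PR could be bucketed by
the files its diff touches, roughly as below; the prefix-to-component mapping
is an assumption, not the dashboard's actual rules:

import requests

FILES_API = "https://api.github.com/repos/apache/spark/pulls/%d/files"

# Hypothetical mapping from path prefixes to Spark components.
COMPONENT_PREFIXES = {
    "sql/": "SQL",
    "mllib/": "MLlib",
    "streaming/": "Streaming",
    "python/": "PySpark",
    "core/": "Core",
}

def components_for(pr_number):
    # Fetch the files changed by the PR and map each path to a component.
    files = requests.get(FILES_API % pr_number).json()
    components = set()
    for f in files:
        for prefix, name in COMPONENT_PREFIXES.items():
            if f["filename"].startswith(prefix):
                components.add(name)
    return sorted(components) or ["Other"]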

On August 26, 2014 at 8:16:46 AM, Nicholas Chammas (nicholas.cham...@gmail.com) 
wrote:

On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com wrote:  

 I'd prefer if we took the approach of politely explaining why in the  
 current form the patch isn't acceptable and closing it (potentially w/ tips  
 on how to improve it or narrow the scope).  


Amen to this. Aiming for such a culture would set Spark apart from other  
projects in a great way.  

I've proposed several different solutions to ASF infra to streamline the  
 process, but thus far they haven't been open to any of my ideas:  


I've added myself as a watcher on those 2 INFRA issues. Sucks that the only  
solution on offer right now requires basically polluting the commit history.  

Short of moving Spark's repo to a non-ASF-managed GitHub account, do you  
think another bot could help us manage the number of stale PRs?  

I'm thinking a solution as follows might be very helpful:  

- Extend Spark QA / Jenkins to run on a weekly schedule and check for  
stale PRs. Let's say a stale PR is an open one that hasn't been updated in  
N months.  
- Spark QA maintains a list of known committers on its side.  
- During its weekly check of stale PRs, Spark QA takes the following  
action:  
- If the last person to comment on a PR was a committer, post to the  
PR asking for an update from the contributor.  
- If the last person to comment on a PR was a contributor, add the PR  
to a list. Email this list of *hanging PRs* out to the dev list on a  
weekly basis and ask committers to update them.  
- If the last person to comment on a PR was Spark QA asking the  
contributor to update it, then add the PR to a list. Email this  
list of *abandoned  
PRs* to the dev list for the record (or for closing, if that becomes  
possible in the future).  

This doesn't solve the problem of not being able to close PRs, but it does  
help make sure no PR is left hanging for long.  

What do you think? I'd be interested in implementing this solution if we  
like it.  

Nick  


Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
OK, that sounds pretty cool.

Josh,

Do you see this app as encompassing or supplanting the functionality I
described as well?

Nick


On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote:

 Last weekend, I started hacking on a Google App Engine app for helping
 with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).
  Some of my basic goals (not all implemented yet):

 - Users sign in using GitHub and can browse a list of pull requests,
 including links to associated JIRAs, Jenkins statuses, a quick preview of
 the last comment, etc.

 - Pull requests are auto-classified based on which components they modify
 (by looking at the diff).

 - From the app’s own internal database of PRs, we can build dashboards to
 find “abandoned” PRs, graph average time to first review, etc.

 - Since we authenticate users with GitHub, we can enable administrative
 functions via this dashboard (e.g. “assign this PR to me”, “vote to close
 in the weekly auto-close commit”, etc.).

 Right now, I’ve implemented GitHub OAuth support and code to update the
 issues database using the GitHub API.  Because we have access to the full
 API, it’s pretty easy to do fancy things like parsing the reason for
 Jenkins failure, etc.  You could even imagine some fancy mashup tools to
 pull up JIRAs and pull requests side-by-side in iframes.

 After I hack on this a bit more, I plan to release a public preview
 version; if we find this tool useful, I’ll clean it up and open-source the
 app so folks can contribute to it.

 - Josh

 On August 26, 2014 at 8:16:46 AM, Nicholas Chammas (
 nicholas.cham...@gmail.com) wrote:

 On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com
 wrote:

  I'd prefer if we took the approach of politely explaining why in the
  current form the patch isn't acceptable and closing it (potentially w/
 tips
  on how to improve it or narrow the scope).


 Amen to this. Aiming for such a culture would set Spark apart from other
 projects in a great way.

 I've proposed several different solutions to ASF infra to streamline the
  process, but thus far they haven't been open to any of my ideas:


 I've added myself as a watcher on those 2 INFRA issues. Sucks that the
 only
 solution on offer right now requires basically polluting the commit
 history.

 Short of moving Spark's repo to a non-ASF-managed GitHub account, do you
 think another bot could help us manage the number of stale PRs?

 I'm thinking a solution as follows might be very helpful:

 - Extend Spark QA / Jenkins to run on a weekly schedule and check for
 stale PRs. Let's say a stale PR is an open one that hasn't been updated in
 N months.
 - Spark QA maintains a list of known committers on its side.
 - During its weekly check of stale PRs, Spark QA takes the following
 action:
 - If the last person to comment on a PR was a committer, post to the
 PR asking for an update from the contributor.
 - If the last person to comment on a PR was a contributor, add the PR
 to a list. Email this list of *hanging PRs* out to the dev list on a
 weekly basis and ask committers to update them.
 - If the last person to comment on a PR was Spark QA asking the
 contributor to update it, then add the PR to a list. Email this
 list of *abandoned
 PRs* to the dev list for the record (or for closing, if that becomes
 possible in the future).

 This doesn't solve the problem of not being able to close PRs, but it does
 help make sure no PR is left hanging for long.

 What do you think? I'd be interested in implementing this solution if we
 like it.

 Nick




Re: Handling stale PRs

2014-08-26 Thread Josh Rosen
Sure; App Engine supports cron and sending emails.  We can configure the app 
with Spark QA’s credentials in order to allow it to post comments on issues, 
etc.

- Josh
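
As a rough sketch of that setup (the handler path, sender address, and digest
contents below are assumptions): a cron.yaml entry with a schedule like "every
monday 09:00" would point at a handler such as this one, which sends the
digest through the App Engine Mail API:

import webapp2
from google.appengine.api import mail

def build_digest():
    # Placeholder: in practice this would pull the stale-PR data assembled by
    # the dashboard (see the earlier sketches in this thread).
    return "Weekly stale PR report goes here."

class WeeklyStalePRDigest(webapp2.RequestHandler):
    def get(self):
        # App Engine cron invokes the configured URL with a GET request.
        mail.send_mail(sender="Spark QA <spark-qa@example.com>",
                       to="dev@spark.apache.org",
                       subject="Weekly stale PR report",
                       body=build_digest())

app = webapp2.WSGIApplication([("/tasks/weekly-stale-prs", WeeklyStalePRDigest)])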

On August 26, 2014 at 11:38:08 AM, Nicholas Chammas 
(nicholas.cham...@gmail.com) wrote:

OK, that sounds pretty cool.

Josh,

Do you see this app as encompassing or supplanting the functionality I 
described as well?

Nick


On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote:
Last weekend, I started hacking on a Google App Engine app for helping with 
pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).  Some of my 
basic goals (not all implemented yet):

- Users sign in using GitHub and can browse a list of pull requests, including 
links to associated JIRAs, Jenkins statuses, a quick preview of the last 
comment, etc.

- Pull requests are auto-classified based on which components they modify (by 
looking at the diff).

- From the app’s own internal database of PRs, we can build dashboards to find 
“abandoned” PRs, graph average time to first review, etc.

- Since we authenticate users with GitHub, we can enable administrative 
functions via this dashboard (e.g. “assign this PR to me”, “vote to close in 
the weekly auto-close commit”, etc.).

Right now, I’ve implemented GitHub OAuth support and code to update the issues 
database using the GitHub API.  Because we have access to the full API, it’s 
pretty easy to do fancy things like parsing the reason for Jenkins failure, 
etc.  You could even imagine some fancy mashup tools to pull up JIRAs and pull 
requests side-by-side in iframes.

After I hack on this a bit more, I plan to release a public preview version; if 
we find this tool useful, I’ll clean it up and open-source the app so folks can 
contribute to it.

- Josh

On August 26, 2014 at 8:16:46 AM, Nicholas Chammas (nicholas.cham...@gmail.com) 
wrote:

On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com wrote:

 I'd prefer if we took the approach of politely explaining why in the
 current form the patch isn't acceptable and closing it (potentially w/ tips
 on how to improve it or narrow the scope).


Amen to this. Aiming for such a culture would set Spark apart from other
projects in a great way.

I've proposed several different solutions to ASF infra to streamline the
 process, but thus far they haven't been open to any of my ideas:


I've added myself as a watcher on those 2 INFRA issues. Sucks that the only
solution on offer right now requires basically polluting the commit history.

Short of moving Spark's repo to a non-ASF-managed GitHub account, do you
think another bot could help us manage the number of stale PRs?

I'm thinking a solution as follows might be very helpful:

- Extend Spark QA / Jenkins to run on a weekly schedule and check for
stale PRs. Let's say a stale PR is an open one that hasn't been updated in
N months.
- Spark QA maintains a list of known committers on its side.
- During its weekly check of stale PRs, Spark QA takes the following
action:
- If the last person to comment on a PR was a committer, post to the
PR asking for an update from the contributor.
- If the last person to comment on a PR was a contributor, add the PR
to a list. Email this list of *hanging PRs* out to the dev list on a
weekly basis and ask committers to update them.
- If the last person to comment on a PR was Spark QA asking the
contributor to update it, then add the PR to a list. Email this
list of *abandoned
PRs* to the dev list for the record (or for closing, if that becomes
possible in the future).

This doesn't solve the problem of not being able to close PRs, but it does
help make sure no PR is left hanging for long.

What do you think? I'd be interested in implementing this solution if we
like it.

Nick



Re: Handling stale PRs

2014-08-26 Thread Nicholas Chammas
By the way, as a reference point, I just stumbled across the Discourse
GitHub project and their list of pull requests
https://github.com/discourse/discourse/pulls looks pretty neat.

~2,200 closed PRs, 6 open. Least recently updated PR dates to 8 days ago.
Project started ~1.5 years ago.

Dunno how many committers Discourse has, but it looks like they've managed
their PRs well. I hope we can do as well in this regard as they have.

Nick


On Tue, Aug 26, 2014 at 2:40 PM, Josh Rosen rosenvi...@gmail.com wrote:

 Sure; App Engine supports cron and sending emails.  We can configure the
 app with Spark QA’s credentials in order to allow it to post comments on
 issues, etc.

 - Josh

 On August 26, 2014 at 11:38:08 AM, Nicholas Chammas (
 nicholas.cham...@gmail.com) wrote:

  OK, that sounds pretty cool.

 Josh,

 Do you see this app as encompassing or supplanting the functionality I
 described as well?

 Nick


 On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote:

  Last weekend, I started hacking on a Google App Engine app for helping
 with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png).
  Some of my basic goals (not all implemented yet):

  - Users sign in using GitHub and can browse a list of pull requests,
 including links to associated JIRAs, Jenkins statuses, a quick preview of
 the last comment, etc.

  - Pull requests are auto-classified based on which components they
 modify (by looking at the diff).

  - From the app’s own internal database of PRs, we can build dashboards
 to find “abandoned” PRs, graph average time to first review, etc.

  - Since we authenticate users with GitHub, we can enable administrative
 functions via this dashboard (e.g. “assign this PR to me”, “vote to close
 in the weekly auto-close commit”, etc.).

  Right now, I’ve implemented GitHub OAuth support and code to update the
 issues database using the GitHub API.  Because we have access to the full
 API, it’s pretty easy to do fancy things like parsing the reason for
 Jenkins failure, etc.  You could even imagine some fancy mashup tools to
 pull up JIRAs and pull requests side-by-side in iframes.

 After I hack on this a bit more, I plan to release a public preview
 version; if we find this tool useful, I’ll clean it up and open-source the
 app so folks can contribute to it.

 - Josh

 On August 26, 2014 at 8:16:46 AM, Nicholas Chammas (
 nicholas.cham...@gmail.com) wrote:

  On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com
 wrote:

  I'd prefer if we took the approach of politely explaining why in the
  current form the patch isn't acceptable and closing it (potentially w/
 tips
  on how to improve it or narrow the scope).


 Amen to this. Aiming for such a culture would set Spark apart from other
 projects in a great way.

 I've proposed several different solutions to ASF infra to streamline the
  process, but thus far they haven't been open to any of my ideas:


 I've added myself as a watcher on those 2 INFRA issues. Sucks that the
 only
 solution on offer right now requires basically polluting the commit
 history.

 Short of moving Spark's repo to a non-ASF-managed GitHub account, do you
 think another bot could help us manage the number of stale PRs?

 I'm thinking a solution as follows might be very helpful:

 - Extend Spark QA / Jenkins to run on a weekly schedule and check for
 stale PRs. Let's say a stale PR is an open one that hasn't been updated in
 N months.
 - Spark QA maintains a list of known committers on its side.
 - During its weekly check of stale PRs, Spark QA takes the following
 action:
 - If the last person to comment on a PR was a committer, post to the
 PR asking for an update from the contributor.
 - If the last person to comment on a PR was a contributor, add the PR
 to a list. Email this list of *hanging PRs* out to the dev list on a
 weekly basis and ask committers to update them.
 - If the last person to comment on a PR was Spark QA asking the
 contributor to update it, then add the PR to a list. Email this
 list of *abandoned
 PRs* to the dev list for the record (or for closing, if that becomes
 possible in the future).

 This doesn't solve the problem of not being able to close PRs, but it does
 help make sure no PR is left hanging for long.

 What do you think? I'd be interested in implementing this solution if we
 like it.

 Nick





Re: Handling stale PRs

2014-08-26 Thread Madhu
Nicholas Chammas wrote
 Dunno how many committers Discourse has, but it looks like they've managed
 their PRs well. I hope we can do as well in this regard as they have.

Discourse developers appear to eat their own dog food
(https://meta.discourse.org).
Improved collaboration and a shared vision might be a reason for their
success.




-
--
Madhu
https://www.linkedin.com/in/msiddalingaiah
--




Handling stale PRs

2014-08-25 Thread Nicholas Chammas
Check this out:
https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc

We're hitting close to 300 open PRs. Those are the least recently updated
ones.

I think having a low number of stale (i.e. not recently updated) PRs is a
good thing to shoot for. It doesn't leave contributors hanging (which feels
bad for contributors), and reduces project clutter (which feels bad for
maintainers/committers).

What is our approach to tackling this problem?

I think communicating and enforcing a clear policy on how stale PRs are
handled might be a good way to reduce the number of stale PRs we have
without making contributors feel rejected.

I don't know what such a policy would look like, but it should be
enforceable and lightweight--i.e. it shouldn't feel like a hammer used to
reject people's work, but rather a necessary tool to keep the project's
contributions relevant and manageable.

Nick


Re: Handling stale PRs

2014-08-25 Thread Matei Zaharia
Hey Nicholas,

In general we've been looking at these periodically (at least I have) and 
asking people to close out of date ones, but it's true that the list has gotten 
fairly large. We should probably have an expiry time of a few months and close 
them automatically. I agree that it's daunting to see so many open PRs.

Matei

On August 25, 2014 at 9:03:09 PM, Nicholas Chammas (nicholas.cham...@gmail.com) 
wrote:

Check this out: 
https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc 

We're hitting close to 300 open PRs. Those are the least recently updated 
ones. 

I think having a low number of stale (i.e. not recently updated) PRs is a 
good thing to shoot for. It doesn't leave contributors hanging (which feels 
bad for contributors), and reduces project clutter (which feels bad for 
maintainers/committers). 

What is our approach to tackling this problem? 

I think communicating and enforcing a clear policy on how stale PRs are 
handled might be a good way to reduce the number of stale PRs we have 
without making contributors feel rejected. 

I don't know what such a policy would look like, but it should be 
enforceable and lightweight--i.e. it shouldn't feel like a hammer used to 
reject people's work, but rather a necessary tool to keep the project's 
contributions relevant and manageable. 

Nick