Re: Handling stale PRs
Shout-out to Michael and other Spark SQL contributors for really trimming down the number of open/stale Spark SQL PRs: https://spark-prs.appspot.com/#sql. As of right now, the least recently updated open Spark SQL PR goes back only 11 days. Nice work!

Nick
Re: Handling stale PRs
I recently came across this blog post, which reminded me of this thread: How to Discourage Open Source Contributions http://danluu.com/discourage-oss/

We are currently at 320+ open PRs, many of which haven't been updated in over a month. We have quite a few PRs that haven't been touched in 3-5 months. *If you have the time and interest, please hop on over to the Spark PR Dashboard https://spark-prs.appspot.com/, sort the PRs by least-recently-updated, and update them where you can.*

I share the blog author's opinion that letting PRs go stale discourages contributions, especially from first-time contributors, and all the more so when the PR author is waiting on feedback from a committer or contributor.

I've been thinking about simple ways to make it easier for all of us to chip in on controlling stale PRs in an incremental way. For starters, would it help if an automated email went out to the dev list once a week that a) reported the number of stale PRs, and b) directly linked to the 5 least recently updated PRs?

Nick
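The weekly report proposed above could be generated from PR metadata with something like the following sketch. The 30-day staleness threshold, the tuple shape, and the sample data are all illustrative; in practice the PR list would be fetched from the GitHub API.

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=30)  # illustrative threshold for "stale"

def stale_report(prs, now, top_n=5):
    """Given PRs as (number, title, last_updated) tuples, return the
    count of stale PRs and the top_n least recently updated ones."""
    stale = [pr for pr in prs if now - pr[2] > STALE_AFTER]
    stale.sort(key=lambda pr: pr[2])  # oldest activity first
    return len(stale), stale[:top_n]

# Hypothetical sample data standing in for real GitHub API results.
now = datetime(2014, 12, 8)
prs = [
    (101, "Fix shuffle spill accounting", datetime(2014, 8, 1)),
    (202, "SQL doc tweak", datetime(2014, 12, 1)),
    (303, "YARN config cleanup", datetime(2014, 9, 15)),
]
count, oldest = stale_report(prs, now)
print(count)  # 2 of the 3 sample PRs are stale
for number, title, updated in oldest:
    print(number, title, updated.date())
```

The actual email body could then be rendered from `count` and `oldest` and sent by whatever mailer the project's infrastructure already uses.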
Re: Handling stale PRs
Thank you for pointing this out, Nick. I know that for myself and my colleague who are starting to contribute to Spark, it's definitely discouraging to have fixes sitting in the pipeline. Could you recommend any other ways that we can facilitate getting these PRs accepted? Clean, well-tested code is an obvious one, but I'd like to know if there are some non-obvious things we (as contributors) could do to make the committers' lives easier?

Thanks!
-Ilya
The information contained in this e-mail is confidential and/or proprietary to Capital One and/or its affiliates. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
Re: Handling stale PRs
Things that help:

- Be persistent. People are busy, so just ping them if there's been no response for a couple of weeks. Hopefully, as the project continues to develop, this will become less necessary.
- Only ping reviewers after test results are back from Jenkins. Make sure all the tests are clear before reaching out, unless you need help understanding why a test is failing.
- Whenever possible, keep PRs small, small, small.
- Get buy-in on the dev list before working on something, especially larger features, to make sure you are making something that people understand and that is in accordance with Spark's design.

I'm just speaking as a random contributor here, so don't take this advice as gospel.

Nick
Re: Handling stale PRs
On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com wrote:
> it's actually procedurally difficult for us to close pull requests

Just an FYI: Seems like the GitHub-sanctioned work-around to having issues-only permissions is to have a second, issues-only repository https://help.github.com/articles/issues-only-access-permissions. Not a very attractive work-around...

Nick
Re: Handling stale PRs
On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote: Last weekend, I started hacking on a Google App Engine app for helping with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png). BTW Josh, how can we stay up-to-date on your work on this tool? A JIRA issue, perhaps? Nick
Re: Handling stale PRs
Wonder if it would make sense to introduce a notion of 'Reviewers' as an intermediate tier to help distribute the load? While anyone can review and comment on an open PR, reviewers would be able to say aye or nay, subject to confirmation by a committer?

Thanks,
Nishkam
Re: Handling stale PRs
Hey Nishkam,

To some extent we already have this process - many community members help review patches, and some earn a reputation where committers will take an LGTM from them seriously. I'd be interested in seeing if any other projects recognize people who do this.

- Patrick
Re: Handling stale PRs
I have a very simple dashboard running at http://spark-prs.appspot.com/. Currently, this mirrors the functionality of Patrick’s github-shim, but it should be very easy to extend with other features. The source is at https://github.com/databricks/spark-pr-dashboard (pull requests and issues welcome!)
Re: Handling stale PRs
Alright! That was quick. :)
Re: Handling stale PRs
I see. Yeah, it would be interesting to know if any other project has considered formalizing this notion. It may also enable assignment of reviews (potentially automated using Josh's system) and maybe anonymity as well? On the downside, it isn't easily implemented and probably doesn't come without certain undesired side-effects.

Thanks,
Nishkam
Re: Handling stale PRs
Hey Nicholas,

Thanks for bringing this up. There are a few dimensions to this... one is that it's actually procedurally difficult for us to close pull requests. I've proposed several different solutions to ASF infra to streamline the process, but thus far they haven't been open to any of my ideas:

https://issues.apache.org/jira/browse/INFRA-7918
https://issues.apache.org/jira/browse/INFRA-8241

The more important thing, maybe, is how we want to deal with this culturally. And I think we need to do a better job of making sure no pull requests go unattended (i.e. waiting for committer feedback). If patches go stale, it should be because the user hasn't responded, not us.

Another thing is that we should, IMO, err on the side of explicitly saying no or not yet to patches, rather than letting them linger without attention. We do get patches where the user is well intentioned, but it is a feature that doesn't make sense to add, or isn't well thought out or explained, or the review effort would be so large it's not within our capacity to look at just yet. Most other ASF projects I know just ignore these patches. I'd prefer if we took the approach of politely explaining why in the current form the patch isn't acceptable and closing it (potentially w/ tips on how to improve it or narrow the scope).

- Patrick

On Mon, Aug 25, 2014 at 9:57 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

Hey Nicholas,

In general we've been looking at these periodically (at least I have) and asking people to close out-of-date ones, but it's true that the list has gotten fairly large. We should probably have an expiry time of a few months and close them automatically. I agree that it's daunting to see so many open PRs.

Matei

On August 25, 2014 at 9:03:09 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote:

Check this out: https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc

We're hitting close to 300 open PRs. Those are the least recently updated ones. I think having a low number of stale (i.e. not recently updated) PRs is a good thing to shoot for. It doesn't leave contributors hanging (which feels bad for contributors), and it reduces project clutter (which feels bad for maintainers/committers).

What is our approach to tackling this problem? I think communicating and enforcing a clear policy on how stale PRs are handled might be a good way to reduce the number of stale PRs we have without making contributors feel rejected. I don't know what such a policy would look like, but it should be enforceable and lightweight--i.e. it shouldn't feel like a hammer used to reject people's work, but rather a necessary tool to keep the project's contributions relevant and manageable.

Nick
Re: Handling stale PRs
On 08/26/2014 04:57 AM, Sean Owen wrote:

> On Tue, Aug 26, 2014 at 7:02 AM, Patrick Wendell pwend...@gmail.com wrote:
>> Most other ASF projects I know just ignore these patches. I'd prefer if we
> Agree, this drives me crazy. It kills part of JIRA's usefulness. Spark is blessed/cursed with incredible inbound load, but would love to still see the project get this right-er than, say, Hadoop.

totally agree, this applies to patches as well as jiras. i'll add that projects who let things simply linger are missing an opportunity to engage their community. spark should capitalize on its momentum to build a smoothly running community (vs not and accept an unbounded backlog as inevitable).

>> The more important thing, maybe, is how we want to deal with this culturally. And I think we need to do a better job of making sure no pull requests go unattended (i.e. waiting for committer feedback). If patches go stale, it should be because the user hasn't responded, not us.
> Stale JIRAs are a symptom, not a problem per se. I also want to see the backlog cleared, but automatically closing doesn't help, if the problem is too many JIRAs and not enough committer-hours to look at them. Some noise gets closed, but some easy or important fixes may disappear as well.

engagement in the community really needs to go both ways. it's reasonable for PRs that stop merging or have open comments that need resolution by the PRer to be loudly timed out. a similar thing goes for jiras: if there's a request for more information to resolve a bug and that information does not appear, half of the communication is gone and a loud time out is reasonable.

easy and important are in the eyes of the beholder. timeouts can go both ways. a jira or pr that has been around for a period of time (say 1/3 the to-close timeout) should bump up for evaluation, hopefully resulting in few easy or important issues falling through the cracks.

fyi, i'm periodically going through the pyspark jiras, trying to reproduce issues, coalesce duplicates and ask for details. i've not been given any sort of permission to do this, i don't have any special position in the community to do this - in a well functioning community everyone should feel free to jump in and help.

>> Another thing is that we should, IMO, err on the side of explicitly saying no or not yet to patches, rather than letting them linger without attention. We do get patches where the user is well intentioned, but it is
> Completely agree. The solution is partly more supply of committer time on JIRAs. But that is detracting from the work the committers themselves want to do. More of the solution is reducing demand by helping people create useful, actionable, non-duplicate JIRAs from the start. Or encouraging people to resolve existing JIRAs and shepherding those in.

saying no/not-yet is a vitally important piece of information.

> Elsewhere, I've found people reluctant to close JIRA for fear of offending or turning off contributors. I think the opposite is true. There is nothing wrong with no or not now, especially accompanied with constructive feedback. Better to state for the record what is not being looked at and why, than let people work on and open the same JIRAs repeatedly.

well stated!

> I have also found in the past that a culture of tolerating eternal JIRAs led people to file JIRAs in order to forget about a problem -- it's in JIRA, and it's in progress, so it feels like someone else is going to fix it later and so it can be forgotten now.

there's some value in these now-i-can-forget jiras, though i'm not personally a fan. it can be good to keep them around and reachable by search, but they should be clearly marked as no/not-yet or something similar.

> For what it's worth, I think these project and culture mechanics are so important and it's my #1 concern for Spark at this stage. This challenge exists so much more here, exactly because there is so much potential. I'd love to help by trying to identify and close stale JIRAs but am afraid that tagging them is just adding to the heap of work.

+1 concern and potential!

best,
matt
Re: Handling stale PRs
Sean Owen wrote:
> Stale JIRAs are a symptom, not a problem per se. I also want to see the backlog cleared, but automatically closing doesn't help, if the problem is too many JIRAs and not enough committer-hours to look at them. Some noise gets closed, but some easy or important fixes may disappear as well.

Agreed. All of the problems mentioned in this thread are symptoms. There's no shortage of talent and enthusiasm within the Spark community. The people and the product are wonderful. The process: not so much. Spark has been wildly successful; some growing pains are to be expected.

Given 100+ contributors, Spark is a big project. As with big data, big projects can run into scaling issues. There's no magic to running a successful big project, but it does require greater planning and discipline. JIRA is great for issue tracking, but it's not a replacement for a project plan. Quarterly releases are a great idea; everyone knows the schedule. What we need is a concise plan for each release with a clear scope statement. Without knowing what is in scope and out of scope for a release, we end up with a laundry list of things to do, but no clear goal. Laundry lists don't scale well.

I don't mind helping with planning and documenting releases. This is especially helpful for new contributors who don't know where to start. I have done that successfully on many projects using Jira and Confluence, so I know it can be done. To address the immediate concerns of open PRs and excessive, overlapping Jira issues, we probably have to create a meta issue and assign resources to fix it. I don't mind helping with that also.

--
Madhu
https://www.linkedin.com/in/msiddalingaiah
Re: Handling stale PRs
----- Original Message -----
> saying no/not-yet is a vitally important piece of information.

+1, when I propose a contribution to a project, I consider an articulate (and hopefully polite) no-thanks, or not-yet, or needs-work, to be far more useful and pleasing than just radio silence. It allows me to either address feedback, or just move on.

Although it takes effort to keep abreast of community contributions, I don't think it needs to be an overbearing or heavy-weight process. I've seen other communities where they talked themselves out of better management because they conceived the ticket workflow as being more effort than it needed to be. Much better to keep ticket triage and workflow fast/simple, but actually do it.
Re: Handling stale PRs
On Tue, Aug 26, 2014 at 2:02 AM, Patrick Wendell pwend...@gmail.com wrote:
> I'd prefer if we took the approach of politely explaining why in the current form the patch isn't acceptable and closing it (potentially w/ tips on how to improve it or narrow the scope).

Amen to this. Aiming for such a culture would set Spark apart from other projects in a great way.

> I've proposed several different solutions to ASF infra to streamline the process, but thus far they haven't been open to any of my ideas:

I've added myself as a watcher on those 2 INFRA issues. Sucks that the only solution on offer right now requires basically polluting the commit history.

Short of moving Spark's repo to a non-ASF-managed GitHub account, do you think another bot could help us manage the number of stale PRs? I'm thinking a solution as follows might be very helpful:

- Extend Spark QA / Jenkins to run on a weekly schedule and check for stale PRs. Let's say a stale PR is an open one that hasn't been updated in N months.
- Spark QA maintains a list of known committers on its side.
- During its weekly check of stale PRs, Spark QA takes the following action:
  - If the last person to comment on a PR was a committer, post to the PR asking for an update from the contributor.
  - If the last person to comment on a PR was a contributor, add the PR to a list. Email this list of *hanging PRs* out to the dev list on a weekly basis and ask committers to update them.
  - If the last person to comment on a PR was Spark QA asking the contributor to update it, then add the PR to a list. Email this list of *abandoned PRs* to the dev list for the record (or for closing, if that becomes possible in the future).

This doesn't solve the problem of not being able to close PRs, but it does help make sure no PR is left hanging for long. What do you think? I'd be interested in implementing this solution if we like it.

Nick
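The three-way check in the proposal above could be sketched roughly as follows. The bot login, the committer list, and the category names are made up for illustration; a real implementation would pull each PR's last commenter from the GitHub API.

```python
BOT_LOGIN = "SparkQA"  # hypothetical GitHub login for the Jenkins bot

def triage_stale_pr(last_commenter, committers):
    """Classify a stale PR by who commented on it last."""
    if last_commenter == BOT_LOGIN:
        # The bot already asked for an update and nobody replied.
        return "abandoned"
    if last_commenter in committers:
        # A committer spoke last; ask the contributor for an update.
        return "ping-contributor"
    # A contributor spoke last; the PR is hanging on committer feedback.
    return "hanging"

committers = {"pwendell", "mateiz"}  # illustrative committer list
print(triage_stale_pr("pwendell", committers))   # ping-contributor
print(triage_stale_pr("nchammas", committers))   # hanging
print(triage_stale_pr("SparkQA", committers))    # abandoned
```

The weekly job would then post a comment on each "ping-contributor" PR and fold the "hanging" and "abandoned" lists into the dev-list email.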
Re: Handling stale PRs
Last weekend, I started hacking on a Google App Engine app for helping with pull request review (screenshot: http://i.imgur.com/wwpZKYZ.png). Some of my basic goals (not all implemented yet):

- Users sign in using GitHub and can browse a list of pull requests, including links to associated JIRAs, Jenkins statuses, a quick preview of the last comment, etc.
- Pull requests are auto-classified based on which components they modify (by looking at the diff).
- From the app’s own internal database of PRs, we can build dashboards to find “abandoned” PRs, graph average time to first review, etc.
- Since we authenticate users with GitHub, we can enable administrative functions via this dashboard (e.g. “assign this PR to me”, “vote to close in the weekly auto-close commit”, etc.).

Right now, I’ve implemented GitHub OAuth support and code to update the issues database using the GitHub API. Because we have access to the full API, it’s pretty easy to do fancy things like parsing the reason for a Jenkins failure, etc. You could even imagine some fancy mashup tools to pull up JIRAs and pull requests side-by-side in iframes.

After I hack on this a bit more, I plan to release a public preview version; if we find this tool useful, I’ll clean it up and open-source the app so folks can contribute to it.

- Josh
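The auto-classification idea (inferring components from the paths a PR's diff touches) might look something like this sketch. The prefix-to-component mapping and the fallback name are invented for illustration; they are not the dashboard's actual rule set.

```python
# Hypothetical mapping from repository path prefixes to Spark components.
COMPONENT_PREFIXES = {
    "sql/": "SQL",
    "mllib/": "MLlib",
    "streaming/": "Streaming",
    "python/": "PySpark",
    "core/": "Core",
}

def classify_pr(changed_files):
    """Guess which components a PR touches from its changed file paths."""
    components = set()
    for path in changed_files:
        for prefix, name in COMPONENT_PREFIXES.items():
            if path.startswith(prefix):
                components.add(name)
    return sorted(components) or ["Other"]  # fallback when nothing matches

print(classify_pr(["sql/core/src/main/X.scala", "python/pyspark/sql.py"]))
# ['PySpark', 'SQL']
print(classify_pr(["docs/index.md"]))
# ['Other']
```

The list of changed files per PR is available from the GitHub API, so this check can run entirely from the app's synced PR database.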
Re: Handling stale PRs
OK, that sounds pretty cool. Josh, do you see this app as encompassing or supplanting the functionality I described as well?

Nick

On Tue, Aug 26, 2014 at 2:21 PM, Josh Rosen rosenvi...@gmail.com wrote:
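Josh's idea of auto-classifying pull requests by the components their diffs touch could look something like the sketch below. The prefix-to-component table is a guess for illustration; the real app's mapping (and Spark's actual directory layout coverage) may differ.

```python
# Hypothetical sketch: classify a PR by mapping changed file paths to components.
COMPONENT_BY_PREFIX = {
    "sql/": "SQL",
    "mllib/": "MLlib",
    "streaming/": "Streaming",
    "python/": "PySpark",
    "docs/": "Docs",
}

def classify(changed_files):
    """Return the set of component labels for a PR's changed file paths."""
    labels = set()
    for path in changed_files:
        for prefix, label in COMPONENT_BY_PREFIX.items():
            if path.startswith(prefix):
                labels.add(label)
    return labels or {"Core"}  # fall back when no known prefix matches

print(sorted(classify(["sql/core/src/X.scala", "docs/sql-guide.md"])))  # -> ['Docs', 'SQL']
```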
Re: Handling stale PRs
Sure; App Engine supports cron and sending emails. We can configure the app with Spark QA's credentials in order to allow it to post comments on issues, etc.

- Josh
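Wiring Josh's App Engine cron support to the weekly check could be as simple as a cron.yaml entry. The handler URL and schedule here are made up for illustration:

```yaml
cron:
- description: weekly stale PR sweep
  url: /tasks/triage-stale-prs    # hypothetical handler in the app
  schedule: every monday 09:00
```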
Re: Handling stale PRs
By the way, as a reference point, I just stumbled across the Discourse GitHub project, and their list of pull requests https://github.com/discourse/discourse/pulls looks pretty neat. ~2,200 closed PRs, 6 open. The least recently updated PR dates to 8 days ago. The project started ~1.5 years ago.

Dunno how many committers Discourse has, but it looks like they've managed their PRs well. I hope we can do as well in this regard as they have.

Nick

On Tue, Aug 26, 2014 at 2:40 PM, Josh Rosen rosenvi...@gmail.com wrote:
Re: Handling stale PRs
Nicholas Chammas wrote:

Dunno how many committers Discourse has, but it looks like they've managed their PRs well. I hope we can do as well in this regard as they have.

Discourse developers appear to eat their own dog food https://meta.discourse.org. Improved collaboration and a shared vision might be a reason for their success.

- Madhu
https://www.linkedin.com/in/msiddalingaiah

View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Handling-stale-PRs-tp8015p8061.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
Handling stale PRs
Check this out: https://github.com/apache/spark/pulls?q=is%3Aopen+is%3Apr+sort%3Aupdated-asc

We're hitting close to 300 open PRs. Those are the least recently updated ones.

I think having a low number of stale (i.e. not recently updated) PRs is a good thing to shoot for. It doesn't leave contributors hanging (which feels bad for contributors), and it reduces project clutter (which feels bad for maintainers/committers).

What is our approach to tackling this problem? I think communicating and enforcing a clear policy on how stale PRs are handled might be a good way to reduce the number of stale PRs we have without making contributors feel rejected. I don't know what such a policy would look like, but it should be enforceable and lightweight--i.e. it shouldn't feel like a hammer used to reject people's work, but rather a necessary tool to keep the project's contributions relevant and manageable.

Nick
Re: Handling stale PRs
Hey Nicholas,

In general we've been looking at these periodically (at least I have) and asking people to close out-of-date ones, but it's true that the list has gotten fairly large. We should probably have an expiry time of a few months and close them automatically. I agree that it's daunting to see so many open PRs.

Matei

On August 25, 2014 at 9:03:09 PM, Nicholas Chammas (nicholas.cham...@gmail.com) wrote: