Re: Improving PR workload management for Arrow maintainers
I investigated the CPython approach, and the PR labelling is part of the existing bedevere bot, which does a number of things (not all relevant to Arrow). Yesterday I created a standalone GitHub action[1] dedicated to this task, roughly based on my previous email. It will apply "awaiting-review" and "awaiting-changes" labels when appropriate.

I think it's probably ready to try out at this point (I'm sure there will be some hiccups). If any repo wants to volunteer to be a guinea pig, I will work with you to get the action configured and running.

I have it enabled on a dummy repository here[2], and this is what it looks like in action[3].

[1] https://github.com/westonpace/pr-needs-review/
[2] https://github.com/westonpace/pr-needs-review-dummy-2/blob/main/.github/workflows/label-pr.yml
[3] https://github.com/westonpace/pr-needs-review-dummy-2/pull/13

On Thu, Jul 1, 2021 at 11:36 AM Adam Lippai wrote:
>
> Not sure if it's applicable, but GitHub is improving:
> https://github.blog/changelog/2021-06-23-whats-new-with-github-issues/
>
> That spreadsheet-like issue tracking looks concise.
>
> Best regards,
> Adam Lippai
>
> On Wed, Jun 30, 2021, 10:28 Antoine Pitrou wrote:
>
> > On 30/06/2021 at 10:04, Wes McKinney wrote:
> > >
> > > I guess my concern with this is how to quickly separate out "PRs I am
> > > keeping an eye on". If there are 100 active PRs and only 20 of them
> > > are ones you've interacted with, how do you know which ones need your
> > > attention? GitHub does have the "reviewed-by" filter which could be
> > > good enough
> >
> > There's also the "involves" filter that can also select PRs you have
> > commented on without giving a formal review.
> >
> > However, those filters don't let you know which PRs are pending review
> > if you haven't already commented on them.
> >
> > Regards
> >
> > Antoine.
Re: Improving PR workload management for Arrow maintainers
Not sure if it's applicable, but GitHub is improving:
https://github.blog/changelog/2021-06-23-whats-new-with-github-issues/

That spreadsheet-like issue tracking looks concise.

Best regards,
Adam Lippai

On Wed, Jun 30, 2021, 10:28 Antoine Pitrou wrote:

> On 30/06/2021 at 10:04, Wes McKinney wrote:
> >
> > I guess my concern with this is how to quickly separate out "PRs I am
> > keeping an eye on". If there are 100 active PRs and only 20 of them
> > are ones you've interacted with, how do you know which ones need your
> > attention? GitHub does have the "reviewed-by" filter which could be
> > good enough
>
> There's also the "involves" filter that can also select PRs you have
> commented on without giving a formal review.
>
> However, those filters don't let you know which PRs are pending review
> if you haven't already commented on them.
>
> Regards
>
> Antoine.
Re: Improving PR workload management for Arrow maintainers
On 30/06/2021 at 10:04, Wes McKinney wrote:
> I guess my concern with this is how to quickly separate out "PRs I am
> keeping an eye on". If there are 100 active PRs and only 20 of them
> are ones you've interacted with, how do you know which ones need your
> attention? GitHub does have the "reviewed-by" filter which could be
> good enough

There's also the "involves" filter, which can select PRs you have commented on without giving a formal review.

However, those filters don't let you know which PRs are pending review if you haven't already commented on them.

Regards

Antoine.
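
For reference, the two filters mentioned above ("reviewed-by" and "involves") can be turned into search links with a few lines of Python. This is only a sketch: the helper name `pr_search_url` is made up for illustration, and the URL shape is copied from the apache/arrow search link that appears elsewhere in this thread.

```python
from urllib.parse import urlencode


def pr_search_url(repo: str, *qualifiers: str) -> str:
    """Build a GitHub pull request search URL (hypothetical helper).

    `repo` is "owner/name"; each qualifier is a search term such as
    "is:open" or "involves:@me".
    """
    query = " ".join(("is:pr",) + qualifiers)
    return f"https://github.com/{repo}/pulls?" + urlencode({"q": query})


# PRs where I left a formal review, and PRs I am involved in any way.
print(pr_search_url("apache/arrow", "is:open", "reviewed-by:@me"))
print(pr_search_url("apache/arrow", "is:open", "involves:@me"))
```

As noted above, neither query finds PRs that are waiting for review but that you have not touched yet, which is the gap the label-based proposals try to close.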
Re: Improving PR workload management for Arrow maintainers
On Tue, Jun 29, 2021 at 8:05 PM Weston Pace wrote:
>
> I apologize. I did plan on working on this but it's taken a back seat
> for a while. I would still recommend shying away from a standalone
> UI. You will end up making a lot of requests (and possibly running
> into GitHub throttles) if you want detailed PR information for all of
> the PRs. To work around those limitations the Spark example that I
> looked at kept a standalone database and polled GitHub on a regular
> basis. This works but then you have quite a bit of complexity (it's
> no longer a simple static web page you can just host somewhere, you'll
> need to pay for a backend server and also the cost of maintaining that
> server). Also, you may find yourself continuously playing catch-up to
> add features that exist in GitHub or face users migrating away from
> the custom tool.

On this I will say:

* The rate limit for GitHub API calls is 5000 per hour per user, so if you polled PRs once every 5 minutes, you could keep 200 or so PRs up to date that way (assuming ~2 GitHub API calls per PR), and more if we relied on a rotation of bot API tokens
* GitHub's REST API features are relatively slow-moving
* A small DigitalOcean server that would be adequate for this would cost less than $100/month

> The approach I was pursuing was a single GitHub action repository to
> add labels similar to those described by Andrew Lamb. You could make
> it quite complex but I think a simple state machine would be:
>
> New PR Created (not in draft) -> Add "Needs review" label
> PR moved into draft -> Remove "Needs review" and "changes requested" labels.
> PR Review added with state "Changes Needed" -> Remove "Needs review",
> add "changes requested", add comment explaining how to report changes
> have been made
> Comment made with "I have completed all requested changes" -> Remove
> "changes requested", add "needs review", re-request all reviewers
> Nightly cron job -> Any PR that has had the "needs review" label for X
> days gets "needs attention" label
> Nightly cron job -> Any PR that has had the "changes requested" label
> for Y days gets "stale" label, add comment explaining why that
> happened and encouraging the user to state if they want someone else
> to take over the PR.

I guess my concern with this is how to quickly separate out "PRs I am keeping an eye on". If there are 100 active PRs and only 20 of them are ones you've interacted with, how do you know which ones need your attention? GitHub does have the "reviewed-by" filter, which could be good enough:

https://github.com/apache/arrow/pulls?q=is%3Apr+is%3Aclosed+reviewed-by%3A%40me

One potential benefit of the web app approach would be for reviewers to be able to "watch" reviews that they want to show up in their "my reviews" page even if they have yet to actually comment or review.

> I investigated automatic adding/removing of labels based on passing /
> failing checks but the checks in Arrow are not stable enough I think
> and getting that information out of GitHub is rather tricky.
>
> I don't know that I'll have time to work on this at the moment but I
> think it'd be pretty straightforward to build such an action if anyone
> is interested. Also, it sounds like CPython has something similar. If
> it is simple enough we could just steal it.
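
The rate-limit estimate in the message above (5000 calls per hour, one poll every 5 minutes, roughly 2 calls per PR) can be checked with a little arithmetic. The function below is only an illustration of that calculation; its name and parameters are invented here, and the real cost per PR depends on which endpoints a poller would actually hit.

```python
def max_tracked_prs(calls_per_hour: int, poll_minutes: int, calls_per_pr: int) -> int:
    """How many PRs fit in the hourly API budget at a given polling interval."""
    polls_per_hour = 60 // poll_minutes
    calls_per_pr_per_hour = polls_per_hour * calls_per_pr
    return calls_per_hour // calls_per_pr_per_hour


# One token, polling every 5 minutes, ~2 calls per PR:
# 12 polls/hour * 2 calls = 24 calls per PR per hour, and 5000 // 24 = 208.
print(max_tracked_prs(5000, 5, 2))

# A rotation of three bot tokens triples the budget.
print(max_tracked_prs(3 * 5000, 5, 2))
```

So "200 or so PRs" is consistent with the stated assumptions, and a longer polling interval or additional tokens raises the ceiling proportionally.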
Re: Improving PR workload management for Arrow maintainers
I apologize. I did plan on working on this but it's taken a back seat for a while. I would still recommend shying away from a standalone UI. You will end up making a lot of requests (and possibly running into GitHub throttles) if you want detailed PR information for all of the PRs. To work around those limitations, the Spark example that I looked at kept a standalone database and polled GitHub on a regular basis. This works, but then you have quite a bit of complexity (it's no longer a simple static web page you can just host somewhere; you'll need to pay for a backend server and also the cost of maintaining that server). Also, you may find yourself continuously playing catch-up to add features that exist in GitHub, or face users migrating away from the custom tool.

The approach I was pursuing was a single GitHub action repository to add labels similar to those described by Andrew Lamb. You could make it quite complex but I think a simple state machine would be:

New PR Created (not in draft) -> Add "Needs review" label
PR moved into draft -> Remove "Needs review" and "changes requested" labels.
PR Review added with state "Changes Needed" -> Remove "Needs review", add "changes requested", add comment explaining how to report changes have been made
Comment made with "I have completed all requested changes" -> Remove "changes requested", add "needs review", re-request all reviewers
Nightly cron job -> Any PR that has had the "needs review" label for X days gets "needs attention" label
Nightly cron job -> Any PR that has had the "changes requested" label for Y days gets "stale" label, add comment explaining why that happened and encouraging the user to state if they want someone else to take over the PR.

I investigated automatic adding/removing of labels based on passing / failing checks, but I think the checks in Arrow are not stable enough, and getting that information out of GitHub is rather tricky.

I don't know that I'll have time to work on this at the moment but I think it'd be pretty straightforward to build such an action if anyone is interested. Also, it sounds like CPython has something similar. If it is simple enough we could just steal it.

On Tue, Jun 29, 2021 at 6:00 AM Antoine Pitrou wrote:
>
> On 29/06/2021 at 15:25, Wes McKinney wrote:
> > On Tue, Jun 29, 2021 at 3:10 PM Andrew Lamb wrote:
> >>
> >> The thing that would make me more efficient reviewing PRs is figuring out
> >> which one of the open reviews are ready for additional feedback.
> >
> > Yes, I think this would be the single most significant quality-of-life
> > improvement for reviewers.
>
> Agreed as well.
>
> The CPython project uses dedicated labels for that (some automatically
> set/unset) as well as a bot that pesters contributors to mention when
> their PR is ready for review again. It helps assert that the labelled
> PR status reflects their actual status accurately.
>
> See some examples here:
> https://github.com/python/cpython/pull/26941#issuecomment-870643346
> https://github.com/python/cpython/pull/26772#issuecomment-866020819
> https://github.com/python/cpython/pull/26677#pullrequestreview-682724234
>
> Regards
>
> Antoine.
>
> >> I think the idea of a webapp or something that shows active reviews would
> >> be helpful (though I get most of that from appropriate email filters).
> >>
> >> What about a system involving labels (for which there is already a basic
> >> GUI in github)? Something low tech like
> >>
> >> (Waiting for Review)
> >> (Addressing Feedback)
> >> (Approved, waiting for Merge)
> >>
> >> With maybe some automation prompting people to add the "Waiting on Review"
> >> label when they want feedback
> >
> > I think it would have to be a bot that automatically sets the labels.
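
The label state machine proposed in the message above could be prototyped roughly as follows. This is a sketch under stated assumptions, not the code of any existing action: the event names ("opened", "moved_to_draft", and so on) are invented for illustration and are not GitHub webhook names, the `PullRequest` class is a stand-in for real PR data, and the side effects from the proposal (posting comments, re-requesting reviewers) are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field

NEEDS_REVIEW = "needs review"
CHANGES_REQUESTED = "changes requested"
NEEDS_ATTENTION = "needs attention"
STALE = "stale"
DONE_PHRASE = "I have completed all requested changes"


@dataclass
class PullRequest:
    """Hypothetical stand-in for the PR state a bot would track."""
    draft: bool = False
    labels: set[str] = field(default_factory=set)
    days_with_label: int = 0  # days since the last status-label change


def on_event(pr: PullRequest, event: str, comment: str = "") -> PullRequest:
    """Apply one transition. Event names are illustrative only."""
    if event == "opened" and not pr.draft:
        pr.labels.add(NEEDS_REVIEW)
    elif event == "moved_to_draft":
        pr.draft = True
        pr.labels -= {NEEDS_REVIEW, CHANGES_REQUESTED}
    elif event == "review_changes_needed":
        pr.labels.discard(NEEDS_REVIEW)
        pr.labels.add(CHANGES_REQUESTED)
    elif event == "comment" and DONE_PHRASE in comment:
        pr.labels.discard(CHANGES_REQUESTED)
        pr.labels.add(NEEDS_REVIEW)
    else:
        return pr  # nothing to do for this event
    pr.days_with_label = 0
    return pr


def nightly(pr: PullRequest, x_days: int, y_days: int) -> PullRequest:
    """Nightly cron step: escalate PRs that have waited too long."""
    pr.days_with_label += 1
    if NEEDS_REVIEW in pr.labels and pr.days_with_label >= x_days:
        pr.labels.add(NEEDS_ATTENTION)
    if CHANGES_REQUESTED in pr.labels and pr.days_with_label >= y_days:
        pr.labels.add(STALE)
    return pr


pr = on_event(PullRequest(), "opened")
on_event(pr, "review_changes_needed")
on_event(pr, "comment", "I have completed all requested changes")
print(sorted(pr.labels))
```

The proposal does not say when "needs attention" or "stale" should be cleared, so this sketch never removes them; a real implementation would have to decide that.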
Re: Improving PR workload management for Arrow maintainers
On 29/06/2021 at 15:25, Wes McKinney wrote:
> On Tue, Jun 29, 2021 at 3:10 PM Andrew Lamb wrote:
>>
>> The thing that would make me more efficient reviewing PRs is figuring out
>> which one of the open reviews are ready for additional feedback.
>
> Yes, I think this would be the single most significant quality-of-life
> improvement for reviewers.

Agreed as well.

The CPython project uses dedicated labels for that (some automatically set/unset) as well as a bot that pesters contributors to mention when their PR is ready for review again. It helps ensure that the labelled PR status accurately reflects the actual status.

See some examples here:
https://github.com/python/cpython/pull/26941#issuecomment-870643346
https://github.com/python/cpython/pull/26772#issuecomment-866020819
https://github.com/python/cpython/pull/26677#pullrequestreview-682724234

Regards

Antoine.

>> I think the idea of a webapp or something that shows active reviews would
>> be helpful (though I get most of that from appropriate email filters).
>>
>> What about a system involving labels (for which there is already a basic
>> GUI in github)? Something low tech like
>>
>> (Waiting for Review)
>> (Addressing Feedback)
>> (Approved, waiting for Merge)
>>
>> With maybe some automation prompting people to add the "Waiting on Review"
>> label when they want feedback
>
> I think it would have to be a bot that automatically sets the labels.
> If it requires contributors to take some action outside of pushing new
> work (new commits or a rebased version of the patch) to the PR and
> leaving responses to comments on the PR, the system is likely to fail
> some non-trivial percentage of the time.
>
> Given the quality of off-the-shelf web app components nowadays (e.g.
> https://material-ui.com), throwing together a read-only PR dashboard
> that shows what has changed since you last interacted with them (along
> with some other helpful things, like whether the build is passing) is
> "probably" not a super heavy lift.
Re: Improving PR workload management for Arrow maintainers
I review a decent number of PRs for Apache Beam, and I've built some of my own tooling to help keep track of open PRs. I wrote a script that pulls metadata about all relevant PRs and uses some heuristics to categorize them into:

- incoming review
- outgoing review
- "CC'd": where I've been mentioned but am not the reviewer or author

In the first two cases I try to highlight the ones that need my attention, simply by detecting whether or not I'm the person who took the most recent action. This works reasonably well but gets tripped up on several edge cases:

1) The author might push multiple commits before they're actually ready for more feedback.
2) A PR might need feedback from multiple reviewers (e.g. people with domain knowledge of certain areas).

I've been planning to make my script stateful so that I can mark a PR as "not my turn" (i.e. unhighlight this until there is more activity), and maybe "never my turn" (i.e. I've finished reviewing this, it's waiting on someone else), to handle these cases.

The idea of an "Addressing Feedback" -> "Waiting on Review" label that is automatically transitioned when there is activity would run into these same edge cases. If a reviewer had the ability to bump the label back to "Addressing Feedback", that would at least address #1.

I think Wes's proposal (a read-only web UI) would likely also run into these edge cases, since it stores no state of its own to deconflict in those situations.

Brian

On Tue, Jun 29, 2021 at 6:26 AM Wes McKinney wrote:

> On Tue, Jun 29, 2021 at 3:10 PM Andrew Lamb wrote:
> >
> > The thing that would make me more efficient reviewing PRs is figuring out
> > which one of the open reviews are ready for additional feedback.
>
> Yes, I think this would be the single most significant quality-of-life
> improvement for reviewers.
>
> > I think the idea of a webapp or something that shows active reviews would
> > be helpful (though I get most of that from appropriate email filters).
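
The "whose turn is it" heuristic described in Brian's message can be sketched as below. This is not his script: the `PRSummary` fields are invented for illustration, and the mapping of "incoming" to PRs where I am a reviewer and "outgoing" to PRs I authored is an assumption read off the wording of the third category.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PRSummary:
    """Hypothetical per-PR metadata a script might pull from GitHub."""
    number: int
    author: str
    reviewers: set[str] = field(default_factory=set)
    mentioned: set[str] = field(default_factory=set)
    last_actor: str = ""  # whoever commented, reviewed, or pushed most recently


def categorize(pr: PRSummary, me: str) -> str | None:
    """Sort a PR into one of the three buckets, or None if unrelated to me."""
    if pr.author == me:
        return "outgoing review"
    if me in pr.reviewers:
        return "incoming review"
    if me in pr.mentioned:
        return "cc'd"
    return None


def needs_my_attention(pr: PRSummary, me: str) -> bool:
    """Highlight incoming/outgoing reviews where someone else acted last."""
    in_review = categorize(pr, me) in {"incoming review", "outgoing review"}
    return in_review and pr.last_actor != me


pr = PRSummary(number=1, author="contributor", reviewers={"me"}, last_actor="contributor")
print(categorize(pr, "me"), needs_my_attention(pr, "me"))
```

Both edge cases from the message show up directly: an author who pushes several commits becomes `last_actor` before they are ready, and with several reviewers the PR stops being highlighted for everyone as soon as any one of them responds. A stored "not my turn" flag would have to override `needs_my_attention` to handle them.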
Re: Improving PR workload management for Arrow maintainers
I just had a quick chat over the ASF's Slack with Daniel Gruno from the infra team, and they are rolling out the "triage role" [1] for non-committers, which AFAIK offers useful tools in this context:

* add/remove labels
* assign reviewers
* mark duplicates
* close, reopen, and assign issues and PRs

This doesn't rule out the other ideas; I just thought it could be useful information for this topic, as it may cover some of the ground.

Best,
Jorge

[1] https://docs.github.com/en/organizations/managing-access-to-your-organizations-repositories/repository-permission-levels-for-an-organization

On Tue, Jun 29, 2021 at 3:10 PM Andrew Lamb wrote:

> The thing that would make me more efficient reviewing PRs is figuring out
> which one of the open reviews are ready for additional feedback.
>
> I think the idea of a webapp or something that shows active reviews would
> be helpful (though I get most of that from appropriate email filters).
>
> What about a system involving labels (for which there is already a basic
> GUI in github)? Something low tech like
>
> (Waiting for Review)
> (Addressing Feedback)
> (Approved, waiting for Merge)
>
> With maybe some automation prompting people to add the "Waiting on Review"
> label when they want feedback
>
> Andrew
>
> On Tue, Jun 29, 2021 at 4:28 AM Wes McKinney wrote:
>
> > hi folks,
> >
> > I've noted that the volume of PRs for Arrow has been steadily
> > increasing (and will likely continue to increase), and while I've
> > personally had less time for development / maintenance / code reviews
> > over the last year, I would like to have a discussion about what we
> > could do to improve our tooling for maintainers to optimize the
> > efficiency of time spent tending to the PR queue. In my own
> > experience, I have felt that I have wasted a lot of time digging
> > around the queue looking for PRs that are awaiting feedback or need to
> > be merged.
Re: Improving PR workload management for Arrow maintainers
On Tue, Jun 29, 2021 at 3:10 PM Andrew Lamb wrote:
>
> The thing that would make me more efficient reviewing PRs is figuring out
> which one of the open reviews are ready for additional feedback.

Yes, I think this would be the single most significant quality-of-life improvement for reviewers.

> I think the idea of a webapp or something that shows active reviews would
> be helpful (though I get most of that from appropriate email filters).
>
> What about a system involving labels (for which there is already a basic
> GUI in github)? Something low tech like
>
> (Waiting for Review)
> (Addressing Feedback)
> (Approved, waiting for Merge)
>
> With maybe some automation prompting people to add the "Waiting on Review"
> label when they want feedback

I think it would have to be a bot that automatically sets the labels. If it requires contributors to take some action outside of pushing new work (new commits or a rebased version of the patch) to the PR and leaving responses to comments on the PR, the system is likely to fail some non-trivial percentage of the time.

Given the quality of off-the-shelf web app components nowadays (e.g. https://material-ui.com), throwing together a read-only dashboard of PRs that shows what has changed since you last interacted with them (along with some other helpful things, like whether the build is passing) is "probably" not a super heavy lift. I haven't done any frontend development in years, so while the backend part (writing Python code to wrangle data from GitHub's REST API and put it in a SQLite database) wouldn't take very long, I would need some help with the front end portion and with setting it up for deployment on DigitalOcean or somewhere.
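
To make the backend idea in Wes's message a little more concrete, here is a rough sketch of how a SQLite table of PR activity could answer the "your reviews" question: which PRs have I touched, and has anything happened since? The schema, the table name `pr_events`, and the function names are all assumptions for illustration; nothing here comes from an existing tool, and the step that would fill the table from GitHub's REST API is omitted.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with one row per PR comment/review/commit."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pr_events (pr_number INTEGER, actor TEXT, created_at TEXT)"
    )
    return conn


def your_reviews(conn: sqlite3.Connection, user: str) -> list:
    """Return (pr_number, last_activity, updated_since_my_last_action) rows.

    Only PRs the user has interacted with are listed, freshest first.
    Timestamps are ISO 8601 strings, so string comparison orders them.
    """
    rows = conn.execute(
        """
        SELECT pr_number,
               MAX(created_at) AS last_activity,
               MAX(CASE WHEN actor = :user THEN created_at END) AS my_last
        FROM pr_events
        GROUP BY pr_number
        HAVING MAX(CASE WHEN actor = :user THEN created_at END) IS NOT NULL
        ORDER BY last_activity DESC
        """,
        {"user": user},
    ).fetchall()
    return [(number, last, last > mine) for number, last, mine in rows]


conn = make_db()
conn.executemany(
    "INSERT INTO pr_events VALUES (?, ?, ?)",
    [
        (1, "alice", "2021-06-28T10:00:00Z"),
        (1, "bob", "2021-06-29T09:00:00Z"),
        (1, "alice", "2021-06-29T12:00:00Z"),  # activity after bob's review
        (2, "bob", "2021-06-29T08:00:00Z"),
        (3, "carol", "2021-06-29T07:00:00Z"),  # bob never touched this PR
    ],
)
print(your_reviews(conn, "bob"))
```

A table like this stores only what GitHub already knows, so it shares the limitation raised elsewhere in the thread: marking a PR as "watched" or "not my turn" would need an extra table of per-user state.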
Re: Improving PR workload management for Arrow maintainers
The thing that would make me more efficient reviewing PRs is figuring out which of the open reviews are ready for additional feedback.

I think the idea of a webapp or something that shows active reviews would be helpful (though I get most of that from appropriate email filters).

What about a system involving labels (for which there is already a basic GUI in GitHub)? Something low tech like:

(Waiting for Review)
(Addressing Feedback)
(Approved, waiting for Merge)

With maybe some automation prompting people to add the "Waiting for Review" label when they want feedback.

Andrew
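The label scheme suggested above can be pictured as a small state machine. The following is only a rough sketch under stated assumptions: the event names ("opened", "author_pushed", "changes_requested", "approved") and the helper functions are hypothetical and are not taken from any existing bot or from the GitHub API. A real implementation would have to map webhook payloads onto events like these.

```python
from enum import Enum


class Label(Enum):
    WAITING_FOR_REVIEW = "Waiting for Review"
    ADDRESSING_FEEDBACK = "Addressing Feedback"
    APPROVED = "Approved, waiting for Merge"


# Hypothetical event names (an assumption for illustration only).
TRANSITIONS = {
    "opened": Label.WAITING_FOR_REVIEW,
    "author_pushed": Label.WAITING_FOR_REVIEW,
    "changes_requested": Label.ADDRESSING_FEEDBACK,
    "approved": Label.APPROVED,
}


def next_label(current, event):
    """Return the label a PR should carry after `event`.

    Events that are not in TRANSITIONS leave the current label unchanged.
    """
    return TRANSITIONS.get(event, current)


def replay(events):
    """Apply a sequence of events to a new PR and return its final label."""
    label = None
    for event in events:
        label = next_label(label, event)
    return label
```

In this sketch a new push from the author always moves the PR back to "Waiting for Review", even after an approval. That is a design choice made here for simplicity, and a project might well prefer to keep the approved label in that case.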
Improving PR workload management for Arrow maintainers
hi folks,

I've noted that the volume of PRs for Arrow has been steadily increasing (and will likely continue to increase), and while I've personally had less time for development / maintenance / code reviews over the last year, I would like to have a discussion about what we could do to improve our tooling for maintainers to optimize the efficiency of time spent tending to the PR queue. In my own experience, I have felt that I have wasted a lot of time digging around the queue looking for PRs that are awaiting feedback or need to be merged.

I note first of all that around 70 out of 173 open PRs have been updated in the last 7 days, so while there is some PR staleness, to have nearly half of the PRs active is pretty good. That said, ~70 active PRs is a lot of PRs to tend to.

I scraped the project's code review comment history, and here are the individuals who have left the most comments on PRs since genesis:

    pitrou               6802
    wesm                 5023
    emkornfield          3032
    bkietz               2834
    kou                  1489
    nealrichardson       1439
    fsaintjacques        1356
    kszucs               1250
    alamb                1133
    jorisvandenbossche   1094
    liyafan82             831
    lidavidm              816
    westonpace            794
    xhochy                770
    nevi-me               643
    BryanCutler           639
    jorgecarleitao        635
    cpcloud               551
    sunchao               536
    ianmcook              499

Since we're probably stuck using GitHub to receive code contributions (as opposed to systems, Gerrit being one I'm familiar with, that provide more structure for reviewers to track the patches they "own" as well as the outgoing/incoming state of reviews), I am wondering what kinds of tools we could create to make it easier for maintainers to keep track of PRs they are shepherding through the contribution process. Ideally this wouldn't involve maintainers having to engage in some explicit action like assigning themselves as a PR reviewer.

Here's one idea: a web application that displays "your reviews", a table of PRs that you have interacted with in any way (commented, left code review, assigned as reviewer, someone mentioned you, etc.) sorted either by last commit or last comment to assess "freshness". So if you comment on a PR or leave a code review, it will automatically show up in "your reviews". It could also indicate whether there has been activity on the PR since the last time you interacted with it.

Having now used the GitHub API to pull comments from PRs for the above analysis, there is certainly enough information available to help create this kind of tool. I'd be willing to contribute to building the backend of such a web application.

This is just one idea, but I am curious to hear from others who are spending a lot of time doing code review / PR merging to see what might help them use their time more effectively.

Thanks,
Wes
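To make the "your reviews" idea above a little more concrete, here is a minimal sketch of the core filtering and sorting step. It assumes the PR data has already been fetched and reduced to simple in-memory records. The `PullRequest` class, its fields, and the `your_reviews` function are hypothetical names invented for this illustration, not part of the GitHub API or of any existing tool.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PullRequest:
    number: int
    title: str
    # Time of the latest commit or comment by anyone (the "freshness" key).
    last_activity: datetime
    # Maps a user to the time of that user's most recent interaction
    # (comment, code review, review assignment, mention, ...).
    interactions: dict = field(default_factory=dict)


def your_reviews(prs, user):
    """Return (pr, has_new_activity) rows for PRs `user` has touched.

    Rows are sorted freshest first. `has_new_activity` is True when
    something happened on the PR after the user's last interaction.
    """
    rows = []
    for pr in prs:
        last_seen = pr.interactions.get(user)
        if last_seen is None:
            continue  # the user never interacted with this PR
        rows.append((pr, pr.last_activity > last_seen))
    rows.sort(key=lambda row: row[0].last_activity, reverse=True)
    return rows
```

Because the table is derived purely from interactions that already happen (comments, reviews, mentions), no explicit action such as self-assignment is needed for a PR to show up in it. How the `interactions` mapping gets populated is left open here.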