Hi Ryan,

I agree that good communication is key to determining what can be worked on.

In terms of metrics , we can use the gh cli to list prs and we can export 
issues from Jira. A view across them, you could join on the Flink issue (at the 
start of the pr comment and the flink issue itself – you could then see which 
prs have an assigned Jira would be expected to be reviewed. There is no 
explicit reviewer field in the Jira issue; I am not sure if we can easily get 
this info without having a custom field (which others have tried).

In terms of what prs a committer could / should review – I would think that 
component ownership helps scope the subset of prs to review / merge.

                Kind regards, David.


From: Ryan Skraba <ryan.skr...@aiven.io.INVALID>
Date: Wednesday, 4 October 2023 at 15:09
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: [EXTERNAL] Re: FW: RE: Close orphaned/stale PRs
Hey, this has been an interesting discussion -- this is something that
has been on my mind as an open source contributor and committer (I'm
not a Flink committer).

A large number of open PRs doesn't _necessarily_ mean a project is
unhealthy or has technical debt. If it's fun and easy to get your
contribution accepted and committed, even for a small fix, you're more
likely to raise another PR, and another.  I wouldn't be surprised if
there's a natural equilibrium where adding capacity to smoothly review
and manage more PRs cause more PRs to be submitted.  Everyone wins!

I don't think there's a measure for the "average PR lifetime", or
"time to first comment", but those would be more interesting things to
know and those are the worrisome ones.

As a contributor, I'm pretty willing to wait as long as necessary (and
rebase and fix merge conflicts) if there's good communication in
place. I'm pretty patient, especially if I knew that the PR would be
looked at and merged for a specific fix version (for example).  I'd
expect simple and obvious fixes with limited scope to take less time
than a more complex, far-reaching change.  I'd probably appreciate
that the boring-cyborg welcomes me on my first PR, but I'd be pretty
irritated if any PR were closed without any human interaction.

As a reviewer or committer, it's just overwhelming to see the big
GitHub list, and sometimes it feels random just "picking one near the
top" to look at.  In projects where I have the committer role, I
sometimes feel more badly about work I'm *not* doing than the work I'm
getting done! This isn't sustainable either.  A lot of people on the
project are volunteering after hours, and grooming, reviewing and
commenting PRs shouldn't be a thankless, unending job to feel bad
about.

As a contributor, one "magic" solution that I'd love to see is a
better UI that could show (for example) tentative "review dates", like
the number at a butcher shop, and proposed reviewers.

If I was committing to reviewing a PR every day, it would be great if
I could know which ones were the best "next" candidates to review: the
one waiting longest, or a new, critical fix in my domain.  As it
stands, there's next to no chance that the PRs in the middle of the
list are going to get any attention, but closing them stand to lose
valuable work or (worse) turn off a potential contributor forever.

Taking a look at some open PRs that I authored or interacted with: I
found one that should have been closed, one that was waiting for MY
attention for a merge-squash-rebase (oops), another where I made some
requested changes and it's back in review limbo.  Unfortunately, I
don't think any of these would have been brought to my attention by a
nag-bot. I don't think I'm alone; automated emails get far less
attention  with sometime not giving automated emails much attention.

OK, one more thing to think about: some underrepresented groups in
tech can find it difficult to demand attention, through constant
pinging and commenting and reminding...  So finding ways to make sure
that non-squeaky wheels also get some love is a really fair goal.

There's some pretty good ideas in this conversation, and I'm really
glad to hear it being brought up!  I'd love to hear any other
brainstorming for ideas, and get the virtual circle that David
mentioned!

All my best, Ryan







On Wed, Oct 4, 2023 at 12:03 PM David Radley <david_rad...@uk.ibm.com> wrote:
>
> Hi,
> To add I agree with Martijn’s insights; I think we are saying similar things. 
> To progress agreed upon work, and not blanket close all stale prs,
>       Kind regards, David.
>
> From: David Radley <david_rad...@uk.ibm.com>
> Date: Wednesday, 4 October 2023 at 10:59
> To: dev@flink.apache.org <dev@flink.apache.org>
> Subject: [EXTERNAL] RE: Close orphaned/stale PRs
> Hi ,
> I agree Venkata this issue is bigger than closing out stale prs.
>
> We can see that issues are being raised at a rate way above the resolution 
> time. 
> https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12315522&periodName=daily&daysprevious=90&cumulative=true&versionLabels=major&selectedProjectId=12315522&reportKey=com.atlassian.jira.jira-core-reports-plugin%3Acreatedvsresolved-report&atl_token=A5KQ-2QAV-T4JA-FDED_19ff17decb93662bafa09e4b3ffb3a385c202015_lin&Next=Next
> Gaining over 500 issues to the backlog every 3 months.
>
> We have over 1000 open prs. This is a lot of technical debt. I came across a 
> 6 month old pr recently that had not been merged. A second Jira issue was 
> raised  for the same problem and a second pr fixed the issue (identically). 
> The first pr was still on the backlog until we noticed it.
>
> I am looking to contribute to the community to be able to identify issues I 
> can work on and then be reasonably certain they will be reviewed and merged 
> so I can build on contributions. I have worked as a maintainer and committer 
> in other communities and managed to spend some of the week addressing 
> incoming work; I am happy to do this in some capacity with the support of 
> committer(s) for Flink.  It seems to me it is virtuous circle to enable more 
> contributions, to get more committers , builds those committers that can help 
> merge and review the backlog.
>
> Some thoughts ( I am new to this – so apologise if I have misunderstood 
> something or am unaware of other existing mechanisms) :
>
>   1.  If there is an issue that a committer has assigned to a contributor as 
> per the process<https://flink.apache.org/how-to-contribute/contribute-code/  
> > , and there is a pr then it should be with the committer to review the pr, 
> or return it to the work queue. I do not know how many prs are like this. It 
> seems to me that if a committer assigns an issue, they are indicating they 
> will review, unassign themselves or merge. I do not think these prs should be 
> closed as stale.
>   2.  Could we have a Git action to notify committers (tagged in the pr?) if 
> a pr (that has an assigned Jira)  has not been reviewed in a certain period 
> (7 days?) then subsequent nags if there has been no response . In this way 
> busy committers can see that a pr needs looking at.
>   3.  Other prs have been raised without a committer saying that they will 
> fix it.  In this case there is likely to be value, but the merging and review 
> work has not been taken on by anyone. I notice spelling mistake prs that have 
> not been merged (there are 8 with this query 
> https://github.com/apache/flink/pulls?q=is%3Apr+is%3Aopen+spelling  ) , these 
> are typical newbee prs as they are simple but useful improvements.; it would 
> be great if these simpler ones could just be merged – maybe they should be 
> marked as a [hotfix] to indicate they should be merged.  If simpler prs are 
> not merged – it is very difficult for new contributors to gain eminence to 
> get towards being a committer.
>   4.  There are also issues that have been raised by people who do not want 
> to fix them. It seems to me that we need a “triaged” state to indicate the 
> issue looks valid and reasonable, so could be picked up by someone – at which 
> time they would need to agree with a committer to get the associated pr 
> reviewed and merged. This triaged state would be a pool of issues that new 
> contributors to choose from
>
>
>
> I am happy to help to improve – once we have consensus,
>
>
>
> Kind regards, David.
>
>
>
>
> From: Venkatakrishnan Sowrirajan <vsowr...@asu.edu>
> Date: Wednesday, 4 October 2023 at 00:36
> To: dev@flink.apache.org <dev@flink.apache.org>
> Subject: [EXTERNAL] Re: Close orphaned/stale PRs
> Gentle ping to surface this up for more discussions.
>
> Regards
> Venkata krishnan
>
>
> On Tue, Sep 26, 2023 at 4:59 PM Venkatakrishnan Sowrirajan <vsowr...@asu.edu>
> wrote:
>
> > Hi Martijn,
> >
> > Agree with your point that closing a PR without any review feedback even
> > after 'X' days is discouraging to a new contributor. I understand that this
> > is a capacity problem. Capacity problem cannot be solved by this proposal
> > and it is beyond the scope of this proposal.
> >
> > Regarding your earlier question,
> > > What's the added value of
> > closing these PRs
> >
> >    - Having lots of inactive PRs lingering around shows the project is
> >    less active. I am not saying this is the only way to determine how 
> > active a
> >    project is, but this is one of the key factors.
> >    - A large number of PRs open can be discouraging for (new)
> >    contributors but on the other hand I agree closing an inactive PR without
> >    any reviews can also drive contributors away.
> >
> > Having said all of that, I agree closing PRs that don't have any reviews
> > to start with should be avoided from the final proposal.
> >
> > > I'm +1 for (automatically) closing up PRs after X days which:
> > a) Don't have a CI that has passed
> > b) Don't follow the code contribution guide (like commit naming
> > conventions)
> > c) Have changes requested but aren't being followed-up by the contributor
> >
> > In general, I'm largely +1 on your above proposal except for the
> > implementation feasibility.
> >
> > Also, I have picked a few other popular projects that have implemented the
> > Github's actions stale rule to see if we can borrow some ideas. Below
> > projects are listed in the order of the most invasive (for lack of a better
> > word) to the least invasive actions taken wrt PR without any updates for a
> > long period of time.
> >
> > 1. Trino
> >
> > TL;DR - No updates in the PR for the last 21 days, tag other maintainers
> > for review. If there are no updates for 21 days after that, close the PR
> > with this message - "*Closing this pull request, as it has been stale for
> > six weeks. Feel free to re-open at any time.*"
> > Trino's stale PR Github action rule (stale.yaml)
> > <https://github.com/trinodb/trino/blob/master/.github/workflows/stale.yml   
> > >
> >
> >
> > 2. Apache Spark
> >
> > TL;DR - No updates in the PR in the last 100 days, closing the PR with
> > this message - "*We're closing this PR because it hasn't been updated in
> > a while. This isn't a judgement on the merit of the PR in any way. It's
> > just a way of keeping the PR queue manageable. If you'd like to revive this
> > PR, please reopen it and ask a committer to remove the Stale tag!*"
> > Spark's discussion in their mailing list
> > <https://lists.apache.org/thread/yg3ggtvpt2dbjpnb2q0yblq30sc1g2yx   > on
> > closing stale PRs. Spark's stale PR github action rule (stale.yaml
> > <https://github.com/apache/spark/blob/master/.github/workflows/stale.yml   >
> > ).
> >
> > 3. Python
> >
> > TL;DR - No updates in the PR for the last 30 days, then tag the PR as
> > stale. Note: Python project *doesn't* close the stale PRs.
> >
> > Python discussion
> > <https://discuss.python.org/t/decision-needed-should-we-close-stale-prs-and-how-many-lapsed-days-are-prs-considered-stale/4637
> >    >
> > in the mailing list to close stale PRs. Python's stale PR github action
> > rule (stale.yaml
> > <https://github.com/python/cpython/blob/main/.github/workflows/stale.yml   
> > >)
> >
> > Few others Apache Beam
> > <https://github.com/apache/beam/blob/master/.github/workflows/stale.yml   > 
> > (closes
> > inactive PRs after 60+ days), Apache Airflow
> > <https://github.com/apache/airflow/blob/main/.github/workflows/stale.yml   
> > > (closes
> > inactive PRs after 50 days)
> >
> > Let me know what you think. Looking forward to hearing from others in the
> > community and their experiences.
> >
> > [1] Github Action - Close Stale Issues -
> > https://github.com/marketplace/actions/close-stale-issues
> >
> > Regards
> > Venkata krishnan
> >
> >
> > On Thu, Sep 21, 2023 at 6:03 AM Martijn Visser <martijnvis...@apache.org>
> > wrote:
> >
> >> Hi all,
> >>
> >> I really believe that the problem of the number of open PRs is just
> >> that there aren't enough reviewers/resources available to review them.
> >>
> >> > Stale PRs can clutter the repository, and closing them helps keep it
> >> organized and ensures that only relevant and up-to-date PRs are present.
> >>
> >> Sure, but what's the indicator that the PR is stale? The fact that
> >> there has been no reviewer yet to review it, doesn't mean that the PR
> >> is stale. For me, a stale PR is a PR that has been reviewed, changes
> >> have been requested and the contributor isn't participating in the
> >> discussion anymore. But that's a different story compared to closing
> >> PRs where there has been no review done at all.
> >>
> >> > It mainly helps the project maintainers/reviewers to focus on only the
> >> actively updated trimmed list of PRs that are ready for review.
> >>
> >> I disagree that closing PRs helps with this. If you want to help
> >> maintainers/reviewers, we should have a situation where it's obvious
> >> that a PR is really ready (meaning, CI has passed, PR contents/commit
> >> message etc are following the code contribution guidelines).
> >>
> >> > It helps Flink users who are waiting on a PR that enhances an existing
> >> feature or fixes an issue a clear indication on whether the PR will be
> >> continually worked on and eventually get a closure or not and therefore
> >> will be closed.
> >>
> >> Having other PRs being closed doesn't increase the guarantee that
> >> other PRs will be reviewed. It's still a capacity problem.
> >>
> >> > It would be demotivating for any contributor when there is no feedback
> >> for a PR within a sufficient period of time anyway.
> >>
> >> Definitely. But I think it would be even worse if someone makes a
> >> contribution, there is no response but after X days they get a message
> >> that their PR was closed automatically.
> >>
> >> I'm +1 for (automatically) closing up PRs after X days which:
> >> a) Don't have a CI that has passed
> >> b) Don't follow the code contribution guide (like commit naming
> >> conventions)
> >> c) Have changes requested but aren't being followed-up by the contributor
> >>
> >> I'm -1 for automatically closing PRs where no maintainers have taken a
> >> review for the reasons I've listed above.
> >>
> >> Best regards,
> >>
> >> Martijn
> >>
> >> On Wed, Sep 20, 2023 at 7:41 AM Venkatakrishnan Sowrirajan
> >> <vsowr...@asu.edu> wrote:
> >> >
> >> > Thanks for your response, Martijn.
> >> >
> >> > > What's the added value of
> >> > closing these PRs
> >> >
> >> > It mainly helps the project maintainers/reviewers to focus on only the
> >> > actively updated trimmed list of PRs that are ready for review.
> >> >
> >> > It helps Flink users who are waiting on a PR that enhances an existing
> >> > feature or fixes an issue a clear indication on whether the PR will be
> >> > continually worked on and eventually get a closure or not and therefore
> >> > will be closed.
> >> >
> >> > Btw, I am open to other suggestions or enhancements on top of the
> >> proposal
> >> > as well.
> >> >
> >> > > it would
> >> > just close PRs where maintainers haven't been able to perform a
> >> > review, but getting a PR closed without any feedback is also
> >> > demotivating for a (potential new) contributor
> >> >
> >> > It would be demotivating for any contributor when there is no feedback
> >> for
> >> > a PR within a sufficient period of time anyway. I don't see closing the
> >> PR
> >> > which is inactive after a sufficient period of time (say 60 to 90 days)
> >> > would be any more discouraging than not getting any feedback. The
> >> problem
> >> > of not getting feedback due to not enough maintainer's bandwidth has to
> >> be
> >> > solved through other mechanisms.
> >> >
> >> > > I think the important
> >> > thing is that we get into a cycle where maintainers can see which PRs
> >> > are ready for review, and also a way to divide the bulk of the work.
> >> >
> >> > Yes, exactly my point as well. It helps the maintainers to see a trimmed
> >> > list which is ready to be reviewed.
> >> >
> >> > +1 for the other automation to nudge/help the contributor to fix the PR
> >> > that follows the contribution guide, CI checks passed etc.
> >> >
> >> > > IIRC we can't really fix that until we can
> >> > finally move to dedicated Github Action Runners instead of the current
> >> > setup with Azure, but that's primarily blocked by ASF Infra.
> >> >
> >> > Curious, if you can share the JIRA or prior discussion on this topic. I
> >> > would like to learn more about why Github Actions cannot be used for
> >> Apache
> >> > Flink.
> >> >
> >> > Regards
> >> > Venkata krishnan
> >> >
> >> >
> >> > On Tue, Sep 19, 2023 at 2:00 PM Martijn Visser <
> >> martijnvis...@apache.org>
> >> > wrote:
> >> >
> >> > > Hi Venkata,
> >> > >
> >> > > Thanks for opening the discussion, I've been thinking about it quite a
> >> > > bit but I'm not sure what's the right approach.
> >> > >
> >> > > From your proposal, the question would be "What's the added value of
> >> > > closing these PRs"? I don't see an immediate value of that: it would
> >> > > just close PRs where maintainers haven't been able to perform a
> >> > > review, but getting a PR closed without any feedback is also
> >> > > demotivating for a (potential new) contributor. I think the important
> >> > > thing is that we get into a cycle where maintainers can see which PRs
> >> > > are ready for review, and also a way to divide the bulk of the work.
> >> > > Because doing proper reviews requires time, and these resources are
> >> > > scarce.
> >> > >
> >> > > I do think that we can make lives a bit easier with some automation:
> >> > > * There are a lot of PRs which don't follow the contribution guide (No
> >> > > Jira ticket, no correct commit message etc). For the externalized
> >> > > connector repositories, we've been trying Boring Cyborg to provide
> >> > > information back to contributors if their PRs are as expected. If the
> >> > > PR doesn't follow the contribution guide, I'm included to give such a
> >> > > PR less attention review. That's primarily because there are other PRs
> >> > > out there that do follow these guides.
> >> > > * There are even more PRs where the CI has failed: in those cases, a
> >> > > review also makes less sense, given that the PR can't be merged as is.
> >> > > I do see that contributors sometimes don't know where to look for the
> >> > > status of the CI, but IIRC we can't really fix that until we can
> >> > > finally move to dedicated Github Action Runners instead of the current
> >> > > setup with Azure, but that's primarily blocked by ASF Infra.
> >> > >
> >> > > I'm curious what others in the community think.
> >> > >
> >> > > Best regards,
> >> > >
> >> > > Martijn
> >> > >
> >> > > On Tue, Sep 19, 2023 at 10:33 PM Venkatakrishnan Sowrirajan
> >> > > <vsowr...@asu.edu> wrote:
> >> > > >
> >> > > > Hi Flink devs,
> >> > > >
> >> > > > There are currently over 1,000 open pull requests
> >> > > > <
> >> > >
> >> https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc  
> >> <https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc  >
> >> > > >
> >> > > > (PRs) in the Apache Flink repository, with only 162 having been
> >> updated
> >> > > in
> >> > > > the last two months
> >> > > > <
> >> > >
> >> https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc+updated
> >>   
> >> :>2023-07-19<https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc+updated
> >>   :>2023-07-19>
> >> > > >.
> >> > > > This means that more than 85% of the PRs are stale and have not been
> >> > > > touched.
> >> > > >
> >> > > > I suggest setting up Github actions to monitor these stale PRs, and
> >> > > > automatically closing them if they have not been updated in the
> >> last 'x'
> >> > > > days. Authors would still be able to reopen the closed PRs if they
> >> > > continue
> >> > > > with their work. This would help to keep the PR queue manageable.
> >> > > >
> >> > > > Not sure if this has been discussed in the Apache Flink community
> >> before.
> >> > > >
> >> > > > Thoughts?
> >> > > >
> >> > > > Regards
> >> > > > Venkata krishnan
> >> > >
> >>
> >
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU

Reply via email to