Hi,
To add: I agree with Martijn’s insights; I think we are saying similar things. 
The aim should be to progress agreed-upon work, not to blanket-close all stale PRs.
      Kind regards, David.

From: David Radley <david_rad...@uk.ibm.com>
Date: Wednesday, 4 October 2023 at 10:59
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: [EXTERNAL] RE: Close orphaned/stale PRs
Hi,
I agree, Venkata: this issue is bigger than closing out stale PRs.

We can see that issues are being raised at a rate well above the rate at which 
they are resolved:
https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12315522&periodName=daily&daysprevious=90&cumulative=true&versionLabels=major&selectedProjectId=12315522&reportKey=com.atlassian.jira.jira-core-reports-plugin%3Acreatedvsresolved-report&atl_token=A5KQ-2QAV-T4JA-FDED_19ff17decb93662bafa09e4b3ffb3a385c202015_lin&Next=Next
The backlog is gaining over 500 issues every 3 months.

We have over 1000 open PRs. This is a lot of technical debt. I recently came 
across a six-month-old PR that had not been merged. A second Jira issue was raised 
for the same problem and a second PR fixed the issue (identically). The first 
PR was still on the backlog until we noticed it.

I am looking to contribute to the community, to be able to identify issues I can 
work on and then be reasonably certain they will be reviewed and merged so I 
can build on contributions. I have worked as a maintainer and committer in 
other communities and managed to spend some of the week addressing incoming 
work; I am happy to do this in some capacity with the support of committer(s) 
for Flink. It seems to me there is a virtuous circle here: enabling more 
contributions produces more committers, and those committers can in turn help 
review and merge the backlog.

Some thoughts (I am new to this, so apologies if I have misunderstood 
something or am unaware of other existing mechanisms):

  1.  If there is an issue that a committer has assigned to a contributor as 
per the process <https://flink.apache.org/how-to-contribute/contribute-code/>, 
and there is a PR, then it should be with the committer to review the PR or 
return it to the work queue. I do not know how many PRs are like this. It seems 
to me that if a committer assigns an issue, they are indicating they will 
review, unassign themselves or merge. I do not think these PRs should be closed 
as stale.
  2.  Could we have a GitHub Action to notify committers (tagged in the PR?) if 
a PR (that has an assigned Jira) has not been reviewed in a certain period (7 
days?), then send subsequent nags if there has been no response? In this way 
busy committers can see that a PR needs looking at.
  3.  Other PRs have been raised without a committer saying that they will fix 
it. In this case there is likely to be value, but the merging and review work 
has not been taken on by anyone. I notice spelling mistake PRs that have not 
been merged (there are 8 with this query: 
https://github.com/apache/flink/pulls?q=is%3Apr+is%3Aopen+spelling ). These 
are typical newbie PRs, as they are simple but useful improvements; it would be 
great if these simpler ones could just be merged – maybe they should be marked 
as a [hotfix] to indicate they should be merged. If simpler PRs are not merged, 
it is very difficult for new contributors to gain eminence to get towards 
being a committer.
  4.  There are also issues that have been raised by people who do not want to 
fix them. It seems to me that we need a “triaged” state to indicate the issue 
looks valid and reasonable, so could be picked up by someone – at which time 
they would need to agree with a committer to get the associated PR reviewed and 
merged. This triaged state would be a pool of issues that new contributors 
could choose from.
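
To make point 2 a little more concrete, here is a rough sketch of the decision 
such a reminder job might make for a single PR. It is only an illustration under 
stated assumptions, not an existing Flink or GitHub mechanism: the function name, 
its parameters and the repeat interval are all hypothetical, and the 7-day value 
is just the figure floated above. Fetching the PR data and posting the actual 
notification are deliberately left out.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical thresholds: 7 days is the figure floated in point 2;
# the repeat interval is an assumption made purely for illustration.
FIRST_REMINDER_AFTER = timedelta(days=7)
REPEAT_EVERY = timedelta(days=7)


def reminder_due(
    opened_at: datetime,
    has_assigned_jira: bool,
    last_review_at: Optional[datetime],
    last_reminder_at: Optional[datetime],
    now: datetime,
) -> bool:
    """Return True if the committers tagged on the PR should be nudged now."""
    # Only PRs tied to an assigned Jira issue qualify for reminders.
    if not has_assigned_jira:
        return False
    # Any review activity means the PR is no longer waiting for a first look.
    if last_review_at is not None:
        return False
    # First reminder: the PR has waited the initial period without a review.
    if last_reminder_at is None:
        return now - opened_at >= FIRST_REMINDER_AFTER
    # Subsequent nags: repeat while there is still no response.
    return now - last_reminder_at >= REPEAT_EVERY
```

A scheduled job could call this once a day per open PR and only post a comment 
when it returns True, so that nothing is ever closed automatically.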



I am happy to help to improve – once we have consensus,



Kind regards, David.




From: Venkatakrishnan Sowrirajan <vsowr...@asu.edu>
Date: Wednesday, 4 October 2023 at 00:36
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: [EXTERNAL] Re: Close orphaned/stale PRs
Gentle ping to surface this up for more discussions.

Regards
Venkata krishnan


On Tue, Sep 26, 2023 at 4:59 PM Venkatakrishnan Sowrirajan <vsowr...@asu.edu>
wrote:

> Hi Martijn,
>
> Agree with your point that closing a PR without any review feedback even
> after 'X' days is discouraging to a new contributor. I understand that this
> is a capacity problem. Capacity problem cannot be solved by this proposal
> and it is beyond the scope of this proposal.
>
> Regarding your earlier question,
> > What's the added value of closing these PRs
>
>    - Having lots of inactive PRs lingering around shows the project is
>    less active. I am not saying this is the only way to determine how active a
>    project is, but this is one of the key factors.
>    - A large number of PRs open can be discouraging for (new)
>    contributors but on the other hand I agree closing an inactive PR without
>    any reviews can also drive contributors away.
>
> Having said all of that, I agree closing PRs that don't have any reviews
> to start with should be avoided from the final proposal.
>
> > I'm +1 for (automatically) closing up PRs after X days which:
> a) Don't have a CI that has passed
> b) Don't follow the code contribution guide (like commit naming
> conventions)
> c) Have changes requested but aren't being followed-up by the contributor
>
> In general, I'm largely +1 on your above proposal except for the
> implementation feasibility.
>
> Also, I have picked a few other popular projects that have implemented
> GitHub Actions stale rules to see if we can borrow some ideas. The projects
> below are listed in order from the most invasive (for lack of a better
> word) to the least invasive actions taken wrt PRs without any updates for a
> long period of time.
>
> 1. Trino
>
> TL;DR - No updates in the PR for the last 21 days, tag other maintainers
> for review. If there are no updates for 21 days after that, close the PR
> with this message - "*Closing this pull request, as it has been stale for
> six weeks. Feel free to re-open at any time.*"
> Trino's stale PR Github action rule (stale.yaml)
> <https://github.com/trinodb/trino/blob/master/.github/workflows/stale.yml  >
>
>
> 2. Apache Spark
>
> TL;DR - No updates in the PR in the last 100 days, closing the PR with
> this message - "*We're closing this PR because it hasn't been updated in
> a while. This isn't a judgement on the merit of the PR in any way. It's
> just a way of keeping the PR queue manageable. If you'd like to revive this
> PR, please reopen it and ask a committer to remove the Stale tag!*"
> Spark's discussion in their mailing list
> <https://lists.apache.org/thread/yg3ggtvpt2dbjpnb2q0yblq30sc1g2yx  > on
> closing stale PRs. Spark's stale PR github action rule (stale.yaml
> <https://github.com/apache/spark/blob/master/.github/workflows/stale.yml  >
> ).
>
> 3. Python
>
> TL;DR - No updates in the PR for the last 30 days, then tag the PR as
> stale. Note: Python project *doesn't* close the stale PRs.
>
> Python discussion
> <https://discuss.python.org/t/decision-needed-should-we-close-stale-prs-and-how-many-lapsed-days-are-prs-considered-stale/4637>
> in the mailing list to close stale PRs. Python's stale PR github action
> rule (stale.yaml
> <https://github.com/python/cpython/blob/main/.github/workflows/stale.yml>)
>
> A few others: Apache Beam
> <https://github.com/apache/beam/blob/master/.github/workflows/stale.yml>
> (closes inactive PRs after 60+ days), Apache Airflow
> <https://github.com/apache/airflow/blob/main/.github/workflows/stale.yml>
> (closes inactive PRs after 50 days)
>
> Let me know what you think. Looking forward to hearing from others in the
> community and their experiences.
>
> [1] Github Action - Close Stale Issues -
> https://github.com/marketplace/actions/close-stale-issues
>
> Regards
> Venkata krishnan
>
>
> On Thu, Sep 21, 2023 at 6:03 AM Martijn Visser <martijnvis...@apache.org>
> wrote:
>
>> Hi all,
>>
>> I really believe that the problem of the number of open PRs is just
>> that there aren't enough reviewers/resources available to review them.
>>
>> > Stale PRs can clutter the repository, and closing them helps keep it
>> organized and ensures that only relevant and up-to-date PRs are present.
>>
>> Sure, but what's the indicator that the PR is stale? The fact that
>> there has been no reviewer yet to review it, doesn't mean that the PR
>> is stale. For me, a stale PR is a PR that has been reviewed, changes
>> have been requested and the contributor isn't participating in the
>> discussion anymore. But that's a different story compared to closing
>> PRs where there has been no review done at all.
>>
>> > It mainly helps the project maintainers/reviewers to focus on only the
>> actively updated trimmed list of PRs that are ready for review.
>>
>> I disagree that closing PRs helps with this. If you want to help
>> maintainers/reviewers, we should have a situation where it's obvious
>> that a PR is really ready (meaning, CI has passed, PR contents/commit
>> message etc are following the code contribution guidelines).
>>
>> > It helps Flink users who are waiting on a PR that enhances an existing
>> feature or fixes an issue a clear indication on whether the PR will be
>> continually worked on and eventually get a closure or not and therefore
>> will be closed.
>>
>> Having other PRs being closed doesn't increase the guarantee that
>> other PRs will be reviewed. It's still a capacity problem.
>>
>> > It would be demotivating for any contributor when there is no feedback
>> for a PR within a sufficient period of time anyway.
>>
>> Definitely. But I think it would be even worse if someone makes a
>> contribution, there is no response but after X days they get a message
>> that their PR was closed automatically.
>>
>> I'm +1 for (automatically) closing up PRs after X days which:
>> a) Don't have a CI that has passed
>> b) Don't follow the code contribution guide (like commit naming
>> conventions)
>> c) Have changes requested but aren't being followed-up by the contributor
>>
>> I'm -1 for automatically closing PRs where no maintainers have taken a
>> review for the reasons I've listed above.
>>
>> Best regards,
>>
>> Martijn
>>
>> On Wed, Sep 20, 2023 at 7:41 AM Venkatakrishnan Sowrirajan
>> <vsowr...@asu.edu> wrote:
>> >
>> > Thanks for your response, Martijn.
>> >
>> > > What's the added value of
>> > closing these PRs
>> >
>> > It mainly helps the project maintainers/reviewers to focus on only the
>> > actively updated trimmed list of PRs that are ready for review.
>> >
>> > It helps Flink users who are waiting on a PR that enhances an existing
>> > feature or fixes an issue a clear indication on whether the PR will be
>> > continually worked on and eventually get a closure or not and therefore
>> > will be closed.
>> >
>> > Btw, I am open to other suggestions or enhancements on top of the proposal
>> > as well.
>> >
>> > > it would
>> > just close PRs where maintainers haven't been able to perform a
>> > review, but getting a PR closed without any feedback is also
>> > demotivating for a (potential new) contributor
>> >
>> > It would be demotivating for any contributor when there is no feedback for
>> > a PR within a sufficient period of time anyway. I don't see closing the PR
>> > which is inactive after a sufficient period of time (say 60 to 90 days)
>> > would be any more discouraging than not getting any feedback. The problem
>> > of not getting feedback due to not enough maintainer's bandwidth has to be
>> > solved through other mechanisms.
>> >
>> > > I think the important
>> > thing is that we get into a cycle where maintainers can see which PRs
>> > are ready for review, and also a way to divide the bulk of the work.
>> >
>> > Yes, exactly my point as well. It helps the maintainers to see a trimmed
>> > list which is ready to be reviewed.
>> >
>> > +1 for the other automation to nudge/help the contributor to fix the PR
>> > that follows the contribution guide, CI checks passed etc.
>> >
>> > > IIRC we can't really fix that until we can
>> > finally move to dedicated Github Action Runners instead of the current
>> > setup with Azure, but that's primarily blocked by ASF Infra.
>> >
>> > Curious, if you can share the JIRA or prior discussion on this topic. I
>> > would like to learn more about why Github Actions cannot be used for Apache
>> > Flink.
>> >
>> > Regards
>> > Venkata krishnan
>> >
>> >
>> > On Tue, Sep 19, 2023 at 2:00 PM Martijn Visser <martijnvis...@apache.org>
>> > wrote:
>> >
>> > > Hi Venkata,
>> > >
>> > > Thanks for opening the discussion, I've been thinking about it quite a
>> > > bit but I'm not sure what's the right approach.
>> > >
>> > > From your proposal, the question would be "What's the added value of
>> > > closing these PRs"? I don't see an immediate value of that: it would
>> > > just close PRs where maintainers haven't been able to perform a
>> > > review, but getting a PR closed without any feedback is also
>> > > demotivating for a (potential new) contributor. I think the important
>> > > thing is that we get into a cycle where maintainers can see which PRs
>> > > are ready for review, and also a way to divide the bulk of the work.
>> > > Because doing proper reviews requires time, and these resources are
>> > > scarce.
>> > >
>> > > I do think that we can make lives a bit easier with some automation:
>> > > * There are a lot of PRs which don't follow the contribution guide (No
>> > > Jira ticket, no correct commit message etc). For the externalized
>> > > connector repositories, we've been trying Boring Cyborg to provide
>> > > information back to contributors if their PRs are as expected. If the
>> > > PR doesn't follow the contribution guide, I'm inclined to give such a
>> > > PR less review attention. That's primarily because there are other PRs
>> > > out there that do follow these guides.
>> > > * There are even more PRs where the CI has failed: in those cases, a
>> > > review also makes less sense, given that the PR can't be merged as is.
>> > > I do see that contributors sometimes don't know where to look for the
>> > > status of the CI, but IIRC we can't really fix that until we can
>> > > finally move to dedicated Github Action Runners instead of the current
>> > > setup with Azure, but that's primarily blocked by ASF Infra.
>> > >
>> > > I'm curious what others in the community think.
>> > >
>> > > Best regards,
>> > >
>> > > Martijn
>> > >
>> > > On Tue, Sep 19, 2023 at 10:33 PM Venkatakrishnan Sowrirajan
>> > > <vsowr...@asu.edu> wrote:
>> > > >
>> > > > Hi Flink devs,
>> > > >
>> > > > There are currently over 1,000 open pull requests
>> > > > <https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc>
>> > > > (PRs) in the Apache Flink repository, with only 162 having been updated in
>> > > > the last two months
>> > > > <https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc+updated:>2023-07-19>.
>> > > > This means that more than 85% of the PRs are stale and have not been
>> > > > touched.
>> > > >
>> > > > I suggest setting up Github actions to monitor these stale PRs, and
>> > > > automatically closing them if they have not been updated in the last 'x'
>> > > > days. Authors would still be able to reopen the closed PRs if they continue
>> > > > with their work. This would help to keep the PR queue manageable.
>> > > >
>> > > > Not sure if this has been discussed in the Apache Flink community before.
>> > > >
>> > > > Thoughts?
>> > > >
>> > > > Regards
>> > > > Venkata krishnan
>> > >
>>
>

Unless otherwise stated above:

IBM United Kingdom Limited
Registered in England and Wales with number 741598
Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
