Hi,

I agree, Venkata, that this issue is bigger than closing out stale PRs. Issues are being raised much faster than they are being resolved:
https://issues.apache.org/jira/secure/ConfigureReport.jspa?projectOrFilterId=project-12315522&periodName=daily&daysprevious=90&cumulative=true&versionLabels=major&selectedProjectId=12315522&reportKey=com.atlassian.jira.jira-core-reports-plugin%3Acreatedvsresolved-report&atl_token=A5KQ-2QAV-T4JA-FDED_19ff17decb93662bafa09e4b3ffb3a385c202015_lin&Next=Next
The backlog is gaining over 500 issues every 3 months.

We have over 1000 open PRs. This is a lot of technical debt. I recently came across a 6-month-old PR that had not been merged. A second Jira issue was raised for the same problem, and a second PR fixed the issue (identically). The first PR remained in the backlog until we noticed it.

I am looking to contribute to the community: to identify issues I can work on and then be reasonably certain that they will be reviewed and merged, so that I can build on my contributions. I have worked as a maintainer and committer in other communities and managed to spend part of each week addressing incoming work; I am happy to do this in some capacity for Flink, with the support of committer(s). It seems to me that this is a virtuous circle: enabling more contributions leads to more committers, who can in turn help review and merge the backlog.

Some thoughts (I am new to this, so apologies if I have misunderstood something or am unaware of other existing mechanisms):

1. If a committer has assigned an issue to a contributor as per the process <https://flink.apache.org/how-to-contribute/contribute-code/> and there is a PR, then it should be up to that committer to review the PR or return the issue to the work queue. I do not know how many PRs are like this. It seems to me that if a committer assigns an issue, they are indicating that they will review and merge it, or else unassign themselves. I do not think these PRs should be closed as stale.

2. Could we have a GitHub Action that notifies committers (tagged in the PR?) if a PR with an assigned Jira has not been reviewed within a certain period (7 days?), followed by further nags if there is still no response? In this way, busy committers can see that a PR needs looking at.

3. Other PRs have been raised without a committer saying that they will fix the issue. These are still likely to have value, but nobody has taken on the review and merge work.
   I notice spelling-fix PRs that have not been merged (there are 8 with this query: https://github.com/apache/flink/pulls?q=is%3Apr+is%3Aopen+spelling). These are typical newbie PRs, as they are simple but useful improvements; it would be great if these simpler ones could just be merged. Maybe they should be marked as [hotfix] to indicate that they should be merged. If simpler PRs are not merged, it is very difficult for new contributors to build the track record needed to progress towards becoming a committer.

4. There are also issues that have been raised by people who do not want to fix them. It seems to me that we need a "triaged" state to indicate that an issue looks valid and reasonable and so could be picked up by someone; at that point they would need to agree with a committer to get the associated PR reviewed and merged. This triaged state would provide a pool of issues for new contributors to choose from.

I am happy to help improve this once we have consensus.

Kind regards,
David.

From: Venkatakrishnan Sowrirajan <vsowr...@asu.edu>
Date: Wednesday, 4 October 2023 at 00:36
To: dev@flink.apache.org <dev@flink.apache.org>
Subject: [EXTERNAL] Re: Close orphaned/stale PRs

Gentle ping to surface this up for more discussions.

Regards
Venkata krishnan

On Tue, Sep 26, 2023 at 4:59 PM Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:

> Hi Martijn,
>
> Agree with your point that closing a PR without any review feedback even after 'X' days is discouraging to a new contributor. I understand that this is a capacity problem. Capacity problem cannot be solved by this proposal and it is beyond the scope of this proposal.
>
> Regarding your earlier question,
>
> > What's the added value of closing these PRs
>
> - Having lots of inactive PRs lingering around shows the project is less active. I am not saying this is the only way to determine how active a project is, but this is one of the key factors.
> - A large number of PRs open can be discouraging for (new) contributors but on the other hand I agree closing an inactive PR without any reviews can also drive contributors away.
>
> Having said all of that, I agree closing PRs that don't have any reviews to start with should be avoided from the final proposal.
>
> > I'm +1 for (automatically) closing up PRs after X days which:
> > a) Don't have a CI that has passed
> > b) Don't follow the code contribution guide (like commit naming conventions)
> > c) Have changes requested but aren't being followed-up by the contributor
>
> In general, I'm largely +1 on your above proposal except for the implementation feasibility.
>
> Also, I have picked a few other popular projects that have implemented the Github's actions stale rule to see if we can borrow some ideas. Below projects are listed in the order of the most invasive (for lack of a better word) to the least invasive actions taken wrt PR without any updates for a long period of time.
>
> 1. Trino
>
> TL;DR - No updates in the PR for the last 21 days, tag other maintainers for review. If there are no updates for 21 days after that, close the PR with this message - "*Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.*"
> Trino's stale PR Github action rule (stale.yaml) <https://github.com/trinodb/trino/blob/master/.github/workflows/stale.yml>
>
> 2. Apache Spark
>
> TL;DR - No updates in the PR in the last 100 days, closing the PR with this message - "*We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable. If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!*"
> Spark's discussion in their mailing list <https://lists.apache.org/thread/yg3ggtvpt2dbjpnb2q0yblq30sc1g2yx> on closing stale PRs.
> Spark's stale PR github action rule (stale.yaml <https://github.com/apache/spark/blob/master/.github/workflows/stale.yml>).
>
> 3. Python
>
> TL;DR - No updates in the PR for the last 30 days, then tag the PR as stale. Note: Python project *doesn't* close the stale PRs.
>
> Python discussion <https://discuss.python.org/t/decision-needed-should-we-close-stale-prs-and-how-many-lapsed-days-are-prs-considered-stale/4637> in the mailing list to close stale PRs. Python's stale PR github action rule (stale.yaml <https://github.com/python/cpython/blob/main/.github/workflows/stale.yml>)
>
> Few others: Apache Beam <https://github.com/apache/beam/blob/master/.github/workflows/stale.yml> (closes inactive PRs after 60+ days), Apache Airflow <https://github.com/apache/airflow/blob/main/.github/workflows/stale.yml> (closes inactive PRs after 50 days)
>
> Let me know what you think. Looking forward to hearing from others in the community and their experiences.
>
> [1] Github Action - Close Stale Issues - https://github.com/marketplace/actions/close-stale-issues
>
> Regards
> Venkata krishnan
>
>
> On Thu, Sep 21, 2023 at 6:03 AM Martijn Visser <martijnvis...@apache.org> wrote:
>
>> Hi all,
>>
>> I really believe that the problem of the number of open PRs is just that there aren't enough reviewers/resources available to review them.
>>
>> > Stale PRs can clutter the repository, and closing them helps keep it organized and ensures that only relevant and up-to-date PRs are present.
>>
>> Sure, but what's the indicator that the PR is stale? The fact that there has been no reviewer yet to review it, doesn't mean that the PR is stale. For me, a stale PR is a PR that has been reviewed, changes have been requested and the contributor isn't participating in the discussion anymore. But that's a different story compared to closing PRs where there has been no review done at all.
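Martijn's position in this thread (close a PR automatically after X days only if (a) CI has not passed, (b) it does not follow the contribution guide, or (c) requested changes were never followed up, and never close a PR just because nobody has reviewed it yet) can be expressed as a small decision function. This is a hypothetical sketch: the `PullRequest` fields and `should_auto_close` are illustrative names, not part of any existing Flink tooling or of the GitHub stale action, and X is left as a parameter because the thread does not settle on a value.

```python
from dataclasses import dataclass


@dataclass
class PullRequest:
    # Hypothetical inputs; a real bot would have to derive these from the GitHub API.
    days_inactive: int        # days since the PR was last updated
    ci_passed: bool           # did the latest CI run succeed?
    follows_guide: bool       # Jira ticket referenced, commit naming conventions followed
    changes_requested: bool   # has a reviewer requested changes?
    author_followed_up: bool  # did the author respond after changes were requested?


def should_auto_close(pr: PullRequest, x_days: int) -> bool:
    """Return True only for PRs matching criteria (a), (b) or (c) after x_days of inactivity."""
    if pr.days_inactive < x_days:
        return False
    if not pr.ci_passed:  # (a) CI has not passed
        return True
    if not pr.follows_guide:  # (b) contribution guide not followed
        return True
    if pr.changes_requested and not pr.author_followed_up:  # (c) requested changes ignored
        return True
    # A green, well-formed PR that is simply waiting for a reviewer is never closed.
    return False
```

Under this rule, a PR that passes CI and follows the guide but has sat unreviewed for 200 days stays open, which is exactly the case Martijn is -1 on closing.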
>>
>> > It mainly helps the project maintainers/reviewers to focus on only the actively updated trimmed list of PRs that are ready for review.
>>
>> I disagree that closing PRs helps with this. If you want to help maintainers/reviewers, we should have a situation where it's obvious that a PR is really ready (meaning, CI has passed, PR contents/commit message etc are following the code contribution guidelines).
>>
>> > It helps Flink users who are waiting on a PR that enhances an existing feature or fixes an issue a clear indication on whether the PR will be continually worked on and eventually get a closure or not and therefore will be closed.
>>
>> Having other PRs being closed doesn't increase the guarantee that other PRs will be reviewed. It's still a capacity problem.
>>
>> > It would be demotivating for any contributor when there is no feedback for a PR within a sufficient period of time anyway.
>>
>> Definitely. But I think it would be even worse if someone makes a contribution, there is no response but after X days they get a message that their PR was closed automatically.
>>
>> I'm +1 for (automatically) closing up PRs after X days which:
>> a) Don't have a CI that has passed
>> b) Don't follow the code contribution guide (like commit naming conventions)
>> c) Have changes requested but aren't being followed-up by the contributor
>>
>> I'm -1 for automatically closing PRs where no maintainers have taken a review for the reasons I've listed above.
>>
>> Best regards,
>>
>> Martijn
>>
>> On Wed, Sep 20, 2023 at 7:41 AM Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
>> >
>> > Thanks for your response, Martijn.
>> >
>> > > What's the added value of closing these PRs
>> >
>> > It mainly helps the project maintainers/reviewers to focus on only the actively updated trimmed list of PRs that are ready for review.
>> >
>> > It helps Flink users who are waiting on a PR that enhances an existing feature or fixes an issue a clear indication on whether the PR will be continually worked on and eventually get a closure or not and therefore will be closed.
>> >
>> > Btw, I am open to other suggestions or enhancements on top of the proposal as well.
>> >
>> > > it would just close PRs where maintainers haven't been able to perform a review, but getting a PR closed without any feedback is also demotivating for a (potential new) contributor
>> >
>> > It would be demotivating for any contributor when there is no feedback for a PR within a sufficient period of time anyway. I don't see closing the PR which is inactive after a sufficient period of time (say 60 to 90 days) would be any more discouraging than not getting any feedback. The problem of not getting feedback due to not enough maintainer's bandwidth has to be solved through other mechanisms.
>> >
>> > > I think the important thing is that we get into a cycle where maintainers can see which PRs are ready for review, and also a way to divide the bulk of the work.
>> >
>> > Yes, exactly my point as well. It helps the maintainers to see a trimmed list which is ready to be reviewed.
>> >
>> > +1 for the other automation to nudge/help the contributor to fix the PR that follows the contribution guide, CI checks passed etc.
>> >
>> > > IIRC we can't really fix that until we can finally move to dedicated Github Action Runners instead of the current setup with Azure, but that's primarily blocked by ASF Infra.
>> >
>> > Curious, if you can share the JIRA or prior discussion on this topic. I would like to learn more about why Github Actions cannot be used for Apache Flink.
>> >
>> > Regards
>> > Venkata krishnan
>> >
>> >
>> > On Tue, Sep 19, 2023 at 2:00 PM Martijn Visser <martijnvis...@apache.org> wrote:
>> >
>> > > Hi Venkata,
>> > >
>> > > Thanks for opening the discussion, I've been thinking about it quite a bit but I'm not sure what's the right approach.
>> > >
>> > > From your proposal, the question would be "What's the added value of closing these PRs"? I don't see an immediate value of that: it would just close PRs where maintainers haven't been able to perform a review, but getting a PR closed without any feedback is also demotivating for a (potential new) contributor. I think the important thing is that we get into a cycle where maintainers can see which PRs are ready for review, and also a way to divide the bulk of the work. Because doing proper reviews requires time, and these resources are scarce.
>> > >
>> > > I do think that we can make lives a bit easier with some automation:
>> > > * There are a lot of PRs which don't follow the contribution guide (No Jira ticket, no correct commit message etc). For the externalized connector repositories, we've been trying Boring Cyborg to provide information back to contributors if their PRs are as expected. If the PR doesn't follow the contribution guide, I'm inclined to give such a PR less review attention. That's primarily because there are other PRs out there that do follow these guides.
>> > > * There are even more PRs where the CI has failed: in those cases, a review also makes less sense, given that the PR can't be merged as is. I do see that contributors sometimes don't know where to look for the status of the CI, but IIRC we can't really fix that until we can finally move to dedicated Github Action Runners instead of the current setup with Azure, but that's primarily blocked by ASF Infra.
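The two kinds of problem Martijn lists (PRs that ignore the contribution guide and PRs with failing CI) are things a bot could report back to the contributor, in the spirit of the Boring Cyborg setup mentioned for the connector repositories. Below is a hypothetical Python sketch of such feedback. The title pattern is an assumption approximating a `[FLINK-XXXX][component] description` or `[hotfix] description` convention; it is not the rule that Boring Cyborg or the contribution guide actually enforces, and `follows_naming_convention` and `feedback_for` are illustrative names.

```python
import re

# Assumed pattern: a Jira key such as "[FLINK-12345]" or a "[hotfix]" marker,
# optionally followed by "[component]" tags, then a non-empty description.
# Check the contribution guide for the exact rule before relying on this.
TITLE_PATTERN = re.compile(r"^\[(?:FLINK-\d+|hotfix)\](?:\s*\[[\w.-]+\])*\s+\S.*$")


def follows_naming_convention(title: str) -> bool:
    """Return True if a PR title or commit subject matches the assumed pattern."""
    return TITLE_PATTERN.match(title) is not None


def feedback_for(title: str, ci_passed: bool) -> list[str]:
    """Collect hypothetical bot messages telling the contributor what to fix."""
    messages = []
    if not follows_naming_convention(title):
        messages.append("Title should start with a Jira key like [FLINK-12345] or [hotfix].")
    if not ci_passed:
        messages.append("CI has not passed yet; please check the build results.")
    return messages
```

An empty list means the PR looks ready for a reviewer's attention, which is the "obvious that a PR is really ready" signal Martijn asks for.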
>> > >
>> > > I'm curious what others in the community think.
>> > >
>> > > Best regards,
>> > >
>> > > Martijn
>> > >
>> > > On Tue, Sep 19, 2023 at 10:33 PM Venkatakrishnan Sowrirajan <vsowr...@asu.edu> wrote:
>> > > >
>> > > > Hi Flink devs,
>> > > >
>> > > > There are currently over 1,000 open pull requests <https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc> (PRs) in the Apache Flink repository, with only 162 having been updated in the last two months <https://github.com/apache/flink/pulls?q=is:open+is:pr+sort:updated-asc+updated:>2023-07-19>. This means that more than 85% of the PRs are stale and have not been touched.
>> > > >
>> > > > I suggest setting up Github actions to monitor these stale PRs, and automatically closing them if they have not been updated in the last 'x' days. Authors would still be able to reopen the closed PRs if they continue with their work. This would help to keep the PR queue manageable.
>> > > >
>> > > > Not sure if this has been discussed in the Apache Flink community before.
>> > > >
>> > > > Thoughts?
>> > > >
>> > > > Regards
>> > > > Venkata krishnan