Thanks, Lari! Does this issue cause the tests for PRs like https://github.com/apache/pulsar/pull/17198 to hang?
On 2022/09/06 14:41:07 Dave Fisher wrote:
> We are going to need to take actions to fix our problems. See
> https://issues.apache.org/jira/browse/INFRA-23633?focusedCommentId=17600749&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17600749
>
> Jarek has done a large amount of GitHub Action work with Apache Airflow and
> his suggestions might be helpful. One of his suggestions was Apache Yetus. I
> think he means using the Maven plugins -
> https://yetus.apache.org/documentation/0.14.0/yetus-maven-plugin/
>
>
> > On Sep 6, 2022, at 4:48 AM, Lari Hotari <lhot...@apache.org> wrote:
> >
> > The Apache Infra ticket is
> > https://issues.apache.org/jira/browse/INFRA-23633 .
> >
> > -Lari
> >
> > On 2022/09/06 11:36:46 Lari Hotari wrote:
> >> I asked for an update on the Apache org GitHub Actions usage stats from
> >> Gavin McDonald on the-asf slack in this thread:
> >> https://the-asf.slack.com/archives/CBX4TSBQ8/p1662464113873539?thread_ts=1661512133.913279&cid=CBX4TSBQ8
> >> .
> >>
> >> I hope we get this issue resolved since it delays PR processing a lot.
> >>
> >> -Lari
> >>
> >> On 2022/09/06 11:16:07 Lari Hotari wrote:
> >>> Pulsar CI continues to be congested, and the build queue [1] is very long
> >>> at the moment. There are 147 build jobs in the queue and 16 jobs in
> >>> progress at the moment.
> >>>
> >>> I would strongly advise everyone to use "personal CI" to mitigate the
> >>> issue of the long delay of CI feedback. You can simply open a PR to your
> >>> own personal fork of apache/pulsar to run the builds in your "personal
> >>> CI". There are more details in the previous emails in this thread.
> >>>
> >>> -Lari
> >>>
> >>> [1] - build queue:
> >>> https://github.com/apache/pulsar/actions?query=is%3Aqueued
> >>>
> >>> On 2022/08/30 12:39:19 Lari Hotari wrote:
> >>>> Pulsar CI continues to be congested, and the build queue is long.
> >>>>
> >>>> I would strongly advise everyone to use "personal CI" to mitigate the
> >>>> issue of the long delay of CI feedback. You can simply open a PR to your
> >>>> own personal fork of apache/pulsar to run the builds in your "personal
> >>>> CI". There are more details in the previous email in this thread.
> >>>>
> >>>> Some updates:
> >>>>
> >>>> There has been a discussion with Gavin McDonald from ASF infra on
> >>>> the-asf slack about getting usage reports from GitHub to support the
> >>>> investigation. Slack thread is the same one mentioned in the previous
> >>>> email, https://the-asf.slack.com/archives/CBX4TSBQ8/p1661512133913279 .
> >>>> Gavin already requested the usage report in GitHub UI, but it produced
> >>>> invalid results.
> >>>>
> >>>> I made a change to mitigate a source of additional GitHub Actions
> >>>> overhead.
> >>>> In the past, each cherry-picked commit to a maintenance branch of Pulsar
> >>>> has triggered a lot of workflow runs.
> >>>>
> >>>> The solution for cancelling duplicate builds automatically is to add
> >>>> this definition to the workflow definition:
> >>>> concurrency:
> >>>>   group: ${{ github.workflow }}-${{ github.ref }}
> >>>>   cancel-in-progress: true
> >>>>
> >>>> I added this to all maintenance branch GitHub Actions workflows:
> >>>>
> >>>> branch-2.10 change:
> >>>> https://github.com/apache/pulsar/commit/5d2c9851f4f4d70bfe74b1e683a41c5a040a6ca7
> >>>> branch-2.9 change:
> >>>> https://github.com/apache/pulsar/commit/3ea124924fecf636cc105de75c62b3a99050847b
> >>>> branch-2.8 change:
> >>>> https://github.com/apache/pulsar/commit/48187bb5d95e581f8322a019b61d986e18a31e54
> >>>> branch-2.7 change:
> >>>> https://github.com/apache/pulsar/commit/744b62c99344724eacdbe97c881311869d67f630
> >>>>
> >>>> branch-2.11 already contains the necessary config for cancelling
> >>>> duplicate builds.
> >>>>
> >>>> The benefit of the above change is that when multiple commits are
> >>>> cherry-picked to a branch at once, only the build of the last commit
> >>>> will get run eventually. The builds for the intermediate commits will
> >>>> get cancelled. Obviously there's a tradeoff here that we don't get the
> >>>> information if one of the earlier commits breaks the build. It's the
> >>>> cost that we need to pay. Nevertheless our build is so flaky that it's
> >>>> hard to determine whether a failed build result is only caused by a
> >>>> flaky test or whether it's an actual failure. Because of this we don't
> >>>> lose anything by cancelling builds. It's more important to save build
> >>>> resources. In the maintenance branches for 2.10 and older, the average
> >>>> total build time consumed is around 20 hours which is a lot.
> >>>>
> >>>> At this time, the overhead of maintenance branch builds doesn't seem to
> >>>> be the source of the problems. There must be some other issue which is
> >>>> possibly related to exceeding a usage quota. Hopefully we get the CI
> >>>> slowness issue solved asap.
> >>>>
> >>>> BR,
> >>>>
> >>>> Lari
> >>>>
> >>>>
> >>>> On 2022/08/26 12:00:20 Lari Hotari wrote:
> >>>>> Hi,
> >>>>>
> >>>>> GitHub Actions builds have been piling up in the build queue in the
> >>>>> last few days.
> >>>>> I posted on bui...@apache.org
> >>>>> https://lists.apache.org/thread/6lbqr0f6mqt9s8ggollp5kj2nv7rlo9s and
> >>>>> created INFRA ticket https://issues.apache.org/jira/browse/INFRA-23633
> >>>>> about this issue.
> >>>>> There's also a thread on the-asf slack,
> >>>>> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661512133913279 .
> >>>>>
> >>>>> It seems that our build queue is finally getting picked up, but it
> >>>>> would be great to see if we hit quota and whether that is the cause of
> >>>>> pauses.
> >>>>>
> >>>>> Another issue is that the master branch broke after merging 2
> >>>>> conflicting PRs.
> >>>>> The fix is in https://github.com/apache/pulsar/pull/17300 .
> >>>>>
> >>>>> Merging PRs will be slow until we have these 2 problems solved and
> >>>>> existing PRs rebased over the changes. Let's prioritize merging #17300
> >>>>> before pushing more changes.
> >>>>>
> >>>>> I'd like to point out that a good way to get build feedback before
> >>>>> sending a PR, is to run builds on your personal GitHub Actions CI. The
> >>>>> benefit of this is that it doesn't consume the shared quota and builds
> >>>>> usually start instantly.
> >>>>> There are instructions in the contributors guide about this.
> >>>>> https://pulsar.apache.org/contributing/#ci-testing-in-your-fork
> >>>>> You simply open PRs to your own fork of apache/pulsar to run builds on
> >>>>> your personal GitHub Actions CI.
> >>>>>
> >>>>> BR,
> >>>>>
> >>>>> Lari
> >>>>>
> >>>>
> >>>
> >>
> >