Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Lari Hotari Thu, 15 Sep 2022 04:43:17 -0700

> I’m a little confused about what will CI do in this case? I think the 
> “ready-to-test” label should
> be removed in this case because the new code might not pass the tests. I 
> thought the author
> should request committers to add this label again after the tests passed in 
> his own repo.


This is possible. I'd rather postpone it since it would require more
effort to implement and perhaps just be an unnecessary burden.

The reviewer shouldn't be doing "/pulsarbot ready-to-test" for PRs
that aren't ready.
The PR author can continue to iterate the test runs in the fork until
PR comments have been addressed.
It's technically possible to have the same branch in PRs in
apache/pulsar and your own fork. When the branch gets updated, the
builds in the forked repository would run. That's why I think it's
better to keep on iterating on the PR until it's really ready for
final
testing in apache/pulsar repository.

-Lari

On Thu, Sep 15, 2022 at 2:27 PM Yunze Xu <y...@streamnative.io.invalid> wrote:
>
> > If there are later changes in the PR after the "ready-to-test" label has 
> > been added, we could simply let the Pulsar CI handle the builds.
>
> I’m a little confused about what will CI do in this case? I think the 
> “ready-to-test” label should
> be removed in this case because the new code might not pass the tests. I 
> thought the author
> should request committers to add this label again after the tests passed in 
> his own repo.
>
> Thanks,
> Yunze
>
>
>
>
> > On Sep 15, 2022, at 19:20, Lari Hotari <lhot...@apache.org> wrote:
> >
> >> In short, IIUC, each contributor should:
> >> 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork 
> >> to
> >> 2. Paste the link of the same PR in contributor’s fork to the PR in Apache 
> >> repo
> >>
> >> Then a committer should run `/pulsarbot ready-to-test` after the PR in
> >> contributor's private repo passed all tests, right?
> >
> > Exactly. One small detail: It should be the PR author's responsibility to 
> > follow up and request for a review and an approval after the tests pass.
> > If there are later changes in the PR after the "ready-to-test" label has 
> > been added, we could simply let the Pulsar CI handle the builds.
> >
> > -Lari
> >
> > On 2022/09/15 10:11:53 Yunze Xu wrote:
> >> Hi Lari,
> >>
> >> This proposal LGTM. But I have some questions about the details.
> >>
> >> In short, IIUC, each contributor should:
> >> 1. Follow https://pulsar.apache.org/contributing/#ci-testing-in-your-fork 
> >> to
> >> 2. Paste the link of the same PR in contributor’s fork to the PR in Apache 
> >> repo
> >>
> >> Then a committer should run `/pulsarbot ready-to-test` after the PR in
> >> contributor's private repo passed all tests, right?
> >>
> >> Thanks,
> >> Yunze
> >>
> >>
> >>
> >>
> >>> On Sep 15, 2022, at 17:54, Lari Hotari <lhot...@apache.org> wrote:
> >>>
> >>> Thanks for the comment.
> >>>
> >>> The question isn't about trusting PRs.
> >>> The CI resource consumption problem is also caused by current committer 
> >>> PRs. That's why it
> >>> is necessary to handle all PRs in the same way.
> >>> The benefit of the proposed solution is that we could decide to run some 
> >>> light checks automatically before the step that requires the 
> >>> "ready-to-check" label.
> >>>
> >>> After I sent the proposal, I found out that the Pulsar committer 
> >>> information including the GitHub
> >>> user names is available in JSON format at 
> >>> https://whimsy.apache.org/roster/committee/pulsar.json .
> >>> Pulsarbot can use this information for authorizing users who have access 
> >>> to
> >>> "/pulsarbot ready-to-test".
> >>>
> >>> I agree that we can skip adding a separate reviewer role, let's
> >>> simply use https://whimsy.apache.org/roster/committee/pulsar.json as the 
> >>> source of truth
> >>> for authorization.
> >>>
> >>> -Lari
> >>>
> >>> On 2022/09/15 09:22:18 tison wrote:
> >>>> Hi Lari,
> >>>>
> >>>> Thanks for starting this discussion. The overall proposal looks good and
> >>>> it's really great that you can spend some time on such a significant
> >>>> infrastructure.
> >>>>
> >>>> One comment here is that we can start with all "authorized" users to
> >>>> trigger the CI in the committer group instead of introducing a new 
> >>>> concept
> >>>> "reviewer" - it will be another topic to discuss and I generally prefer
> >>>> more committership to encourage participation instead of a complicated
> >>>> membership structure.
> >>>>
> >>>> Besides, a quick fixup for reducing traffic is setting "Fork pull request
> >>>> workflows from outside collaborators" option[1] as "Require approval for
> >>>> all outside collaborators". This is provided out-of-the-box by GitHub and
> >>>> requires NO development[2]. Although it doesn't restrict people who are
> >>>> already apache org members but are not Pulsar committers, I believe the
> >>>> trust level is acceptable. An INFRA team member will be asked to perform
> >>>> the settings change if we want this.
> >>>>
> >>>> Best,
> >>>> tison.
> >>>>
> >>>> [1] https://github.com/apache/pulsar/settings/actions
> >>>> [2]
> >>>> https://docs.github.com/en/actions/managing-workflow-runs/approving-workflow-runs-from-public-forks
> >>>>
> >>>>
> >>>> Lari Hotari <lhot...@apache.org> 于2022年9月15日周四 16:36写道：
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> The GitHub Actions based Pulsar CI has been experiencing issues for
> >>>>> multiple weeks. The condition is currently better, but the resource
> >>>>> shortage issue remains. CI builds will take a long time to complete even
> >>>>> after many optimizations have been made.
> >>>>>
> >>>>> There's a long email thread with some details about the past issues:
> >>>>> https://lists.apache.org/thread/p7rb04vf1mt0kk3v2r7xl9dvb3zkhtxf
> >>>>>
> >>>>> I have filed an issue to GitHub support about the CI issues over a week
> >>>>> ago, and I finally received an answer a few hours ago. However the
> >>>>> GitHub support person didn't reply to my questions at all, but instead
> >>>>> suggested that there's a beta program where it's possible to pay for
> >>>>> more resources. That solution isn't suitable for our case, since it
> >>>>> doesn't seem to be possible to assign GitHub Actions Runner VM resources
> >>>>> only for a specific Apache project. I'll follow up with GitHub support, 
> >>>>> but
> >>>>> I don't expect that to resolve our problems in the near term. We need
> >>>>> to make changes in our CI resource consumption.
> >>>>>
> >>>>> In a the-asf Slack thread [1] about Pulsar CI issues, Martin Grigorov
> >>>>> suggested: "Apache Spark project requires that all PRs are executed in
> >>>>> the contributor's GHA quota. Maybe Pulsar can do the same ?!"
> >>>>>
> >>>>> The Apache Spark contributing guide contains details about this in the
> >>>>> "Pull request" section, https://spark.apache.org/contributing.html .
> >>>>>
> >>>>> "Before creating a pull request in Apache Spark, it is important to
> >>>>> check if tests can pass on your branch because our GitHub Actions
> >>>>> workflows automatically run tests for your pull request/following
> >>>>> commits and every run burdens the limited resources of GitHub Actions in
> >>>>> Apache Spark repository. "
> >>>>>
> >>>>> In Pulsar, we will need to do the same. As a solution to this, Tison
> >>>>> suggested that we would not run all tests for the PR unless there's a
> >>>>> "ready-to-test" label on the PR.
> >>>>>
> >>>>> I think this is a good suggestion. We could extend the existing
> >>>>> "pulsarbot" to help with the automation.
> >>>>>
> >>>>> A reviewer could comment "/pulsarbot ready-to-test" on the PR and
> >>>>> pulsarbot would add the label and also restart the CI workflow to make
> >>>>> it proceed and run the tests.
> >>>>> pulsarbot would check for authorized users. One simple
> >>>>> approach would be to add a file ".pulsarci.yaml" in apache/pulsar
> >>>>> repository with the relevant information:
> >>>>>
> >>>>> committer_github_ids:
> >>>>> - committer1
> >>>>> - committer2
> >>>>> ...
> >>>>>
> >>>>> ready_to_test:
> >>>>> authorized_github_ids:
> >>>>>   - userid1
> >>>>>   - userid2
> >>>>>   ...
> >>>>>
> >>>>> We would have a script to synchronize all Pulsar committers to this file
> >>>>> peridiotically (manual step after there's a new committer). ASF provides
> >>>>> public json files for project members at
> >>>>> https://whimsy.apache.org/public/public_ldap_projects.json , however the
> >>>>> mapping to github user names seems to be missing. That could be done
> >>>>> with a custom script since ASF LDAP contains the github username.
> >>>>>
> >>>>> All Pulsar committers would have access. In addition, there could be 
> >>>>> other
> >>>>> users that are authorized for using "/pulsarbot ready-to-test".
> >>>>>
> >>>>> This solution would also require changes in the GitHub Actions workflows
> >>>>> so that the workflow is failed in an early step unless there's a
> >>>>> ready-to-test label for the PR.
> >>>>>
> >>>>> With the above solution, we would be able to cut the amount of
> >>>>> unnecessary builds and get the excessive resource consumption issue
> >>>>> under control. The PR authors would be instructed to run initial PR
> >>>>> builds in their own fork and the reviewer should check that this is done
> >>>>> before approving the PR for testing with "/pulsarbot ready-to-test".
> >>>>>
> >>>>> I would suggest proceeding quickly on this matter without separate PIPs
> >>>>> or votes. We could follow the Apache lazy consensus
> >>>>> (https://community.apache.org/committers/lazyConsensus.html) principle
> >>>>> and make this happen if there aren't objections in the next 72 hours.
> >>>>> The improvement suggestions to this proposal would obviously be taken
> >>>>> into account and if someone objects, we wouldn't have reached lazy
> >>>>> consensus and we wouldn't proceed.
> >>>>>
> >>>>> -Lari
> >>>>>
> >>>>>
> >>>>> 1 -
> >>>>> https://the-asf.slack.com/archives/CBX4TSBQ8/p1661849820238809?thread_ts=1661512133.913279&cid=CBX4TSBQ8
> >>>>>
> >>>>
> >>
> >>
>

Re: [CI] Change to be made in Pulsar CI to mitigate CI resource consumption issues

Reply via email to