Thanks for drafting this proposal, Nico.

I hope that the move to GHA will improve our development processes and
build system stability in the long run. Hence +1 for this proposal and
the timeline. The plan looks well thought out.

Cheers,
Till

On Thu, Dec 16, 2021 at 4:29 PM Chesnay Schepler <[email protected]> wrote:

> We will not use Apache resources, but install self-hosted runners on our
> current CI machines, similar to what we have done with Azure.
>
> On 16/12/2021 16:07, Fabian Paul wrote:
> > Hi Nico,
> >
> > Thanks a lot for drafting the proposal. I really like the
> > fully-fledged phasing model. All in all, I am +1 to move away from
> > Azure and can only second all the points you have mentioned.
> >
> > I only want to clarify one point. So far my understanding was that the
> > GHA resources are managed on a GitHub organizational level in contrast
> > to Azure pipelines where projects have certain resources. What happens
> > if more and more projects inside the Apache GitHub organization
> > migrate to GHA? Will this affect the build queue time?
> >
> > Best,
> > Fabian
> >
> > On Thu, Dec 16, 2021 at 3:59 PM Nicolaus Weidner
> > <[email protected]> wrote:
> >> Hi all,
> >>
> >> as several people know by now, we are planning to move from Azure CI to
> >> GitHub Actions. This is motivated by (not an exhaustive list):
> >> - Not needing to mirror the repo anymore for CI
> >> - Improving the contributor experience, especially for new contributors
> >> - GHA development being more active than Azure CI development
> >>
> >> In case someone wants to check out the current version of the planned GHA
> >> workflow, you can find it here:
> >> https://github.com/ververica/flink/blob/master/.github/workflows/hadoop-2.8.3-scala-2.12-workflow.yml
> >> Past runs can be seen here: https://github.com/ververica/flink/actions
> >> (lots of red, but this is almost always not due to the workflow)
> >>
> >> I want to put a draft for the migration roadmap up for discussion. It's
> >> divided into several phases:
> >>
> >> *Phase 1: *GHA activated on master (but not required)
> >> - A single CI machine is converted to run GHA runners (instead of Azure
> >> runners) and runs the workflow on pushes to master
> >> - Azure CI remains unchanged and is still the source of truth
> >> - We can compare runtimes and behavior/failures
> >> - Timeframe: 2 weeks
> >>
> >> *Phase 2: *Additional features
> >> - Any additional functionality that we want to add to GHA is added (e.g.
> >> not running the workflow if workflow files were modified)
> >> - Functionality from FlinkCIBot that we want to keep is ported over
> >> (syncing with the mirror repo can be dropped, but there are some automated
> >> checks that we want to keep)
> >> - We can monitor whether performance is impacted by any change
> >> - Timeframe: 2 weeks
> >>
> >> *Phase 3: *Cron jobs and (some) PR triggers run on GHA
> >> - GHA cron builds activated (for master and release branches)
> >>      - Note: Includes some backports to all affected branches, else the
> >> workflows won’t run:
> >> https://stackoverflow.com/questions/61989951/github-action-workflow-not-running/61992817#61992817
> >> - GHA builds run for PRs of select committers (the idea is to try out
> >> builds for all the intended trigger conditions)
> >> - Timeframe: 1 week
> >>
> >> *Up to this point, the existing CI pipeline is mostly unaffected - we only
> >> took away one CI machine.*
> >>
> >> *Phase 4: *Full switch to GHA
> >> - Set up GHA runners on all machines
> >> - GHA builds are activated for all PRs
> >> - Either Azure or GHA build is required
> >> - GHA runners are activated, Azure runners are deactivated (but not yet
> >> removed) apart from 1 machine (for stragglers)
> >> - Azure cron jobs are disabled, but kept around in case we need to revert
> >> - Timeframe: 1-2 weeks
> >>
> >> *Phase 5: *Removal of Azure CI leftovers
> >> - Only after we are satisfied that GHA is stable (at least 1 month after
> >> the switch, can be longer)
> >> - Green GHA build is required from now on
> >> - Stale PRs that don't have a GHA run will have to trigger a new one (but
> >> they would most likely have to rebase anyway...)
> >> - (old) FlinkCIBot is disabled
> >> - Azure yamls are deleted
> >> - Azure runners are removed from machines
> >>
> >>
> >> Timing-wise, the full switch to GHA should happen during a quiet time, far
> >> away from a release. The remaining phases shouldn't have much impact, but
> >> right before a release is not a good moment, of course.
> >> Please give us your thoughts and point out anything we missed or that
> >> doesn't seem to make sense!
> >>
> >> Best,
> >> Nico
