Thanks for drafting this proposal, Nico. I hope that the move to GHA will improve our development processes and build system stability in the long run. Hence +1 for this proposal and the timeline. The plan looks well thought out.

Cheers,
Till

On Thu, Dec 16, 2021 at 4:29 PM Chesnay Schepler <[email protected]> wrote:

> We will not use Apache resources, but install self-hosted runners on our
> current CI machines, similar to what we have done with Azure.
>
> On 16/12/2021 16:07, Fabian Paul wrote:
> > Hi Nico,
> >
> > Thanks a lot for drafting the proposal. I really like the
> > fully-fledged phasing model. All in all, I am +1 to move away from
> > azure and can only second all the points you have mentioned.
> >
> > I only want to clarify one point. So far my understanding was that the
> > GHA resources are managed on a GitHub organizational level in contrast
> > to Azure pipelines where projects have certain resources. What happens
> > if more and more projects inside the Apache Github organization
> > migrate to GHA? Will this affect the build queue time?
> >
> > Best,
> > Fabian
> >
> > On Thu, Dec 16, 2021 at 3:59 PM Nicolaus Weidner
> > <[email protected]> wrote:
> >> Hi all,
> >>
> >> as several people know by now, we are planning to move from Azure CI to
> >> Github Actions. This is motivated by (not an exhaustive list):
> >> - Not needing to mirror the repo anymore for CI
> >> - Improving the contributor experience, especially for new contributors
> >> - GHA development being more active than Azure CI development
> >>
> >> In case someone wants to check out the current version of the planned GHA
> >> workflow, you can find it here:
> >> https://github.com/ververica/flink/blob/master/.github/workflows/hadoop-2.8.3-scala-2.12-workflow.yml
> >> Past runs can be seen here: https://github.com/ververica/flink/actions
> >> (lots of red, but this is almost always not due to the workflow)
> >>
> >> I want to put a draft for the migration roadmap up for discussion. It's
> >> divided into several phases:
> >>
> >> *Phase 1: *GHA activated on master (but not required)
> >> - A single CI machine is converted to run GHA runners (instead of Azure
> >> runners) and runs the workflow on pushes to master
> >> - Azure CI remains unchanged and is still the source of truth
> >> - We can compare runtimes and behavior/failures
> >> - Timeframe: 2 weeks
> >>
> >> *Phase 2: *Additional features
> >> - Any additional functionality that we want to add to GHA is added (e.g.
> >> not running the workflow if workflow files were modified)
> >> - Functionality from FlinkCIBot that we want to keep is ported over
> >> (syncing with the mirror repo can be dropped, but there are some automated
> >> checks that we want to keep)
> >> - We can monitor whether performance is impacted by any change
> >> - Timeframe: 2 weeks
> >>
> >> *Phase 3: *Cron jobs and (some) PR triggers run on GHA
> >> - GHA cron builds activated (for master and release branches)
> >> - Note: Includes some backports to all affected branches, else the
> >> workflows won’t run:
> >> https://stackoverflow.com/questions/61989951/github-action-workflow-not-running/61992817#61992817
> >> - GHA builds run for PRs of select committers (the idea is to try out
> >> builds for all the intended trigger conditions)
> >> - Timeframe: 1 week
> >>
> >> *Up to this point, the existing CI pipeline is mostly unaffected - we only
> >> took away one CI machine.*
> >>
> >> *Phase 4: *Full switch to GHA
> >> - Set up GHA runners on all machines
> >> - GHA builds are activated for all PRs
> >> - Either Azure or GHA build is required
> >> - GHA runners are activated, Azure runners are deactivated (but not yet
> >> removed) apart from 1 machine (for stragglers)
> >> - Azure cron jobs are disabled, but kept around in case we need to revert
> >> - Timeframe: 1-2 weeks
> >>
> >> *Phase 5: *Removal of Azure CI leftovers
> >> - Only after we are satisfied that GHA is stable (at least 1 month after
> >> the switch, can be longer)
> >> - Green GHA build is required from now on
> >> - Stale PRs that don't have a GHA run will have to trigger a new one (but
> >> they would most likely have to rebase anyway...)
> >> - (old) FlinkCIBot is disabled
> >> - Azure yamls are deleted
> >> - Azure runners are removed from machines
> >>
> >>
> >> Timing-wise, the full switch to GHA should happen during a quiet time, far
> >> away from a release. The remaining phases shouldn't have much impact, but
> >> right before a release is not a good moment, of course.
> >> Please give us your thoughts and point out anything we missed or that
> >> doesn't seem to make sense!
> >>
> >> Best,
> >> Nico
