We will not use Apache resources. Instead, we will install self-hosted runners on our current CI machines, similar to what we did with Azure.

On 16/12/2021 16:07, Fabian Paul wrote:
Hi Nico,

Thanks a lot for drafting the proposal. I really like the
fully-fledged phasing model. All in all, I am +1 to move away from
Azure and can only second all the points you have mentioned.

I only want to clarify one point. So far my understanding was that the
GHA resources are managed at the GitHub organization level, in contrast
to Azure Pipelines, where projects have certain resources. What happens
if more and more projects inside the Apache GitHub organization
migrate to GHA? Will this affect the build queue time?

Best,
Fabian

On Thu, Dec 16, 2021 at 3:59 PM Nicolaus Weidner
<nicolaus.weid...@ververica.com> wrote:
Hi all,

as several people know by now, we are planning to move from Azure CI to
GitHub Actions. This is motivated by the following (not an exhaustive list):
- Not needing to mirror the repo anymore for CI
- Improving the contributor experience, especially for new contributors
- GHA development being more active than Azure CI development

In case someone wants to check out the current version of the planned GHA
workflow, you can find it here:
https://github.com/ververica/flink/blob/master/.github/workflows/hadoop-2.8.3-scala-2.12-workflow.yml
Past runs can be seen here: https://github.com/ververica/flink/actions (lots
of red, but the failures are almost never caused by the workflow itself)

I want to put a draft for the migration roadmap up for discussion. It's
divided into several phases:

*Phase 1:* GHA activated on master (but not required)
- A single CI machine is converted to run GHA runners (instead of Azure
runners) and runs the workflow on pushes to master
- Azure CI remains unchanged and is still the source of truth
- We can compare runtimes and behavior/failures
- Timeframe: 2 weeks
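
As a rough illustration of the Phase 1 setup, a workflow limited to pushes
to master on a self-hosted runner could be declared as below. This is only a
sketch and not the actual workflow linked above: the workflow name, the job
name, and the build command are hypothetical placeholders.

```yaml
# Hypothetical sketch; names and the build step are placeholders.
name: phase-1-master-ci
on:
  push:
    branches:
      - master
jobs:
  build:
    # Run on a self-hosted runner installed on an existing CI machine.
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v2
      - name: Build (placeholder command)
        run: mvn -B clean verify
```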

*Phase 2:* Additional features
- Any additional functionality that we want to add to GHA is added (e.g.
not running the workflow if workflow files were modified)
- Functionality from FlinkCIBot that we want to keep is ported over
(syncing with the mirror repo can be dropped, but there are some automated
checks that we want to keep)
- We can monitor whether performance is impacted by any change
- Timeframe: 2 weeks
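
The example in the first bullet (not running when workflow files were
modified) might be approximated with a path filter. A minimal sketch,
assuming the built-in `paths-ignore` filter is acceptable; whether the
planned workflow will use this mechanism is not decided here.

```yaml
on:
  pull_request:
    # Skip the run when a change touches only workflow files.
    paths-ignore:
      - '.github/workflows/**'
```

Note that `paths-ignore` only skips a run when every changed file matches
the pattern. Skipping whenever any workflow file was modified would need an
explicit check inside a job instead.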

*Phase 3:* Cron jobs and (some) PR triggers run on GHA
- GHA cron builds activated (for master and release branches)
     - Note: This includes some backports to all affected branches;
otherwise the workflows won't run:
https://stackoverflow.com/questions/61989951/github-action-workflow-not-running/61992817#61992817
- GHA builds run for PRs of select committers (the idea is to try out
builds for all the intended trigger conditions)
- Timeframe: 1 week

*Up to this point, the existing CI pipeline is mostly unaffected - we only
took away one CI machine.*

*Phase 4:* Full switch to GHA
- Set up GHA runners on all machines
- GHA builds are activated for all PRs
- A green build from either Azure or GHA is required
- GHA runners are activated; Azure runners are deactivated (but not yet
removed) on all machines except one, which is kept for stragglers
- Azure cron jobs are disabled, but kept around in case we need to revert
- Timeframe: 1-2 weeks

*Phase 5:* Removal of Azure CI leftovers
- Only after we are satisfied that GHA is stable (at least 1 month after
the switch, can be longer)
- Green GHA build is required from now on
- Stale PRs that don't have a GHA run will have to trigger a new one (but
they would most likely have to rebase anyway...)
- (old) FlinkCIBot is disabled
- Azure yamls are deleted
- Azure runners are removed from machines


Timing-wise, the full switch to GHA should happen during a quiet time, far
away from a release. The remaining phases shouldn't have much impact, but
right before a release is not a good moment, of course.
Please give us your thoughts and point out anything we missed or that
doesn't seem to make sense!

Best,
Nico

